
Here’s a study with an interesting finding: If you want to get better results on Google, try using a shorter query.
I found this while doing research for a story about automated “question answering” systems. I was reading through the work of James Allan, a computer scientist at the University of Massachusetts, and came across his paper “A Case for Shorter Queries, and Helping Users Create Them” (PDF here). In it, he and his coauthor Giridhar Kumaran conducted an experiment: They took the query Define Argentine and British international relations and ran it through a search engine. (They don’t specify which one they used.) Then they ran various similar queries that used fewer words (“sub-queries”), such as define britain international argentina or define britain relate argentina. Each time, they graded the relevance of the search engine’s results, expressed as their “average precision” on a scale of zero to 1.0.
So which sub-query produced the best results? The shortest one. It was only two words long — britain argentina — but it scored 0.626, quite a lot better than the original, full-sentence query, which scored only 0.424.
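For anyone wondering what “average precision” actually measures, here is a minimal sketch of the standard textbook definition. The post doesn't describe exactly how Allan and Kumaran ran their evaluation, so treat this as an illustration rather than their method; the function name `average_precision` and the relevance judgments in the example are made up for this sketch.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision for one ranked result list.

    ranked_relevance: list of booleans, True where the result at that
    rank was judged relevant (hypothetical judgments, not the paper's).
    total_relevant: number of relevant documents that exist overall;
    if omitted, we assume every relevant document appears in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant results so far / results so far.
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


# Relevant results at ranks 1 and 3: (1/1 + 2/3) / 2
print(round(average_precision([True, False, True, False]), 3))  # 0.833
```

The score rewards putting relevant results near the top: a list with all of its relevant results first scores 1.0, and a list with none scores zero. On that scale, the gap between 0.424 and 0.626 is a big one.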
Why would short queries work better than longer ones? Possibly because they contain fewer “noise terms” — common words like define or and — which might muddy the search results. Human language is filled with ambiguity; one of the big challenges for a machine is taking a human question and figuring out what, semantically, it’s actually asking. In that sense, using fewer words would reduce the number of potential ways the machine can misunderstand you.
Except the truly strange thing in that example above is the question was asking about British and Argentinian international relations — yet the best results came from removing the words “international” and “relations”. I’d have expected those to be important words, no? But that’s precisely the point Allan is getting at here:
Sub-queries a human would consider as an incomplete expression of information need sometimes performed better than the original query.
This suggests, of course, that the best way to get results on a search engine is to radically strip your query down even further than you think is useful. Or maybe start with a regular query, and if you don’t like the results, try making it shorter and shorter.
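If you wanted to try the "shorter and shorter" approach systematically, you could enumerate every sub-query of your original one and test them shortest-first. Here is a rough sketch of that enumeration. The helper `sub_queries` is a hypothetical name of my own, and this is not the technique from the paper, just a brute-force listing of word subsets that keeps the original word order.

```python
from itertools import combinations


def sub_queries(query, min_terms=1):
    """List every shorter version of a query, shortest first.

    Words keep their original order; the full query itself is excluded.
    """
    terms = query.split()
    results = []
    for size in range(min_terms, len(terms)):
        for combo in combinations(terms, size):
            results.append(" ".join(combo))
    return results


candidates = sub_queries("define britain international argentina", min_terms=2)
print(len(candidates))                      # 10
print("britain argentina" in candidates)    # True
```

The number of sub-queries roughly doubles with each extra word, so brute force only makes sense for short queries. For anything longer you would need some way to pick promising candidates instead of trying them all.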
Then again, it’s hard to know if this would really work. I’m not privy to what’s going on under the hood of most search engines today. Allan’s paper discusses several ways for question-answering systems to have the computer automatically shorten a query before feeding it into the knowledge database; but his paper is a few years old, so maybe these techniques are already common amongst search engines, and maybe they already rewrite our queries into shorter forms.
What do you guys think? Anecdotally, have you found that super-short queries work better than longer, sentence-like ones?
I'm Clive Thompson, a writer on science, technology, and culture. This blog collects bits of offbeat research I'm running into, and musings thereon.
Currently, I'm a contributing writer for the New York Times Magazine and a columnist for Wired magazine. I also write for Fast Company and Wired magazine's web site, among other places. Email or AOL IM me (pomeranian99) to say hi or send in something strange!