Study: Shorter search queries produce better search results


Here’s a study with an interesting finding: If you want to get better results on Google, try using a shorter query.

I found this while doing research for a story about automated “question answering” systems. I was reading through the work of James Allan, a computer scientist at the University of Massachusetts, and came across his paper “A Case for Shorter Queries, and Helping Users Create Them” (PDF here). In it, he and his coauthor Giridhar Kumaran conducted an experiment: They took the query Define Argentine and British international relations and ran it through a search engine. (They don’t specify which one they used.) Then they ran various similar queries that used fewer words, which they call “sub-queries”, such as define britain international argentina or define britain relate argentina. Each time, they graded the relevance of the search engine’s results, expressed as their “average precision” on a scale of zero to 1.0.

So which sub-query produced the best results? The shortest one. It was only two words long — britain argentina — but it scored 0.626, quite a lot better than the original, full-sentence query, which scored only 0.424.
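
If “average precision” is unfamiliar: it’s a standard information-retrieval metric that rewards a ranking for putting relevant results near the top. Here is a minimal sketch of the usual textbook definition, assuming each result is simply judged relevant or not. The function name and the sample judgments are made up for illustration, and the paper may compute its scores with different details.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision for one ranked result list.

    ranked_relevance: list of booleans, True where the result at that
        rank was judged relevant (rank 1 first).
    total_relevant: number of relevant documents that exist for the
        query. Assumption: if omitted, only the relevant documents
        that actually appear in the list are counted.
    """
    if total_relevant is None:
        total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        return 0.0

    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            # Precision at this rank: relevant results so far / results so far.
            precision_sum += hits / rank
    return precision_sum / total_relevant


# Hypothetical judgments: results 1 and 3 are relevant, result 2 is not.
score = average_precision([True, False, True])
print(round(score, 2))  # 0.83
```

A perfect ranking scores 1.0, and a list with nothing relevant scores zero, which matches the zero-to-1.0 scale described above.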

Why would short queries work better than longer ones? Possibly because they contain fewer “noise terms” — common words like define or and — which might muddy the search results. Human language is filled with ambiguity; one of the big challenges for a machine is taking a human question and figuring out what, semantically, it’s actually asking. In that sense, using fewer words would reduce the number of potential ways the machine can misunderstand you.

Except the truly strange thing in that example above is the question was asking about British and Argentinian international relations — yet the best results came from removing the words “international” and “relations”. I’d have expected those to be important words, no? But that’s precisely the point Allan is getting at here:

Sub-queries a human would consider as an incomplete expression of information need sometimes performed better than the original query.

This suggests, of course, that the best way to get results on a search engine is to radically strip your query down even further than you think is useful. Or maybe start with a regular query, and if you don’t like the results, try making it shorter and shorter.
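
To get a feel for how many shorter versions of a query there are to try, here is a rough sketch that lists every sub-query you can make by dropping words while keeping the rest in order. It is only an illustration: the `sub_queries` name is hypothetical, and the sub-queries quoted above also change word forms (“britain”, “relate”), which this sketch does not attempt.

```python
from itertools import combinations


def sub_queries(query, min_words=1):
    """List every shorter query made from a subset of the words.

    Words keep their original order, and the shortest sub-queries
    come first. The full query itself is not included.
    """
    words = query.split()
    results = []
    for size in range(min_words, len(words)):
        # combinations() yields subsets in the original word order.
        for combo in combinations(words, size):
            results.append(" ".join(combo))
    return results


candidates = sub_queries("define britain international argentina", min_words=2)
print(len(candidates))                     # 10
print("britain argentina" in candidates)   # True
```

Even a four-word query yields ten candidates of two or three words, and the count grows quickly with query length, which is one reason to want the shortening done automatically.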

Then again, it’s hard to know if this would really work. I’m not privy to what’s going on under the hood of most search engines today. Allan’s paper discusses several ways for question-answering systems to have the computer automatically shorten a query before feeding it into the knowledge database; but his paper is a few years old, so maybe these techniques are already common amongst search engines. Maybe they already reformat our queries into semantically shorter formats.

What do you guys think? Anecdotally, have you found that super-short queries work better than longer, sentence-like ones?




Bio:

I'm Clive Thompson, a writer on science, technology, and culture. This blog collects bits of offbeat research I'm running into, and musings thereon.

Currently, I'm a contributing writer for the New York Times Magazine and a columnist for Wired magazine. I also write for Fast Company and Wired magazine's web site, among other places. Email or AOL IM me (pomeranian99) to say hi or send in something strange!






Collision Detection: A Blog by Clive Thompson