Here’s a study with an interesting finding: If you want to get better results on Google, try using a shorter query.
I found this while doing research for a story about automated “question answering” systems. I was reading through the work of James Allan, a computer scientist at the University of Massachusetts, and came across his paper “A Case for Shorter Queries, and Helping Users Create Them” (PDF here). In it, he and his coauthor Giridhar Kumaran conducted an experiment: They took the query Define Argentine and British international relations and ran it through a search engine. (They don’t specify which one they used.) Then they ran various similar queries that used fewer words (“sub queries”), such as define britain international argentina or define britain relate argentina. Each time, they graded the relevance of the search engine’s results, expressed as their “average precision” on a scale of zero to 1.0.
So which sub-query produced the best results? The shortest one. It was only two words long — britain argentina — but it scored 0.626, quite a lot better than the original, full-sentence query, which scored only 0.424.
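For the curious, here is a rough sketch of the two moving parts in that experiment: listing the shorter versions of a query, and scoring a ranked list of results with average precision. This is my own illustration, not code from the paper. The function names are made up, the sub-queries simply drop words without the stemming the paper's examples show (britain, relate), and the relevance judgments are assumed to be supplied by a human grader.

```python
from itertools import combinations


def sub_queries(query, min_terms=1):
    """Yield every shorter version of `query`, keeping the original word order.

    Hypothetical helper: the paper's own sub-query generation may differ.
    """
    terms = query.split()
    for size in range(min_terms, len(terms)):
        for kept in combinations(terms, size):
            yield " ".join(kept)


def average_precision(ranked_relevance, total_relevant=None):
    """Score one ranked result list on a scale of 0.0 to 1.0.

    ranked_relevance: list of booleans, True where the result at that
    rank was judged relevant. total_relevant is the number of relevant
    documents that exist; if omitted, we assume (for this sketch) that
    every relevant document appears somewhere in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


if __name__ == "__main__":
    for candidate in sub_queries("britain argentina relations"):
        print(candidate)
    # Relevant results at ranks 1 and 3, irrelevant at rank 2.
    print(average_precision([True, False, True]))
```

The score rewards putting relevant results near the top: a relevant hit at rank 1 counts for more than the same hit at rank 10.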
Why would short queries work better than longer ones? Possibly because they contain fewer “noise terms” — common words like define or and — which might muddy the search results. Human language is filled with ambiguity; one of the big challenges for a machine is taking a human question and figuring out what, semantically, it’s actually asking. In that sense, using fewer words would reduce the number of potential ways the machine can misunderstand you.
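To make the “noise terms” idea concrete, here is a minimal sketch of stripping them out of a query before searching. The word list is my own guess at what counts as noise, not one taken from the paper, and real search engines may handle these words quite differently.

```python
# Hypothetical noise-term list, chosen only for illustration.
NOISE_TERMS = {"define", "and", "or", "the", "of", "a", "an", "in", "for", "to"}


def strip_noise(query, noise_terms=NOISE_TERMS):
    """Return `query` with any noise terms removed (case-insensitive)."""
    kept = [word for word in query.split() if word.lower() not in noise_terms]
    return " ".join(kept)


if __name__ == "__main__":
    print(strip_noise("Define Argentine and British international relations"))
```

Run on the original query, this keeps Argentine, British, international and relations, and drops only Define and and.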
Except the truly strange thing about Allan’s result is that the original question asked about British and Argentinian international relations, yet the best results came from removing the words “international” and “relations”. I’d have expected those to be important words, no? But that’s precisely the point Allan is getting at here:
Sub-queries a human would consider as an incomplete expression of information need sometimes performed better than the original query.
This suggests, of course, that the best way to get results on a search engine is to radically strip your query down even further than you think is useful. Or maybe start with a regular query, and if you don’t like the results, try making it shorter and shorter.
Then again, it’s hard to know if this would really work. I’m not privy to what’s going on under the hood of most search engines today. Allan’s paper discusses several ways for question-answering systems to have the computer automatically shorten a query before feeding it into the knowledge database; but his paper is a few years old, so maybe these techniques are already common amongst search engines, and maybe they already reformat our queries into semantically shorter formats.
What do you guys think? Anecdotally, have you found that super-short queries work better than longer, sentence-like ones?
I'm Clive Thompson, the author of Smarter Than You Think: How Technology is Changing Our Minds for the Better (Penguin Press). You can order the book now at Amazon, Barnes and Noble, Powells, Indiebound, or through your local bookstore! I'm also a contributing writer for the New York Times Magazine and a columnist for Wired magazine. Email is here or ping me via the antiquated form of AOL IM (pomeranian99).