Study: Shorter search queries produce better search results


Here’s a study with an interesting finding: If you want to get better results on Google, try using a shorter query.

I found this while doing research for a story about automated “question answering” systems. I was reading through the work of James Allan, a computer scientist at the University of Massachusetts, and read his paper “A Case for Shorter Queries, and Helping Users Create Them” (PDF here). In it, he and his coauthor Giridhar Kumaran conducted an experiment: They took the query Define Argentine and British international relations and ran it through a search engine. (They don’t specify which one they used.) Then they ran various similar queries that used fewer words — “sub queries” — such as define britain international argentina or define britain relate argentina. Each time, he graded the relevance the search engine’s results, expressed as their “average precision” on a scale of zero to 1.0.

So which sub-query produced the best results? The shortest one. It was only two words long — britain argentina — but it scored 0.626, quite a lot better than the original, full-sentence query, which scored only 0.424.

Why would short queries work better than longer ones? Possibly because they contain fewer “noise terms” — common words like define or and — which might muddy the search results. Human language is filled with ambiguity; one of the big challenges for a machine is taking a human question and figuring out what, semantically, it’s actually asking. In that sense, using fewer words would reduce the number of potential ways the machine can misunderstand you.

Except the truly strange thing in that example above is the question was asking about British and Argentinian international relations — yet the best results came from removing the words “international” and “relations”. I’d have expected those to be important words, no? But that’s precisely the point Allan is getting at here:

Sub-queries a human would consider as an incomplete expression of information need sometimes performed better than the original query.

This suggests, of course, that the best way to get results on a search engine is to radically strip your query down even further than you think is useful. Or maybe start with a regular query, and if you don’t like the results, try making it shorter and shorter.

Then again, it’s hard to know if this would really work. I’m not privy to what’s going on behind the hood of most search engines today. Allan’s paper discusses several ways for question-answering systems to have the computer automatically shorten a query before feeding it into the knowledge database; but his paper is a few years old, so maybe these techniques are already common amongst search engines — maybe they already reformat our queries into semantically shorter formats.

What do you guys think? Anecdotally, have you found that super-short queries work better than longer, sentence-like ones?


blog comments powered by Disqus

Search This Site


Bio:

I'm Clive Thompson, a writer on science, technology, and culture. This blog collects bits of offbeat research I'm running into, and musings thereon.

Currently, I'm a contributing writer for the New York Times Magazine and a columnist for Wired magazine. I also write for Fast Company and Wired magazine's web site, among other places. Email or AOL IM me (pomeranian99) to say hi or send in something strange!

More of Me

Twitter
Tumblr
Flickr


Recent Entries

A long German word for “noticing when ads are being customized based on your surfing history”

Gay squid sex

“El Ajedrecista” — an analog chess-playing computer from 1912

Hacking the Model T

“How did you find my site?” and Vannevar Bush’s memex

» visit the Collision Detection archives

Clive Thompson's Tumblr
a bunch of stuff

May 20, 2011 » 02:28 PM

From Christopher Kennedy’s very droll book “Neitzsche’s Horse”.

July 28, 2010 » 07:35 AM
“Wr” - S

July 06, 2010 » 10:05 AM

My Xbox broke, and I was trying to Google some possible technical solutions, when I noticed that Google appears to be encouraging me to make a typo. I suppose it’s possible that Google’s algorithms know that typing “wont” instead of “won’t” would produce better results.

June 29, 2010 » 05:00 PM

On the other hand, when I tried the test for multitasking, I was pretty abysmal. I performed worse than people who identify themselves as heavy multitaskers, and those who identify as low multitaskers.

June 29, 2010 » 04:58 PM

I finally got around to trying out the interactive “test your distractability and multitasking” page at the New York Times, which they put up alongside their story earlier this month about how computer distractions are eroding our lives. 

According to the test, I guess I have good focus — I’m not very distractable! 

» visit my Tumblr

Recent Comments

Photos

» see all of my photos on Flickr

Collision Detection: A Blog by Clive Thompson