“The” — a secret code to unlocking Google?

A couple of days ago, I posted about a neat Google hack: the search results for “weapons of mass destruction”. In the comment field for the item, Franco pointed out that when he recently tried to search for the goddess “Tykhe”, Google asked him if he really meant to search for the word “the”. As Franco sardonically joked: “Yes, I meant to search the entire internet for the word ‘the’ — a word which you refuse to search for.” And it’s true: Whenever you type in a search string with common words like “the” or “and”, Google strips them out. Generally, Google won’t even allow you to include “the” as a search term.

But here’s the weird thing: If you type in only the word “the” as a search, you actually do get results. When I searched for “Tykhe”, Google gave me the same response it gave Franco:

Searched the web for Tykhe — Results 1 - 10 of about 302. Search took 0.05 seconds.

Did you mean: The

So I clicked on the “the” search, and discovered it generates 3,680,000,000 results. The top-ranked search results are, in order:

The Onion
The White House
The Economist
NASA
The Guardian
AllTheWeb.com
The Weather Channel
The New York Times
The Washington Post
The Hunger Site

This is really intriguing. Since “the” is the most common word in the English language, it would, theoretically, be distributed pretty evenly around the Internet. In that case, when Google searches for “the”, it faces a unique situation. It would be very hard for Google’s semantic or key-word-matching tools to figure out which web site used the word most frequently, or in the most significant fashion, so most of that reasoning is rendered useless. And indeed, look again at the number of results: 3,680,000,000. That’s in the same ballpark as (in fact, a bit more than) the number of pages that Google claims to index: 3,083,324,652. Thus, the search “the” appears to be returning results for just about every page Google knows about.

In this situation, the main trick Google has to fall back on is PageRank: its patented system for determining which sites are important, by counting the number of links that point to them. This would mean, then, that The Onion and those other nine sites may have more links pointing to them than most other sites on the Net. They are, in effect, the most popular sites on the Net, since PageRank popularity is clearly the main criterion (if not the only criterion) that Google is using to place them on the Top 10 list, right?

Well, maybe. Possibly the names of the sites are important, too. Notice that, except for NASA, all the sites have the word “the” in their official web-site title — and thus probably also in their meta tags, and various other semantically important bits of HTML. That may explain why The Hunger Site appears so high.
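To make the link-counting idea concrete, here is a minimal sketch of a PageRank-style calculation in Python. It is a toy version of the published algorithm, not Google's actual ranking code: the page names, the link graph, and the damping value of 0.85 are all assumptions chosen for illustration. One wrinkle it shows is that PageRank doesn't just count links; a link from a highly ranked page is worth more than a link from an obscure one.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: everyone links to "hub", nobody links to "c".
toy_links = {
    "hub": ["a", "b"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}

ranks = pagerank(toy_links)
top = sorted(ranks, key=ranks.get, reverse=True)
print(top)
```

In this made-up graph, "hub" comes out on top because every other page links to it, and "c" lands at the bottom because nothing links to it. If every page on the web matches a query equally well, as seems to be the case with “the”, an ordering like this is about all a search engine would have left to go on, apart from clues like the page title.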

Pretty weird, eh?


Bio:

I'm Clive Thompson, the author of Smarter Than You Think: How Technology is Changing Our Minds for the Better (Penguin Press). You can order the book now at Amazon, Barnes and Noble, Powells, Indiebound, or through your local bookstore! I'm also a contributing writer for the New York Times Magazine and a columnist for Wired magazine. Email is here or ping me via the antiquated form of AOL IM (pomeranian99).

Collision Detection: A Blog by Clive Thompson