"The" -- a secret code to unlocking Google?
A couple of days ago, I posted about a neat Google hack -- the search results for "weapons of mass destruction". In the comments on that item, Franco pointed out that when he recently tried to search for the goddess "Tykhe", Google asked him if he really meant to search for the word "the". As Franco sardonically joked: "Yes, I meant to search the entire internet for the word 'the' -- a word which you refuse to search for." And it's true: Whenever you type in a search string with common words like "the" or "and", Google strips them out. Generally, Google won't even allow you to include "the" as a search term.
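Stripping common words like this is usually called stop-word removal. Below is a minimal sketch of the idea, assuming a tiny hypothetical stop-word list (Google's real list has never been published). It also honors a leading "+" as a force-include marker, the way Google's "+" operator worked at the time. The rule that keeps the original terms when *every* word is a stop word is an assumption too: it is just one simple way an engine could end up with the behavior described next.

```python
# Hypothetical stop-word list; Google's actual list is not public.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in"}

def strip_stop_words(query):
    """Return the query terms that survive stop-word removal (lowercased)."""
    terms = query.lower().split()
    kept = []
    for term in terms:
        if term.startswith("+") and len(term) > 1:
            kept.append(term[1:])   # forced with '+': keep even a stop word
        elif term not in STOP_WORDS:
            kept.append(term)       # ordinary content word
    # Assumption: if every term was a stop word (e.g. the query "the"),
    # fall back to the original terms instead of running an empty search.
    return kept if kept else [t.lstrip("+") for t in terms]

print(strip_stop_words("the goddess Tykhe"))   # ['goddess', 'tykhe']
print(strip_stop_words("+the onion"))          # ['the', 'onion']
print(strip_stop_words("the"))                 # ['the']
```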
But here's the weird thing: If you type in only the word "the" as a search, you actually do get results. When I searched for "Tykhe", Google gave me the same response it gave Franco:
Searched the web for Tykhe -- Results 1 - 10 of about 302. Search took 0.05 seconds.
Did you mean: The
So I clicked on the "the" search, and discovered it generates 3,680,000,000 results. The top-ranked search results are, in order:
The Onion
The White House
The Economist
NASA
The Guardian
AllTheWeb.com
The Weather Channel
The New York Times
The Washington Post
The Hunger Site
This is really intriguing. Since "the" is the most common word in the English language, it would -- theoretically -- be distributed pretty evenly around the Internet. In that case, when Google searches for "the", it faces a unique situation. It would be very hard for Google's semantic or keyword-matching tools to figure out which web site used the word most frequently, or in the most significant fashion; when nearly every page contains the word, most semantic or keyword-matching reasoning is rendered useless. And indeed, look again at the number of results: 3,680,000,000. That's in the same ballpark as the number of pages Google claims to index -- 3,083,324,652 -- and the estimate even overshoots it a bit. Thus, the search "the" is effectively returning results for every single page in Google's index.
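One textbook way to see why word-matching stops helping is TF-IDF weighting. Google has never published its exact scoring, so this is a swapped-in, standard technique rather than Google's method. A term's weight is scaled by its inverse document frequency, idf = log(N / df), where N is the number of pages and df is how many of them contain the term. If a word appears on every page, df = N and idf = log 1 = 0, so every page gets the same (zero) text score. The four-page corpus below is hypothetical.

```python
import math

# Hypothetical toy corpus: every "page" contains the word "the".
corpus = [
    "the onion",
    "the white house",
    "the economist",
    "the hunger site",
]

def idf(term, corpus):
    """Inverse document frequency: log(N / df)."""
    df = sum(1 for doc in corpus if term in doc.lower().split())
    if df == 0:
        return 0.0  # term matches no page, so it cannot help rank anything
    return math.log(len(corpus) / df)

def tfidf(term, doc, corpus):
    """Term frequency in one page times the term's IDF across the corpus."""
    tf = doc.lower().split().count(term)
    return tf * idf(term, corpus)

print([tfidf("the", doc, corpus) for doc in corpus])  # [0.0, 0.0, 0.0, 0.0]
print(round(idf("onion", corpus), 3))                 # 1.386
```

A rarer word such as "onion" gets a positive weight and can separate pages; "the" gets none, so some other signal has to do the ranking.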
In this situation, the main trick Google has to fall back on is PageRank: its patented system for determining which sites are important by counting the links that point to them, with links from important pages counting for more. This would mean, then, that The Onion -- and those other nine sites -- may have more (or weightier) links pointing to them than almost any other site on the Net. They are, in effect, the most popular sites on the Net, since PageRank popularity is clearly the main criterion -- if not the only criterion -- that Google is using to place them on the Top 10 list, right?
Well, maybe. Possibly the names of the sites are important, too. Notice that, except for NASA, all the sites have the word "the" in their official web-site title -- and thus probably also in their meta tags, and various other semantically important bits of HTML. That may explain why The Hunger Site appears so high.
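To make the PageRank fallback concrete, here is a minimal power-iteration sketch of the textbook formula PR(p) = (1 - d)/N + d * sum of PR(q)/outdeg(q) over the pages q that link to p, with the commonly quoted damping factor d = 0.85. The four-page "web" and its links are invented for illustration, and Google's production ranking layered many other signals (such as the title and meta-tag matching mentioned above) on top of this.

```python
def pagerank(links, d=0.85, iterations=50):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to; every link
    target must also be a key. Pages with no outgoing links spread their
    rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start from a uniform guess
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}  # "random jump" share
        for p, outs in links.items():
            if outs:
                share = d * rank[p] / len(outs)  # split p's vote over its links
                for q in outs:
                    new[q] += share
            else:
                for q in pages:                  # dangling page: spread evenly
                    new[q] += d * rank[p] / n
        rank = new
    return rank

# Hypothetical toy web: most pages link to "onion".
web = {
    "onion": ["nasa"],
    "nasa": ["onion"],
    "blog_a": ["onion", "nasa"],
    "blog_b": ["onion"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:7s} {score:.3f}")
```

Note that "onion" comes out on top not just because it has the most inbound links, but because one of those links comes from "nasa", which is itself highly ranked.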
Pretty weird, eh?
Posted by Clive Thompson at July 07, 2003 11:01 PM
As I discovered a while back, if you Google on just the letter "s" you get www.gnu.org as the top result. My theory is that it is because of the "'s" in their slogan, "GNU's Not Unix."
Posted by: Tom at July 8, 2003 10:49 AM
Heh.
Posted by: Clive at July 14, 2003 1:34 PM
Also, you can include a plus sign in front of any word Google removes to force it to stay there, e.g. "+the onion" or "+this +or +that".
Posted by: RicMoo at October 11, 2003 6:29 PM
Oh, that's cool!
Posted by: Clive at October 13, 2003 11:14 PM