WIRED magazine (one of my favorites) published an in-depth look at Google’s home-grown search algorithm in its March 2010 issue. (Note: the issue is not online yet, but I will include a link to the article once it is published.)
The most telling passages offer insight into the complex, ever-changing ways that Google’s massive server cloud interprets human thoughts, desires and intentions:
“We discovered a very nifty thing early on,” [Google search engineer Amit] Singhal says. “People change words in their queries. So someone would say, ‘pictures of dogs,’ and then they’d say, ‘pictures of puppies.’ So that told us that maybe ‘dogs’ and ‘puppies’ were interchangeable. We also learned that when you boil water, it’s hot water.” Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy.
The search quality team fixed these semantic slip-ups by adding context and keyword-proximity data drawn from Google’s billions of archived searches.
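To make the synonym idea concrete, here is a minimal sketch of mining word substitutions from consecutive queries in the same session. The article does not describe Google's actual implementation. Everything here is an illustrative assumption: the function names, the toy session data, the `min_count` threshold, and the use of a word's immediate left and right neighbors as its "context". A context-free table would happily turn "hot dogs" into "hot puppies". Keying each substitution on the neighbors it was observed with blocks that.

```python
from collections import Counter


def neighbors(words, i):
    """Left and right neighbor of position i (None at the query edges)."""
    left = words[i - 1] if i > 0 else None
    right = words[i + 1] if i + 1 < len(words) else None
    return left, right


def substitution(q1, q2):
    """If q2 rewrites exactly one word of q1, return (old, new, context), else None."""
    a, b = q1.lower().split(), q2.lower().split()
    if len(a) != len(b):
        return None
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    return a[i], b[i], neighbors(a, i)


def mine_substitutions(sessions, min_count=2):
    """Count one-word rewrites between consecutive queries, keyed by the word's neighbors.

    min_count is a hypothetical threshold that filters out one-off rewrites.
    """
    counts = Counter()
    for session in sessions:
        for q1, q2 in zip(session, session[1:]):
            sub = substitution(q1, q2)
            if sub is not None:
                counts[sub] += 1
    return {sub for sub, n in counts.items() if n >= min_count}


def rewrites(query, table):
    """Return alternative queries, substituting a word only in a context it was seen in."""
    words = query.lower().split()
    out = []
    for i, w in enumerate(words):
        ctx = neighbors(words, i)
        for old, new, seen_ctx in table:
            if old == w and seen_ctx == ctx:
                out.append(" ".join(words[:i] + [new] + words[i + 1:]))
    return out


# Toy, made-up query sessions (each inner list is one user's consecutive queries).
sessions = [
    ["pictures of dogs", "pictures of puppies"],
    ["pictures of dogs", "pictures of puppies"],
    ["boiling water", "hot water"],
    ["boiling water", "hot water"],
]
table = mine_substitutions(sessions)
print(rewrites("pictures of dogs", table))  # ['pictures of puppies']
print(rewrites("hot dogs", table))          # [] because "dogs" was only seen after "of"
```

Real systems would use far richer context than two neighboring words. The point is only that attaching context to each learned synonym is what keeps "hot dog" from becoming "boiling puppy".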
Although other search engines (Yahoo, Bing, etc.) each claim their own advantages, none comes close to Google’s mastery of algorithms:
Singhal led his team on “a multiyear quest to improve the way the system deals with names – which account for 8 percent of all searches. To crack it, he had to master the black art of ‘bi-gram breakage’ – that is, separating multiple words into discrete units. For instance, ‘new york’ represents two words that go together (a bi-gram). But so would the three words in ‘new york times,’ which clearly indicates a different type of search. And everything changes when the query is ‘new york times square.’ Humans can make these distinctions instantly, but Google does not have a Brazil-like back room with hundreds of thousands of cubicle jockeys. It relies on algorithms.”
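The "new york times square" example shows why splitting a query into units is harder than it looks. The sketch below uses a generic dynamic-programming segmentation over a table of known phrases. It is not Google's method, which the article does not describe. The phrase weights are made-up numbers standing in for statistics that would presumably come from query logs, and `segment` and `max_len` are hypothetical names.

```python
def segment(query, phrase_scores, max_len=4):
    """Split a query into units maximizing the total phrase score.

    Single words are always allowed and score 0 unless listed; multi-word
    units (up to max_len words) are allowed only if they are known phrases.
    """
    words = query.lower().split()
    n = len(words)
    # best[i] holds (score, units) for the best segmentation of the first i words.
    best = [(0.0, [])] + [None] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            unit = " ".join(words[j:i])
            if i - j > 1 and unit not in phrase_scores:
                continue  # unknown multi-word spans are not treated as units
            score = best[j][0] + phrase_scores.get(unit, 0.0)
            if best[i] is None or score > best[i][0]:
                best[i] = (score, best[j][1] + [unit])
    return best[n][1]


# Hypothetical phrase weights, invented for illustration only.
phrase_scores = {"new york": 2.0, "new york times": 3.0, "times square": 2.5}

print(segment("new york times", phrase_scores))         # ['new york times']
print(segment("new york times square", phrase_scores))  # ['new york', 'times square']
```

A greedy left-to-right longest match over the same table would produce `['new york times', 'square']`. Scoring whole segmentations lets the last word change how the first three are grouped, which is exactly the shift the quote describes.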
Most of these developments are common knowledge and widely reported within the SEO community, but to most people they are still voodoo. The more we know about how pages are ranked, the better we can provide the kinds of content that people and search engines are looking for.
With more than 550 algorithm updates expected in 2010, Google is clearly not done innovating. “The holy grail of search is to understand what the user wants,” Singhal says. “Then you are not matching words; you are actually trying to match meaning.”