Jul 28, 2008, by Navneet Kaushal

You expect big news from the world's biggest and most widely used search engine. We are talking about Google, whose first index in 1998 covered about 26 million pages. That beginning seems quite modest now: the Official Google Blog has confirmed a groundbreaking new milestone, with Google's search engineers reaching the whopping one trillion mark. Google's systems have now seen 1 trillion unique URLs on the web at once, which gives it a far larger pool of pages to draw on when making the web more useful for its users.

There is a flip side to the coin, however. While web users have it easier, Google's task has become far more arduous and intricate. The way Google goes about discovering these countless pages is rather interesting. The Official Google Blog stated, “We start at a set of well-connected initial pages and follow each of their links to new pages. Then we follow the links on those new pages to even more pages and so on, until we have a huge list of links. In fact, we found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other. Even after removing those exact duplicates, we saw a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.”
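The link-following process quoted above can be pictured with a small sketch. This is only an illustration under stated assumptions, not Google's actual crawler: the toy web `TOY_WEB`, the `example` URLs and the `crawl` function are all hypothetical, and "exact duplicates" are detected here simply by hashing page content.

```python
from collections import deque
import hashlib

# Hypothetical toy web: URL -> (page content, outgoing links).
TOY_WEB = {
    "http://a.example/": ("home", ["http://a.example/about", "http://a.example/index.html"]),
    # Same content as "/", reachable under a second URL (an exact duplicate).
    "http://a.example/index.html": ("home", ["http://a.example/about"]),
    "http://a.example/about": ("about us", ["http://b.example/"]),
    "http://b.example/": ("partner site", ["http://a.example/"]),
}

def crawl(seeds, web):
    """Follow links breadth-first from the seed pages.

    Returns (links_followed, unique_urls, unique_pages), where unique_pages
    keeps only the first URL seen for each distinct page content.
    """
    queue = deque(seeds)
    seen_urls = set(seeds)
    first_url_for_content = {}  # content hash -> first URL with that content
    links_followed = 0
    while queue:
        url = queue.popleft()
        if url not in web:
            continue  # unknown page in this toy web: nothing to fetch
        content, links = web[url]
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        first_url_for_content.setdefault(digest, url)
        for link in links:
            links_followed += 1
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return links_followed, seen_urls, sorted(first_url_for_content.values())

links, urls, pages = crawl(["http://a.example/"], TOY_WEB)
print(links, len(urls), len(pages))
```

In this toy web the crawler follows more links than there are URLs, and finds more URLs than there are distinct pages, which mirrors the gap the blog describes between individual links, unique URLs and unique pages.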

The web grows every day, and strictly speaking the number of pages on it is unbounded, so Google cannot examine every page to separate the unique ones from the rest. One link leads to another, and following them all would be an endless chase. Google spares its users this labyrinth by being selective about what it indexes. (The “calendar example” in the quote that follows refers to the blog's illustration of a web calendar whose “next day” link can be followed forever, each time producing a “new” page.) The Official Google Blog states, “We don't index every one of those trillion pages – many of them are similar to each other, or represent auto-generated content similar to the calendar example that isn't very useful to searchers. But we're proud to have the most comprehensive index of any search engine, and our goal always has been to index all the world's data.” To deliver this volume of information to web users systematically and on time, Google has replaced its older techniques with more streamlined ones: it downloads the web continuously, collects recent page updates, and re-processes the web-link graph in its entirety several times every day.
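To see why auto-generated pages make an exhaustive crawl impossible, consider a calendar site where every day's page links to the next day. The sketch below is a hypothetical illustration, not a description of how Google handles such sites: the `cal.example` URLs, `calendar_links` and `crawl_with_budget` are invented for the example, and the page budget is just one simple way to stop following an endless chain.

```python
from collections import deque
from datetime import date, timedelta

def calendar_links(url):
    """Hypothetical auto-generated calendar: each day page links to the next day."""
    day = date.fromisoformat(url.rsplit("/", 1)[1])
    next_day = day + timedelta(days=1)
    return ["http://cal.example/" + next_day.isoformat()]

def crawl_with_budget(seed, get_links, max_pages):
    """Follow links breadth-first, but stop once max_pages pages were fetched."""
    queue = deque([seed])
    seen = {seed}
    fetched = []
    while queue and len(fetched) < max_pages:
        url = queue.popleft()
        fetched.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched

pages = crawl_with_budget("http://cal.example/2008-07-28", calendar_links, max_pages=5)
print(pages)
```

Without the `max_pages` limit this loop would never finish, because every fetched page yields one more URL that has not been seen before.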

Google's distributed infrastructure allows it to process this enormous volume of pages efficiently, with the aim of making each Google search more comprehensive and up to date than the last.

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.