Jun 30, 2008 by Navneet Kaushal

Over at the Official Google Blog, Google explains how the search data it collects helps it control and fight webspam.

For the uninitiated, webspam refers to the unwanted, often junk pages that show up in search results. It happens because certain websites adopt unethical methods to gain higher rankings in the search engine result pages. Below is an image of a typical page consisting of webspam:

Google Continues Fighting Webspam

Such pages are a nuisance for users: they provide no relevant or helpful data, contain almost no original content, and are crowded with irrelevant links. However, as times have changed, so has spam. No longer will you find 'right in your face' spam. These days, spam is cleverly disguised, and in many cases it is designed to look 100% authentic. Webspam has always been a bane for internet users, but since Google came out with its own anti-spam methods, webspam has substantially decreased. For most, webspam is simply a nuisance in which the user gets bombarded by unwanted results. However, when searches are critical and sometimes time-sensitive, webspam can prove to be a serious hindrance. For example, a search for prostate cancer that returns spam instead of relevant links greatly diminishes the value of a search engine as a helpful tool.

According to Google, data from search logs is one of the many tools it uses to fight webspam and return cleaner and more relevant results. Log data such as IP addresses and cookie information makes it possible to create and use metrics that measure different aspects of Google's search quality (such as index size and coverage, results "freshness," and spam). When creating any new metric to combat webspam, it is essential to be able to go back over the log data and compute the metric using previous queries or results. Google also looks back at the search logs for queries run months ago to compare the search engine's performance over time. When a metric is created to attack webspam, Google makes a point of monitoring the metric's present as well as past performance and progress.

One of the most important inputs for any new metric is the IP and cookie information, as it helps Google apply the metric only to searches from legitimate users, as opposed to those generated by bots and other false searches. For example, if a bot sends the same queries to Google over and over again, those queries should be discarded before Google measures how much spam users see. Together, this log data, including IP addresses and cookie information, helps make the search results cleaner and more relevant.
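To make the bot example concrete, here is a minimal Python sketch of the idea: drop queries that the same IP and cookie repeat many times, then measure how much spam the remaining searches contain. This is only an illustration, not Google's actual method. The `LogEntry` fields, the `max_repeats` threshold, and both function names are assumptions invented for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    """One search log record (hypothetical schema for illustration)."""
    ip: str
    cookie: str
    query: str
    spam_results: int   # results on the page judged to be spam
    total_results: int  # results shown on the page


def drop_repeated_queries(entries, max_repeats=3):
    """Discard every entry whose (ip, cookie, query) triple occurs more
    than max_repeats times; heavy repetition is treated as bot traffic.
    The threshold is an arbitrary assumption."""
    counts = Counter((e.ip, e.cookie, e.query) for e in entries)
    return [e for e in entries
            if counts[(e.ip, e.cookie, e.query)] <= max_repeats]


def spam_rate(entries):
    """Fraction of all shown results that were spam."""
    total = sum(e.total_results for e in entries)
    if total == 0:
        return 0.0
    return sum(e.spam_results for e in entries) / total


# A bot repeats one spammy query 50 times; two ordinary users search once.
logs = [LogEntry("10.0.0.1", "bot", "cheap pills", 9, 10)] * 50
logs.append(LogEntry("10.0.0.2", "u1", "prostate cancer", 1, 10))
logs.append(LogEntry("10.0.0.3", "u2", "weather", 0, 10))

raw = spam_rate(logs)                             # skewed by the bot
clean = spam_rate(drop_repeated_queries(logs))    # legitimate users only
```

In this toy data the unfiltered metric is dominated by the bot's repeated query, while the filtered metric reflects only what the two legitimate users saw. That is why the filtering step has to come before the measurement.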

In 2007, Google faced a flurry of webspam on Chinese domains in its index. Spammers bought numerous inexpensive .cn domains and stuffed them with misspellings and explicit phrases. However, the everyday user wasn't affected, as Google immediately took the necessary steps to counter the emerging spam attack. With the help of log data, Google is able to create metrics and procedures to reduce webspam, so that a regular user does not have to go through sneaky JavaScript redirects, unwanted porn, gibberish-stuffed pages or other types of webspam. Google's log data helps ensure that it detects new spam trends and has a chance to counteract them before they lower the quality of the user's search experience.

Navneet Kaushal

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.