Over at the Official Google Blog, Google explains how the search data it collects helps Google to control and fight webspam.
For the uninitiated, webspam is the unwanted, often junk pages that show up in search results. They appear because certain websites use unethical methods to gain higher rankings in the search engine result pages. Below is an image of a typical page consisting of webspam:
Such pages are a nuisance: they offer no relevant or helpful information, contain little or no original content, and are crowded with irrelevant links. As times have changed, so has spam. You will no longer find 'right in your face' spam; these days it is cleverly disguised and, in many cases, designed to look completely authentic. Webspam has always been a bane for internet users, but since Google came out with its own anti-spam methods, it has substantially decreased. For most users, webspam is a simple nuisance: they get bombarded with unwanted results. But when searches are critical and time-sensitive, webspam can be a serious hindrance. For example, a search for prostate cancer that returns spam instead of relevant links greatly diminishes the value of a search engine as a helpful tool.
According to Google, data from its search logs is one of the many tools it uses to fight webspam and return cleaner, more relevant results. Log data such as IP addresses and cookie information makes it possible to create and use metrics that measure different aspects of Google's search quality, such as index size and coverage, result "freshness," and spam. Before deploying any new metric to combat webspam, it is essential to be able to go back over the log data and compute the new spam metric against previous queries and results. Google also revisits the search logs for queries run months earlier to compare the search engine's performance over time. Once a metric is created to attack webspam, Google makes it a point to monitor both its present and past performance and progress.
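The idea of replaying a new metric against historical logs can be sketched in a few lines. This is a hypothetical illustration, not Google's actual log schema or metric: the `LogEntry` fields and the sample data are invented for the example.

```python
# Hypothetical sketch: computing a simple spam metric over historical log
# entries, then comparing it across two time periods. The record fields
# (query, day, spam_results, total_results) are illustrative assumptions.
from dataclasses import dataclass
from datetime import date


@dataclass
class LogEntry:
    query: str
    day: date
    spam_results: int    # results later judged to be spam
    total_results: int   # results returned for the query


def spam_rate(entries):
    """Fraction of returned results that were spam across the given logs."""
    spam = sum(e.spam_results for e in entries)
    total = sum(e.total_results for e in entries)
    return spam / total if total else 0.0


# Replaying the same metric over months-old logs versus recent ones lets us
# see whether spam exposure is actually going down.
old_logs = [LogEntry("cheap meds", date(2007, 1, 5), 4, 10)]
new_logs = [LogEntry("cheap meds", date(2007, 4, 5), 1, 10)]
print(spam_rate(old_logs), spam_rate(new_logs))  # 0.4 0.1
```

Because the metric is a pure function of the log records, it can be computed over any stored time window, which is exactly what makes before-and-after comparisons possible.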
IP and cookie information are among the most important components of any new metric, because they help Google apply the metric only to searches from legitimate users, as opposed to searches generated by bots and other false traffic. For example, if a bot sends the same query to Google over and over again, those queries should be discarded before Google measures how much spam users see. All of this log data, including IP addresses and cookie information, helps make search results cleaner and more relevant.
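The bot-filtering step described above can be sketched as a simple pre-processing pass. This is an assumed, minimal heuristic for illustration only: the `(ip, cookie, query)` triple, the `max_repeats` threshold, and the sample log are all invented, and real bot detection is far more sophisticated.

```python
# Hypothetical sketch: discard heavily repeated queries before measuring
# spam exposure. A (ip, cookie, query) triple seen many times is treated
# as automated traffic and excluded from the metric.
from collections import Counter


def filter_bot_queries(log, max_repeats=3):
    """Keep only records whose (ip, cookie, query) triple appears at most
    max_repeats times; heavier repetition is assumed to come from a bot."""
    counts = Counter(log)  # each record is an (ip, cookie, query) tuple
    return [record for record in log if counts[record] <= max_repeats]


# 50 identical queries from one IP/cookie pair look like a bot; the single
# query from a different user is kept.
log = [("1.2.3.4", "c1", "buy pills")] * 50 + [("5.6.7.8", "c2", "prostate cancer")]
clean = filter_bot_queries(log)
print(len(clean))  # 1
```

Filtering first matters because a bot hammering spam-heavy queries would otherwise inflate the measured spam rate, making the results look worse for real users than they actually are.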