The Fighting Spam section of How Search Works, a new website launched by Google, shares some real examples of ‘pure spam’ pages that have been removed from search results. It may give you some insight into how their spam algorithm works. (For more information on How Search Works, refer to our March 2 post.)
Google likens the web to an ever-growing library with billions of books and no central filing system. So they have put in place a blend of algorithmic and manual reviews to maintain the quality of their search results and fight spammy techniques such as keyword stuffing, invisible text and paid links. Millions of web pages are created every day, so identifying and removing spammy pages is a constant challenge for the system.
The pages removed by Google use aggressive spam techniques such as scraping content from other websites, cloaking and automatically generating gibberish. The screenshots of these pages are generated automatically.
Here are a few recent screenshots Google has published:
Google has also prepared a chart of manual anti-spam actions by category. They say only 0.22% of domains were manually marked for removal. They have also published charts showing reconsideration requests and webmaster notifications.
Here is a chart showing manual actions by category: