There has been heated debate over Google's latest patent application, "Detecting spam documents in a phrase based information retrieval system", the sixth published patent by Anna Patterson.
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.
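To make the last step concrete, here is a minimal sketch of flagging a document by its related-phrase count. It assumes the set of related phrases has already been identified and that a baseline of counts from documents assumed to be non-spam is available. The function names, the case-insensitive substring matching, and the "mean plus `k` standard deviations" threshold are all hypothetical illustrative choices, not the method described in the patent application.

```python
from statistics import mean, stdev


def count_related_phrases(text: str, related_phrases: set[str]) -> int:
    """Count how many distinct related phrases occur in the text.

    Hypothetical helper: matching is a simple case-insensitive substring test.
    """
    lowered = text.lower()
    return sum(1 for phrase in related_phrases if phrase.lower() in lowered)


def is_spam(text: str, related_phrases: set[str],
            baseline_counts: list[int], k: float = 3.0) -> bool:
    """Flag a document whose related-phrase count is unusually high.

    Assumption: "unusually high" means more than k standard deviations above
    the mean count observed in presumed non-spam documents (needs >= 2 samples).
    """
    threshold = mean(baseline_counts) + k * stdev(baseline_counts)
    return count_related_phrases(text, related_phrases) > threshold


# Usage with made-up phrases and baseline counts.
related = {"new york", "hotels", "cheap flights",
           "times square", "broadway tickets", "central park"}
baseline = [1, 2, 2, 3, 1, 2]

print(is_spam("Visiting New York hotels near Central Park", related, baseline))
print(is_spam("cheap flights new york hotels times square "
              "broadway tickets central park", related, baseline))
```

With this made-up baseline the threshold is roughly 4.1, so a page naming three related phrases passes while a page stuffed with all six is flagged. That is the kind of scraped, keyword-stuffed page the forum comment below has in mind.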
One poster in a WebmasterWorld forum thread commented: "This approach seems to me to be aimed at auto generated pages, constructed from scraped bits and pieces to attract a long tail search to a page with ads. Of course, it does all hang on the base measures of assumed non-spam documents, but I assume Google has enough data to take a decent baseline measure."