Google has filed a new patent application describing a way to detect spam documents in a phrase-based information retrieval system. The application was filed on June 28 and published on December 28 this year, according to a report at WebmasterWorld Forums.
According to the US Patent and Trademark Office filing, “An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.”
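To make the idea in the abstract more concrete, here is a minimal Python sketch of counting related phrases in a document and flagging it when the count is unusually high. This is an illustration only, not the method claimed in the patent application: the `RELATED_PHRASES` table, the function names, the naive substring matching, and the `max_related` threshold are all hypothetical, and the assumption that a high count signals spam is ours, since the abstract only says the decision is based on the number of related phrases.

```python
from typing import Dict, Set

# Hypothetical table: each phrase maps to the phrases it is assumed to predict.
# A real system would learn these relationships from a document collection.
RELATED_PHRASES: Dict[str, Set[str]] = {
    "white house": {"president", "press secretary", "oval office"},
    "cheap flights": {"discount airfare", "last minute deals", "airline tickets"},
}


def count_related_phrases(text: str, related: Dict[str, Set[str]]) -> int:
    """Count distinct related phrases that appear alongside a known phrase.

    Matching is plain case-insensitive substring search, which is a
    simplification chosen for this sketch.
    """
    lowered = text.lower()
    found: Set[str] = set()
    for phrase, predicted in related.items():
        if phrase in lowered:
            found.update(p for p in predicted if p in lowered)
    return len(found)


def looks_like_spam(
    text: str,
    related: Dict[str, Set[str]] = RELATED_PHRASES,
    max_related: int = 2,  # hypothetical threshold, not taken from the patent
) -> bool:
    """Flag a document whose related-phrase count exceeds the threshold."""
    return count_related_phrases(text, related) > max_related


normal_doc = "The White House said the president will speak today."
stuffed_doc = "Cheap flights! Discount airfare, last minute deals, airline tickets."
```

In this sketch, `looks_like_spam(normal_doc)` is false because only one related phrase appears, while `looks_like_spam(stuffed_doc)` is true because all three related phrases are packed into a single sentence.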
The developer of the patent application, Anna Lynn, worked at archive.org before joining Google. There, she played a major role in information retrieval across an index of nearly 55 billion documents.