New Google Process for Detecting Near Duplicate Content!

Feb 28, 2008 | by Navneet Kaushal

For many webmasters, duplicate content has long been a persistent issue. Google already obtained a patent on detecting duplicate content. Now it has gone a step further with another patent application, developed by Monika H. Henzinger.

This new Google patent application explores how duplicate and near-duplicate content might be detected at different web addresses, drawing on several existing methods for detecting near-duplicate content. With the growing popularity of blogs and RSS syndication, the amount of duplicated content has increased many times over. As a result, search engine companies, Google included, are doing their best to cut down on duplication.

The patent application cites a number of documents on the Web that explore the topic of duplicate and near-duplicate content. One of them describes a process developed by Moses Charikar, a Princeton professor who is listed as the inventor of a Google patent granted early last year. That patent, "Methods and apparatus for estimating similarity," discusses ways to detect similar content on the Web.
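To give a feel for what "estimating similarity" can mean in practice, here is a minimal sketch of one widely known fingerprinting idea, often called simhash and generally associated with Charikar's work. It is not taken from either patent. The function names, the 64-bit fingerprint size, the use of MD5 as the token hash, and the `max_distance=3` threshold are all assumptions chosen for illustration.

```python
import hashlib
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fingerprint(text, bits=64):
    """Build a simhash-style fingerprint: each token votes on every bit."""
    counts = [0] * bits
    for tok in tokens(text):
        # Assumption: the first 8 bytes of MD5 serve as a 64-bit token hash.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i, c in enumerate(counts):
        if c > 0:  # bit is set when more tokens voted 1 than 0
            fp |= 1 << i
    return fp


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Hypothetical rule: few differing bits means 'near duplicate'."""
    return hamming_distance(fingerprint(text_a), fingerprint(text_b)) <= max_distance
```

Pages that share most of their words tend to produce fingerprints that differ in only a few bits, so comparing fingerprints is much cheaper than comparing full pages. What threshold actually works is an empirical question and would need tuning on real pages.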

Dr. Henzinger tests and explores approaches drawn from those documents. The approaches differed in how effective they were in the tests that were run, but the conclusion in the patent application was that “neither of the algorithms worked well for finding near-duplicate pairs on the same Website, though both achieved high precision for near-duplicate pairs on different Websites.”

The patent research also concluded: “These near-duplicate detection techniques performed well, particularly when analyzing Web pages from the same Website. These techniques did so without sacrificing much in the number of returned correct pairs.”

We cannot say that this new patent will bring duplication and near-duplication to a complete end, but things should get better. As they say, “Something is better than nothing.”

Navneet Kaushal

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.