In a Bing Search Quality Insights blog post, Bing has explained how it cleans junk links out of its search results. The post is another entry in the new series Bing started to give users insight into search quality and how Bing works.
As Bing puts it, “It is annoying when you end up on a page for a domain that was just registered and is plastered with ads without any useful content. These are different types of junk links that we refer to as dead links, soft 404s, and parked domains. Let's explore how Bing detects and removes these sites from the search results.”
The search engine has given an in-depth view of how it removes bad links and empty snippets from the search results.
Junk links have been categorized as:
1.) Dead Links – A dead link is a page that returns a 4xx or 5xx error code in response to an HTTP request. If Bing has not crawled a page for some time, it will not know that the page has gone dead. The crawler usually detects such pages quickly, however, and Bing then takes the required action. Quoting Bing, “If we think there is some suspicion about the page in question we may boost its re-crawl priority and frequency to help us make a determination as quickly as possible. There is an important tradeoff we make here between aggressively removing dead links and the relevance of our search results.”
2.) Soft 404 – With a soft 404, the server returns a normal HTTP 200 code along with a page telling the visitor that the original page no longer exists on the site. Bing then has to estimate whether the page is effectively a 404. Quoting Bing, “Our high precision classifiers in this area use page content such as key phrases in the page’s title, body and URL to determine if the page is a soft 404; and whether to remove it from the search results.”
3.) Parked Domains – Parked domains are sites that show placeholder content after a new domain registration. These pages use ads to monetize traffic directed to the domain before the new site owner has properly set it up. Bing says, “Like the techniques used to identify soft 404s, we look at the patterns in page content to determine if a page is a parked domain. By collaboratively mining many different types of pages against our large index of web data we are able to create signatures that allow us to identify parked domains when we see them and to remove them from our search results.”
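To make the three categories concrete, here is a minimal Python sketch of how a page might be sorted into them. It is only an illustration: Bing has not published its classifiers, so the function name `classify_page` and both phrase lists are assumptions made for this example, and a real system would rely on learned signatures rather than a handful of fixed phrases.

```python
# Hypothetical phrase lists for illustration only; Bing's real signals are not public.
SOFT_404_PHRASES = ("page not found", "no longer exists", "could not be found")
PARKED_PHRASES = ("domain is for sale", "buy this domain", "this domain is parked")


def classify_page(status: int, url: str, title: str, body: str) -> str:
    """Return 'dead_link', 'soft_404', 'parked_domain', or 'ok'."""
    # Dead link: the server itself reports a 4xx or 5xx error.
    if 400 <= status <= 599:
        return "dead_link"

    # For pages served with a success code, look at key phrases
    # in the URL, title, and body, compared case-insensitively.
    text = f"{url} {title} {body}".lower()

    # Soft 404: a normal response whose content says the page is gone.
    if any(phrase in text for phrase in SOFT_404_PHRASES):
        return "soft_404"

    # Parked domain: placeholder content on a freshly registered domain.
    if any(phrase in text for phrase in PARKED_PHRASES):
        return "parked_domain"

    return "ok"
```

For example, `classify_page(200, "http://example.com/old", "Page Not Found", "Sorry!")` is labeled a soft 404 even though the server answered with a 200, which is exactly the case a status-code check alone would miss.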
Junk or Empty Snippets include:
- Junky Snippets – By improving its encoding classifier, document converter, garbage detector, and HTML parser, the search engine has been able to reduce junky snippets.
- Empty Snippets – For empty snippets, Bing uses dynamic crawlers and document processors, along with a number of classifiers, to find the right snippet for each search result.
What do you think of this insight into how Bing works? Do share your views.