Slurp! Yahoo's Latest Search Crawler!

Jun 6, 2007 | by Navneet Kaushal

Yahoo! recently announced the transition of its web crawler, Yahoo! Slurp. The transition is now complete, and Yahoo! Slurp has been moved to a new domain, crawl.yahoo.net. As of now, all machines crawling as Slurp are in crawl.yahoo.net. The IP addresses and the user-agent remain the same; only the result of the DNS lookup is different.

In web server logs, page accesses by Slurp that used to show inktomisearch.com now show only crawl.yahoo.net. The change also applies to other Yahoo! crawlers, such as Yahoo! China and verticals like Yahoo! Shopping and Yahoo! Travel, which have their own user-agents.

Site owners who use a robots.txt file do not have to change it, as the crawler user-agent is still Yahoo! Slurp. Moreover, the IP addresses from which the crawler works have not changed.
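
As a rough illustration of why existing robots.txt rules keep working, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser and asks whether a crawler identifying itself as Yahoo! Slurp may fetch a path. The rule contents, the paths and the helper name slurp_may_fetch are hypothetical, and the "Slurp" token in the User-agent line is an assumption about how such a rule is usually written, not something taken from the announcement.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths are made up for illustration, and the
# "Slurp" token is an assumed way of addressing the Yahoo! crawler.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
"""


def slurp_may_fetch(path, robots_txt=ROBOTS_TXT):
    """Return True if a crawler named 'Yahoo! Slurp' may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The rule is matched against the user-agent only; the crawler's DNS
    # domain plays no part in robots.txt handling.
    return parser.can_fetch("Yahoo! Slurp", path)
```

Because the rule is keyed on the user-agent, moving the crawler from one DNS domain to another does not affect it.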

Nevertheless, site owners should check that their network or firewall setup does not block crawl.yahoo.net.

With reverse DNS-based authentication of the crawler, site owners can ensure that no rogue bots masquerading as 'Slurp' visit their site. As described by the Yahoo! experts, the process is very simple:

  1. “For each page view request, check the user-agent and IP address. All requests from Yahoo! Search utilize a user-agent starting with 'Yahoo! Slurp.'
  2. For each request from 'Yahoo! Slurp' user-agent, you can start with the IP address (e.g. 74.6.67.218) and use reverse DNS lookup to find out the registered name of the machine.
  3. Once you have the host name (in this case, lj612134.crawl.yahoo.net), you can then check if it really is coming from Yahoo! Search. The name of all Yahoo! Search crawlers will end with 'crawl.yahoo.net,' so if the name doesn't end with this, you know it's not really our crawler.
  4. Finally, you need to verify the name is accurate. In order to do this, you can use Forward DNS to see the IP address associated with the host name. This should match the IP address you used in Step 2. If it doesn't, it means the name was fake.

If the DNS check shows that it is not 'Yahoo! Slurp' calling, you can manage access to your content appropriately. By simply returning an HTTP error, you can block the impostor from seeing your content.”
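
The four steps can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Yahoo!'s own code: the function name is_genuine_slurp and the injectable reverse_lookup and forward_lookup parameters are hypothetical, added so the logic can be exercised without network access. The sketch also checks that the user-agent contains 'Yahoo! Slurp' rather than strictly starts with it, and requires a leading dot before crawl.yahoo.net; both are assumptions that go slightly beyond the wording of the steps.

```python
import socket

CRAWLER_DOMAIN = "crawl.yahoo.net"
USER_AGENT_MARKER = "Yahoo! Slurp"


def _reverse_dns(ip):
    # Reverse DNS: IP address -> registered host name.
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host):
    # Forward DNS: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def is_genuine_slurp(ip, user_agent,
                     reverse_lookup=_reverse_dns,
                     forward_lookup=_forward_dns):
    """Return True only if the request passes all four checks."""
    # Step 1: only requests claiming to be Slurp need verifying.
    # Assumption: a substring test, which also covers "starts with".
    if USER_AGENT_MARKER not in user_agent:
        return False
    try:
        # Step 2: reverse DNS lookup on the requesting IP address.
        host = reverse_lookup(ip).rstrip(".").lower()
        # Step 3: the host name must end with crawl.yahoo.net.
        # Assumption: require a dot so "xcrawl.yahoo.net" is rejected.
        if not host.endswith("." + CRAWLER_DOMAIN):
            return False
        # Step 4: forward DNS must map the name back to the same IP.
        return ip in forward_lookup(host)
    except OSError:
        # Failed lookups (socket.herror, socket.gaierror) count as unverified.
        return False
```

A site could call is_genuine_slurp(remote_ip, user_agent) for requests that claim to be Slurp and return an HTTP error when it comes back False. Since DNS lookups are slow, caching the result per IP address would be a sensible addition in practice.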

Navneet Kaushal

About the author:

Navneet Kaushal, CEO of PageTraffic, is a trusted authority in the search engine marketing industry. He is a featured author at Web Pro News, Search Newz, Promotionworld, Website Notes, DevWebPro, SEO Article and Web Help Now, among many others.
