Jun 6, 2007 by Navneet Kaushal

Yahoo! recently announced the transition of its web crawler, Yahoo! Slurp. The transition is now complete, and Yahoo! Slurp has been moved to a new domain, crawl.yahoo.net. As of now, all machines crawling as Slurp are in crawl.yahoo.net. The IP addresses and the user-agent remain the same. Only the host name returned by a DNS lookup is different.

In web server logs, Slurp accesses that used to resolve to inktomisearch.com now resolve only to crawl.yahoo.net. This change also applies to other Yahoo! crawlers, such as Yahoo! China, and to verticals such as Yahoo! Shopping and Yahoo! Travel, which have their own user-agents.

Site owners who use a robots.txt file do not have to change it, as the crawler user-agent is still Yahoo! Slurp. The IP addresses from which the crawler works are also unchanged.

Nevertheless, site owners should check that their network or firewall setup does not block crawl.yahoo.net.

By setting up reverse DNS-based authentication of the crawler, site owners can make sure that no rogue bot masquerading as 'Slurp' gets access to their site. As described by the Yahoo! experts, the process is simple:

  1. For each page view request, check the user-agent and IP address. All requests from Yahoo! Search use a user-agent starting with 'Yahoo! Slurp'.
  2. For each request with the 'Yahoo! Slurp' user-agent, take the IP address and use a reverse DNS lookup to find the registered name of the machine.
  3. Once you have the host name (for example, lj612134.crawl.yahoo.net), check whether it really belongs to Yahoo! Search. The names of all Yahoo! Search crawlers end with 'crawl.yahoo.net', so if the name does not end with this, the request is not really from Yahoo!'s crawler.
  4. Finally, verify that the name is accurate. To do this, use a forward DNS lookup to see the IP address associated with the host name. This should match the IP address you used in Step 2. If it does not, the name was fake.
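
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not Yahoo!'s own code: the function names, the injectable `reverse` and `forward` parameters, and the use of a substring test on the user-agent are all assumptions made for this example. Only the 'Yahoo! Slurp' string and the crawl.yahoo.net suffix come from the steps themselves.

```python
import socket

# Suffix named in the steps above; the leading dot is a choice made here
# so that a name like "notcrawl.yahoo.net" would not be accepted.
CRAWLER_SUFFIX = ".crawl.yahoo.net"


def reverse_lookup(ip):
    """Return the host name registered for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host):
    """Return the set of IPv4 addresses that host resolves to (empty on failure)."""
    try:
        return set(socket.gethostbyname_ex(host)[2])
    except OSError:
        return set()


def is_genuine_slurp(user_agent, ip, reverse=reverse_lookup, forward=forward_lookup):
    """Hypothetical helper: True only if the request passes all four steps."""
    # Step 1: only requests claiming to be Slurp are candidates.
    # (A substring test is a lenient assumption; the steps say "starting with".)
    if "Yahoo! Slurp" not in user_agent:
        return False
    # Step 2: reverse DNS lookup on the requesting IP address.
    host = reverse(ip)
    if not host:
        return False
    # Step 3: the registered name must end with the crawler domain.
    if not host.lower().rstrip(".").endswith(CRAWLER_SUFFIX):
        return False
    # Step 4: forward DNS lookup must lead back to the same IP address.
    return ip in forward(host)
```

Because each request would otherwise trigger two DNS lookups, a real deployment would probably cache the result per IP address; that is left out here to keep the sketch short.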

If the DNS check shows that a request is not really from 'Yahoo! Slurp', you can manage access to your content appropriately. By simply returning an HTTP error, you can block the requester from seeing your content.
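
One way to return an HTTP error to an impostor is a small WSGI wrapper, sketched below under stated assumptions. The name `block_fake_slurp` and the `verify` parameter are hypothetical: `verify` stands for any callable that takes the client IP address and returns True when the reverse and forward DNS checks pass. The choice of status 403 is also an assumption; the article only says "an HTTP error".

```python
def block_fake_slurp(app, verify):
    """Wrap a WSGI app so that requests falsely claiming to be Slurp get a 403.

    app    -- the WSGI application to protect
    verify -- hypothetical callable: verify(ip) -> bool (DNS check passed?)
    """
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        ip = environ.get("REMOTE_ADDR", "")
        # Only requests that claim to be Slurp are checked; everyone else
        # passes straight through to the wrapped application.
        if "Yahoo! Slurp" in user_agent and not verify(ip):
            body = b"Forbidden"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return middleware
```

Ordinary visitors and verified crawler requests reach the wrapped application unchanged, so the check only costs DNS lookups for requests that present the Slurp user-agent.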

Navneet Kaushal

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.