Google has announced that it will now crawl news sites with Googlebot instead of Googlebot-News. For those unfamiliar with it, Googlebot is the same crawler that fetches pages for web search.
If you are wondering whether you can still keep your content out of Google News, the answer is yes. David Smydra from Google confirmed this in a comment: “Google News will still respect the robots.txt entry for Googlebot-News, our former user-agent, if it is more restrictive than the robots.txt entry for Googlebot.”
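One way to read Smydra's statement is that a URL is eligible for Google News only if both the Googlebot entry and the Googlebot-News entry allow it. The Python sketch below models that reading with the standard library's urllib.robotparser. It is not Google's implementation; the robots.txt content, the example.com URLs and the allowed_in_news helper are hypothetical. One caveat: robotparser applies the first group whose name appears in the crawler name, whereas Google picks the most specific group, so the Googlebot-News group is listed first to make the two agree.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: web search may crawl everything,
# while the Googlebot-News group keeps /archive/ out of Google News.
ROBOTS_TXT = """\
User-agent: Googlebot-News
Disallow: /archive/

User-agent: Googlebot
Disallow:
"""


def load_rules(text):
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed_in_news(parser, url):
    """Assumed model of the quoted rule: News honours whichever entry is
    more restrictive, so both crawler names must be allowed."""
    return parser.can_fetch("Googlebot", url) and parser.can_fetch("Googlebot-News", url)
```

With these rules, a page under /archive/ stays crawlable for web search but is treated as blocked for News, while pages elsewhere on the site remain eligible for both.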
So, what are the implications of this announcement?
Well, as a Google News publisher, if you want Google to index your content in both web search and News, you don't have to do anything. Google will continue crawling as it normally does; however, your server logs will now show only Googlebot entries, rather than the separate Googlebot and Googlebot-News entries they showed before. And if you want to keep your content out of Google News, you can continue to block Googlebot-News with a Disallow directive in robots.txt or with a meta robots tag.
However, if you rely on crawl data to understand how your website is crawled and to make improvements, this change is likely to make that data much harder to interpret.
For instance, suppose you notice that your news articles are not being indexed in Google News, and you check the News-specific crawl errors in Google Webmaster Tools but find no issues. Previously you could check your server logs to see whether the articles were being crawled for the News index; now you can't. The logs only show whether the pages are being crawled in general, which is of little help when troubleshooting News-specific problems.
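Before the change, answering "was this article crawled for News?" came down to grouping log entries by crawler. The sketch below does that for Apache/Nginx combined-format access logs; the log format, the sample data and the helper names (google_crawler, crawlers_by_path) are assumptions for illustration. It trusts the user-agent string as written, and user-agents can be spoofed, so a real audit would also verify the requesting IP.

```python
import re
from collections import defaultdict

# Captures the request path and the user-agent from a "combined" log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def google_crawler(user_agent):
    """Return which Google crawler a user-agent string claims to be, or None."""
    # Check the longer token first: "Googlebot" is a substring of "Googlebot-News".
    if "Googlebot-News" in user_agent:
        return "Googlebot-News"
    if "Googlebot" in user_agent:
        return "Googlebot"
    return None


def crawlers_by_path(log_lines):
    """Map each requested path to the set of Google crawlers that fetched it."""
    seen = defaultdict(set)
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # skip lines that are not in combined log format
        crawler = google_crawler(match.group("ua"))
        if crawler:
            seen[match.group("path")].add(crawler)
    return dict(seen)
```

On logs from before the change, an article that shows only {"Googlebot"} was a useful hint that News had not picked it up. On logs recorded afterwards, every Google hit lands under "Googlebot", so that distinction disappears.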
Furthermore, if you generate a News-specific Sitemap and some of its URLs are missing from Google News, until now you could review your server logs to make sure that Googlebot-News was crawling particular URLs, and identify which URLs in the Sitemap had not been crawled. Now the server logs can only tell you whether Google is crawling the URLs at all; whether they are being crawled for web search but not for News is a detail that is now lost.
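The Sitemap check described above is essentially a set difference: the URLs listed in the News Sitemap minus the paths that Googlebot-News fetched. Here is a minimal sketch, assuming the standard sitemaps.org namespace, combined-format access logs and hypothetical helper names; it compares URL paths only and ignores query strings.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
# Captures the request path and the user-agent from a "combined" log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def sitemap_paths(sitemap_xml):
    """Return the path component of every <loc> in the Sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {urlparse((loc.text or "").strip()).path for loc in root.iter(SITEMAP_NS + "loc")}


def paths_fetched_by(log_lines, token):
    """Return the paths requested by user-agents containing `token`."""
    paths = set()
    for line in log_lines:
        match = LOG_RE.search(line)
        if match and token in match.group("ua"):
            paths.add(match.group("path"))
    return paths


def uncrawled_news_urls(sitemap_xml, log_lines):
    """Sitemap paths that Googlebot-News never requested in these logs."""
    return sitemap_paths(sitemap_xml) - paths_fetched_by(log_lines, "Googlebot-News")
```

Run against logs recorded after the retirement, paths_fetched_by(logs, "Googlebot-News") comes back empty and every Sitemap URL is reported as uncrawled, which is exactly the detail this change removes.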
This change also costs you insight for web search. If you were trying to track down why specific pages on your site are not being indexed, you could previously review your server logs to see whether they were being crawled. Now a Googlebot entry in your logs no longer proves that a page was crawled for web search; it may have been fetched only for Google News.
However, it is not all bad news: you can still get News-specific and web-specific crawl errors from Google Webmaster Tools, so you still have some insight into whether your website is being crawled and what to improve. Also, the most important reason for having a separate News crawler was the ability to block or allow content in Google News separately from web search, and that functionality remains.
While this in-depth insight was useful for many of us, it is unfortunate that Google has now retired Googlebot-News.