Feb 13, 2008

Google's Webmaster Trends Analyst has compiled a list of must-read articles from the archives. These are the posts people are most often referred to; some are recent, others much older.

Googlebot can't access my website!

While it's a good sign that web hosts are getting more aggressive about blocking spam bots and aggressive crawlers from their servers, they sometimes block Googlebot without knowing it. Even if you or your host "allow" Googlebot through by whitelisting Googlebot IP addresses, you could still be blocking some of Google's IPs without realizing it, because the full IP list isn't public for various reasons. To make sure you're allowing Googlebot access to your site, use the method in this post to verify whether a crawler really is Googlebot.
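The verification approach Google has described is a two-step DNS check: do a reverse DNS lookup on the requesting IP, confirm the hostname belongs to Google's crawler domain, then do a forward DNS lookup on that hostname and confirm it resolves back to the same IP. Here is a minimal Python sketch of that check using only the standard `socket` module. The accepted suffixes (`.googlebot.com`, `.google.com`) and the function names are assumptions for illustration, and `gethostbyname_ex` returns IPv4 addresses only.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers; confirm against
# Google's current guidance before relying on this list.
GOOGLE_CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def hostname_is_google(hostname: str) -> bool:
    """Return True if a reverse-DNS hostname falls under an accepted Google domain."""
    host = hostname.rstrip(".").lower()  # DNS names may carry a trailing dot
    return host.endswith(GOOGLE_CRAWLER_SUFFIXES)


def is_googlebot(ip: str) -> bool:
    """Verify a claimed Googlebot IPv4 address with reverse, then forward, DNS."""
    try:
        # Step 1: reverse lookup, IP -> hostname.
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_is_google(hostname):
        return False
    try:
        # Step 2: forward lookup, hostname -> IPs. The original IP must be among
        # them; otherwise anyone who controls reverse DNS for their own IP range
        # could simply claim a googlebot.com name.
        _name, _aliases, forward_ips = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

The forward lookup is the step that actually proves identity, since a reverse DNS record alone can be set to anything by whoever controls that IP range. In practice you would cache the result per IP rather than issuing two DNS queries on every request.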

URL blocked by robots.txt

Sometimes the web crawl section of Webmaster Tools reports a URL as "blocked by robots.txt", even though your robots.txt file doesn't appear to block crawling of that URL. In that situation, work through the list of troubleshooting tips in this post, especially the part about redirects. The post also explains some reasons for discrepancies between web crawl error reports and the robots.txt analysis tool.
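To double-check a specific URL against your rules yourself, Python's standard `urllib.robotparser` can evaluate a robots.txt file offline. The sketch below uses a made-up robots.txt and example.com URLs. Note that the standard-library parser does not implement Googlebot's `*` and `$` wildcard extensions, so treat its answer as an approximation of how Googlebot reads the file; if the URL redirects, it is worth running the same check on the redirect target too.

```python
from urllib import robotparser

# A hypothetical robots.txt; in practice, paste in or fetch your site's real file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""


def blocked_for(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse offline, no network access
    return not parser.can_fetch(user_agent, url)
```

The example also illustrates a common source of surprises: a crawler follows only the most specific `User-agent` group that matches it, so here the `Googlebot` group replaces the `*` group entirely and `/tmp/` is not blocked for Googlebot, while other bots are kept out of it.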

Why was my URL removal request denied?

To remove a URL from Google search results, you first need to put something in place that will prevent Googlebot from simply picking that URL up again the next time it crawls your site. Depending on the type of removal request you're submitting, this may be a 404 (or 410) status code, a noindex meta tag, or a robots.txt file. If that prerequisite is missing, the request can be denied; this article offers directions to help you get it right.
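As an illustration of two of those prerequisites, here is a minimal sketch using Python's built-in `http.server`: it answers `410 Gone` for a page that has been removed permanently, and serves a `noindex` robots meta tag on a page that should stay reachable but drop out of the index. The paths, page contents, and port are hypothetical; a real site would normally configure this in its web server or application framework instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical paths, for illustration only.
REMOVED_PATHS = {"/old-page.html"}  # gone for good: answer 410
NOINDEX_PATHS = {"/draft.html"}     # still served, but asks not to be indexed

NOINDEX_PAGE = (
    '<html><head><meta name="robots" content="noindex"></head>'
    "<body>Draft</body></html>"
)


def respond(path: str) -> tuple[int, str]:
    """Choose the status code and HTML body for a request path."""
    if path in REMOVED_PATHS:
        return 410, "<html><body>Gone</body></html>"
    if path in NOINDEX_PATHS:
        return 200, NOINDEX_PAGE
    return 200, "<html><body>Hello</body></html>"


class RemovalAwareHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Ignore any query string when matching paths.
        status, body = respond(urlsplit(self.path).path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Run the demo server on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), RemovalAwareHandler).serve_forever()
```

After calling `serve()`, a request for `http://127.0.0.1:8000/old-page.html` gets a 410, while `/draft.html` returns a normal page carrying the `noindex` tag. The third option, a robots.txt `Disallow` rule, needs no server code at all.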

Flash best practices

Flash is a hot topic for webmasters interested in making visually complex content accessible to search engines. This post outlines best practices for working with Flash.

The supplemental index

The supplemental index was a hot topic of conversation in 2007, and some webmasters may still be worried about it. This post explains how Google now searches its entire index for every query.

Duplicate content

Duplicate content is another perennial concern for webmasters. This post covers duplicate content caused by URL parameters in detail and refers to another post on deftly dealing with duplicate content. In it you can expect lots of good suggestions on how to avoid or mitigate problems caused by duplicate content.
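One common way to limit parameter-driven duplicates on your own side is to pick a single canonical form for each URL and use only that form in your internal links and Sitemaps. The Python sketch below is a generic illustration of that idea, not Google's own logic and not necessarily what the post recommends: it drops parameters assumed not to affect page content (the names in `IGNORED_PARAMS` are hypothetical), sorts the remaining ones, lowercases the scheme and host, and discards the fragment.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that change the URL but not the page content.
IGNORED_PARAMS = {"sessionid", "sid", "ref"}


def canonicalize(url: str) -> str:
    """Map URL variants that differ only by ignored or reordered parameters
    onto one form."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    params.sort()  # parameter order should not create a "new" URL
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",
        urlencode(params),
        "",  # fragments are never sent to the server, so drop them
    ))
```

For example, `HTTP://Example.com/page?b=2&sessionid=abc&a=1#top` and `http://example.com/page?a=1&b=2` both map to `http://example.com/page?a=1&b=2`, so linking only to the canonical form keeps crawlers from seeing several URLs for one page.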

Sitemaps FAQs

This post answers the questions Google gets most often about Sitemaps. Realizing the need for it, we covered this post on our blog as well; see: Google's FAQs on Sitemaps.
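For readers new to the format, a Sitemap is an XML file in the sitemaps.org protocol namespace that lists `<url>` entries, each with a required `<loc>` and optional tags such as `<lastmod>`. This minimal Python sketch builds one with the standard `xml.etree.ElementTree` module; the function name, example URLs, and dates are placeholders, and it covers only `loc` and `lastmod`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: list[tuple[str, str]]) -> str:
    """Return a Sitemap XML document for a list of (loc, lastmod) pairs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc          # required: absolute URL
        ET.SubElement(url, "lastmod").text = lastmod  # optional: W3C date, e.g. 2008-02-13
    body = ET.tostring(urlset, encoding="unicode")    # escapes & and < automatically
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Because ElementTree escapes characters such as `&` in URLs for you, parameterized URLs come out as valid XML, which the protocol requires.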

