Tips To Optimize Crawling and Indexing

Aug 11, 2009 | 1,382 views | by Navneet Kaushal

Website architecture, crawling, indexing and ranking all revolve around one central question: how easy is it for search engines to crawl your site? The Google Webmaster Central Blog has discussed this topic many times, and it has now returned to it with a slideshow presentation and some key points to consider.

New content is created and uploaded to the Internet all the time. With limited resources, Googlebot can find and crawl only a percentage of the practically infinite amount of content available online, and only a portion of the crawled content is then indexed by Google.

Then come the URLs. URLs are the bridges between a website and a search engine's crawler: crawlers must be able to find and crawl a site's URLs in order to get to its content. If a site's URLs are complicated or redundant, crawlers spend their time tracing and retracing the same steps. If the URLs are organized and lead directly to distinct content, crawlers can spend their time accessing that content rather than crawling through empty pages or crawling the same content over and over via different URLs.

The slides give some examples of what not to do in this regard. Below are some recommendations for URL crawling that can help crawlers find your website's content faster:

  • Remove user-specific details from URLs: URL parameters that don't change the content of the page, like session IDs or sort order, can be removed from the URL and put into a cookie. Putting this information in a cookie and 301 redirecting to a “clean” URL lets you retain the information while reducing the number of URLs pointing to the same content.
  • Rein in infinite spaces: If a website has a calendar that links to an infinite number of past or future dates (each with its own unique URL), or has paginated data that returns a status code of 200 when &page=3563 is added to the URL even though there aren't that many pages of data, the site may contain an infinite crawl space. In that case, crawlers could be wasting their bandwidth trying to crawl it all. Google's tips on the subject explain how to rein in infinite crawl spaces.
  • Disallow actions Googlebot can't perform: Using the robots.txt file, you can disallow crawling of login pages, contact forms, shopping carts and other pages whose sole functionality is something that a crawler can't perform. (Crawlers are notoriously cheap and shy, so they don't usually "Add to cart" or "Contact us.") This lets crawlers spend more of their time crawling content.
  • One URL, one set of content: Ideally there is a one-to-one pairing between URL and content: each URL leads to a unique piece of content, and each piece of content can be accessed through only one URL. The closer a site gets to this ideal, the more streamlined it is for crawling and indexing. If your CMS or current site setup makes this difficult, you can use the rel=canonical element to indicate the preferred URL for a particular piece of content.
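
The recommendations above can be sketched in a few lines of Python. This is only a rough illustration using the standard library, not Google's or any CMS's actual implementation. The parameter names (sessionid, sid, sort), the example.com URLs, the page size and the disallowed paths are all assumptions made for the example.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical parameter names; substitute whatever your site actually uses.
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "sort"}


def redirect_target(url):
    """Return the clean URL to 301-redirect to, or None if nothing was stripped."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in params if k.lower() not in USER_SPECIFIC_PARAMS]
    if len(kept) == len(params):
        return None  # already clean, so no redirect is needed
    # The stripped values would be stored in a cookie by the web framework.
    return urlunsplit(parts._replace(query=urlencode(kept)))


def page_status(page, total_items, per_page=20):
    """Return 200 for pages that hold data and 404 for pages past the end."""
    last_page = max(1, -(-total_items // per_page))  # ceiling division
    return 200 if 1 <= page <= last_page else 404


# Example robots.txt rules; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
Disallow: /contact
"""


def crawlable(path, user_agent="Googlebot"):
    """Check a path against the example rules above."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


def canonical_link(url):
    """Build the rel=canonical element for a page's preferred URL."""
    return '<link rel="canonical" href="{}">'.format(escape(url, quote=True))
```

With these assumptions, redirect_target("http://www.example.com/shoes?sessionid=abc123&color=red") yields the clean URL with only color=red left, page_status(3563, total_items=45) yields 404 instead of an endless run of empty pages, and crawlable("/cart") is False while ordinary content paths stay crawlable.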

For more information on optimizing a site for crawling and indexing, visit the Webmaster Help Forum.

About the author:

Navneet Kaushal, CEO of PageTraffic, is a trusted authority in the search engine marketing industry. He is a featured author at Web Pro News, Search Newz, Website Notes, DevWebPro, SEO Article and Web Help Now, among many others.

4 comments

agnes August 12, 2009 at 09:34

Thanks for providing these great tips. Great information. Really nice.

Donna G. Fraley - pass drug test August 14, 2009 at 19:12

Thanks for those tips. I was not aware of the robots.txt tip. That seems to be very helpful. Thanks again.

seo company San Diego August 17, 2009 at 04:41

Very nice write up and very well explained. Removing user-specific details from URLs is very effective and very helpful. BTW, nice slide. Thanks for sharing this information. More power.

Richard

lee September 11, 2009 at 21:27

Hi, I have been waiting for Google and Bing to index my blog http://www.diyanswerdirect.com/blog for two weeks now and am getting very frustrated with them. I am also still waiting for them to update my site http://www.diyanswerdirect.com. My question is: when do they update, and how can I get them to crawl or spider my site?
