Google Tends To Crawl & Index HTML Forms!

Apr 14, 2008 | 2,791 views | by Navneet Kaushal

On Friday, April 11, 2008, the Google Webmaster Central Blog revealed that Google had been testing a new search technology that lets its crawl agents explore some HTML forms in an attempt to discover web pages and URLs that have not yet been found and indexed.

Google's crawling and indexing team explained, "In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page."
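To make the URL-generation step in the quote more concrete, here is a minimal Python sketch. It is not Google's implementation: the function name `candidate_urls`, the `limit` parameter, and the example site, field names, and values are all hypothetical, and the sketch assumes the form submits via GET and that its action URL carries no query string of its own.

```python
from itertools import islice, product
from urllib.parse import urlencode, urljoin


def candidate_urls(page_url, action, fields, limit=5):
    """Build up to `limit` GET URLs from candidate values for each form field.

    `fields` maps an input name to the values to try: words taken from the
    site for text boxes, or the listed options for select menus, check boxes
    and radio buttons. All names here are illustrative assumptions.
    """
    names = list(fields)
    # Every combination of one value per field, in a deterministic order.
    combos = product(*(fields[name] for name in names))
    # Resolve the form's action against the page that contains the form.
    base = urljoin(page_url, action)
    return [
        base + "?" + urlencode(list(zip(names, combo)))
        for combo in islice(combos, limit)  # only a small number of queries
    ]


urls = candidate_urls(
    "http://example.com/catalog/",
    "search",
    {"q": ["laptop", "camera"], "category": ["new", "used"]},
    limit=3,
)
for url in urls:
    print(url)
```

A crawler following the approach described in the quote would then fetch each URL and keep only the result pages that are valid and contain content not already indexed; that evaluation step is left out of this sketch.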

Googlebot, Google's crawling agent, still adheres strictly to robots.txt, nofollow, and noindex directives while looking for these undiscovered pages. Likewise, it does not retrieve forms that require any kind of user information: forms that have a password input, or that use terms commonly associated with personal information such as logins, user IDs, and contacts, are skipped.
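The form-skipping rule can be illustrated with a short Python sketch built on the standard library's `html.parser`. This is only an approximation under stated assumptions: the `PERSONAL_TERMS` list, the `FormScanner` class, and the `should_skip` function are hypothetical, and Google has not published the exact terms or checks Googlebot uses.

```python
from html.parser import HTMLParser

# Hypothetical term list: the announcement names logins, user ids and
# contacts only as examples, so the real list is an unknown.
PERSONAL_TERMS = ("login", "userid", "user_id", "contact")


class FormScanner(HTMLParser):
    """Collect the types and names of the fields found in a form's HTML."""

    def __init__(self):
        super().__init__()
        self.input_types = []
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            attributes = dict(attrs)
            # A missing type is treated as a plain text field.
            self.input_types.append((attributes.get("type") or "text").lower())
            self.names.append((attributes.get("name") or "").lower())


def should_skip(form_html):
    """Return True if the form looks like it asks for user information."""
    scanner = FormScanner()
    scanner.feed(form_html)
    if "password" in scanner.input_types:
        return True
    return any(term in name for name in scanner.names for term in PERSONAL_TERMS)


print(should_skip('<form action="/search"><input type="text" name="q"></form>'))
print(should_skip('<form action="/signin"><input type="password" name="pw"></form>'))
```

Site owners who do not want form result pages crawled at all can still rely on the robots.txt, nofollow, and noindex mechanisms mentioned above rather than on heuristics like this one.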

Crawling these previously unknown pages does not come at the expense of pages that are already part of the regular crawl, so it should not cause any drop in PageRank for those pages. Pages like these, reachable only through forms and hidden deep in the online abyss, are also referred to as the Deep Web, Hidden Web, or Invisible Web.

About the author:

Navneet Kaushal, CEO of PageTraffic, is a trusted authority in the search engine marketing industry. He is a featured author at Web Pro News, Search Newz, Promotionworld, Website Notes, DevWebPro, SEO Article and Web Help Now, among many others.
