Google Tends To Crawl & Index HTML Forms!

Apr 14, 2008 | by Navneet Kaushal

On Friday, April 11, 2008, the Google Webmaster Central Blog revealed that Google has been testing a new search technology that lets its crawlers explore some HTML forms in an attempt to discover web pages and URLs that have not yet been found and indexed.

Google's crawling and indexing team explained, "In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page."
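
To make the quoted procedure concrete, here is a minimal Python sketch of how candidate query URLs could be generated from a form. It is only an illustration, not Google's implementation: the function name `candidate_urls`, the `fields` mapping, the `max_queries` cap, and the example.com URL are all assumptions, and the form is assumed to submit its values as a query string.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action_url, fields, site_words, max_queries=5):
    """Build a small number of query URLs for one form (illustrative only).

    fields maps each input name to a list of its HTML option values
    (select menus, check boxes, radio buttons) or to None for a text box.
    Text boxes are filled with words taken from the site itself.
    """
    names = list(fields)
    # Text boxes draw from site words; other inputs use their own values.
    choices = [site_words if fields[n] is None else fields[n] for n in names]

    urls = []
    # Only a small number of combinations is tried, never the full product.
    for combo in islice(product(*choices), max_queries):
        query = urlencode(dict(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
    return urls


if __name__ == "__main__":
    form_fields = {"q": None, "category": ["books", "music"]}
    for url in candidate_urls("https://example.com/search", form_fields,
                              ["crawl", "index"], max_queries=3):
        print(url)
```

Each generated URL would then be fetched and kept only if the resulting page is valid and contains content not already in the index, as the quote describes.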

Googlebot, Google's crawling agent, still adheres strictly to robots.txt, nofollow, and noindex directives while searching for these undiscovered pages. Likewise, it does not retrieve forms that require any sort of user information: forms that have a password input, or that use terms commonly associated with personal information such as logins, user IDs, and contacts, are avoided.
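
The two safeguards described above can be sketched in Python as follows. This is a hedged illustration rather than Googlebot's actual logic: the `SENSITIVE_TERMS` list and both helper functions are hypothetical, while `urllib.robotparser` is the standard-library robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of terms associated with personal information.
SENSITIVE_TERMS = ("login", "userid", "user_id", "contact")


def form_is_safe(inputs):
    """Return False if any (name, type) input looks like it asks for user data."""
    for name, input_type in inputs:
        if input_type.lower() == "password":
            return False
        if any(term in name.lower() for term in SENSITIVE_TERMS):
            return False
    return True


def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt rules supplied as a string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /search\n"
    print(form_is_safe([("q", "text"), ("category", "select")]))
    print(allowed_by_robots(rules, "Googlebot", "https://example.com/search?q=crawl"))
```

A site owner who does not want form results crawled can therefore rely on the same robots.txt rules that already govern ordinary pages.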

Crawling these previously unknown pages does not come at the expense of pages that are already part of the regular crawl, so it should not cause any drop in their PageRank. Pages of this kind, hidden behind forms and out of reach of ordinary link-following crawls, are also referred to as the Deep Web, Hidden Web, or Invisible Web.

Navneet Kaushal

Navneet Kaushal is the founder and CEO of PageTraffic, an SEO Agency in India with offices in Chicago, Mumbai and London. A leading search strategist, Navneet helps clients maintain an edge in search engines and the online media. Navneet's expertise has established PageTraffic as one of the most awarded and successful search marketing agencies.