A post on the Google blog gives web publishers important details about how they can control the way search engines, and Google in particular, access and index their sites. The most important tool for this is the robots.txt file, which gives site owners powerful control over how a site is crawled. The post reads: “you may have a few pages on your site you don't want in Google's index. For example, you might have a directory that contains internal logs, or you may have news articles that require payment to access. You can exclude pages from Google's crawler by creating a text file called robots.txt and placing it in the root directory. The robots.txt file contains a list of the pages that search engines shouldn't access. Creating a robots.txt is straightforward and it allows you a sophisticated level of control over how search engines can access your web site.”
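As a sketch, a minimal robots.txt covering the two cases the post mentions might look like the following; the directory names here are hypothetical, not taken from the post:

```
# Hypothetical robots.txt placed at the site root, e.g. https://example.com/robots.txt
User-agent: *
# Keep internal logs out of search engines
Disallow: /internal-logs/
# Keep paid news articles from being crawled
Disallow: /premium-articles/
```

The `User-agent: *` line applies the rules to all crawlers; a site could instead target a specific crawler, such as `User-agent: Googlebot`, with its own set of `Disallow` rules.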
Besides the robots.txt file, there is the robots META tag, which gives you finer-grained control over individual pages. By adding specific META tags to your HTML pages, you control how each individual page is indexed.
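For example, a page-level robots META tag placed in the `<head>` of an HTML document might look like this (a minimal illustration, not a directive quoted from the post):

```html
<!-- Hypothetical page head: ask crawlers not to index this page
     and not to follow the links it contains -->
<head>
  <meta name="robots" content="noindex, nofollow">
</head>
```

Unlike robots.txt, which blocks crawling of whole paths, the META tag travels with the individual page, so it suits cases where only certain pages in an otherwise crawlable directory should stay out of the index.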