How do I control Search Engines using the robots.txt file?

The spiders (crawlers) that all legitimate search engines use to retrieve website data follow the rules defined in a file called 'robots.txt', which should be placed in the root directory of a web site.

This file tells a spider which parts of the site's structure it may and may not crawl and index, and therefore which directories, pages, images and other files can be retrieved and included in a search engine's index.

Example:

User-agent: *
Disallow: /cgi-bin/
Disallow: /jscript/
Disallow: /beta/
Disallow: /images/
Disallow: /bogus.htm
This robots.txt file tells the spiders that ...
  • User-agent: *
    the rules that follow apply to all spiders; all search engines are welcome to collect data from the site
  • Disallow: /cgi-bin/
    the listed directories (and all the files/pages within them) must not be retrieved or indexed
  • Disallow: /bogus.htm
    the page bogus.htm in the site root should similarly not be retrieved or included.
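
For illustration, here is a minimal Python sketch, using the standard urllib.robotparser module, of how a well-behaved spider might check these rules before fetching a URL. The domain www.example.com and the sample paths are only placeholders, not part of the article's example site.

from urllib.robotparser import RobotFileParser

# The example rules from above (the domain used later is hypothetical).
robots_txt = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /jscript/
Disallow: /beta/
Disallow: /images/
Disallow: /bogus.htm
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A polite crawler checks each URL against the rules before requesting it.
for path in ("/index.html", "/cgi-bin/search.cgi", "/bogus.htm"):
    allowed = parser.can_fetch("*", "https://www.example.com" + path)
    print(path, "->", "allowed" if allowed else "disallowed")

Running this prints that /index.html may be fetched, while /cgi-bin/search.cgi and /bogus.htm are disallowed by the rules above.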

