How do I control Search Engines using the robots.txt file?
The spiders used to retrieve website data for all legitimate search engines follow rules defined in a file called 'robots.txt', which should be placed in the root directory of a web site.
This file tells a spider which parts of the site's structure it may and may not crawl and index, and therefore which directories, pages, images etc. can be retrieved and indexed.

Example:
This robots.txt file explains to the spiders that:

User-agent: *
Disallow: /cgi-bin/
Disallow: /jscript/
Disallow: /beta/
Disallow: /images/
Disallow: /bogus.htm
- User-agent: *
  All search engines are welcome to collect data from the site.
- Disallow: /cgi-bin/, /jscript/, /beta/, /images/
  These directories, and all the files/pages within them, must not be retrieved or indexed.
- Disallow: /bogus.htm
  The page bogus.htm in the site root should similarly not be retrieved or indexed.
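To double-check how a well-behaved crawler would interpret these rules, you can feed the same directives to Python's standard urllib.robotparser module. The snippet below is a minimal sketch; the user agent name "ExampleBot" and the sample URLs are illustrative assumptions, not part of the rules themselves.

# Minimal sketch: checking URLs against the robots.txt rules above.
# "ExampleBot" and the sample URLs are illustrative assumptions only.
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /jscript/
Disallow: /beta/
Disallow: /images/
Disallow: /bogus.htm
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Blocked: lives under a disallowed directory
print(parser.can_fetch("ExampleBot", "/cgi-bin/form.cgi"))  # False

# Blocked: explicitly disallowed page in the site root
print(parser.can_fetch("ExampleBot", "/bogus.htm"))         # False

# Allowed: not covered by any Disallow rule
print(parser.can_fetch("ExampleBot", "/index.html"))        # True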