Robots.txt is a plain-text file that website owners use to communicate with search engine robots (crawlers). It is located in the root directory of the site and contains crawling instructions: by reading the file, a robot learns which pages of the site it may crawl and which it should skip. Robots.txt lets site owners and SEO specialists manage crawler access and keep robots away from confidential or outdated pages. The file can also help exclude duplicate pages from crawling, reducing the risk of search engine penalties for duplicate content. Note that robots.txt controls crawling rather than indexing: a disallowed URL can still appear in search results if other sites link to it.
The robots.txt file contains the following instructions for search engine robots:
- User-agent — specifies which robot the following rules apply to. For example, User-agent: Googlebot targets Google's crawler, while User-agent: * applies to all robots.
- Disallow — tells the robot which paths it must not crawl. For example, Disallow: /private/ means that the /private/ directory should not be crawled.
- Allow — explicitly permits crawling of a path, typically to carve out an exception inside a disallowed section. For example, Allow: /public/ means that the /public/ directory may be crawled.
- Sitemap — indicates the location of the site's XML sitemap. For example, Sitemap: https://www.example.com/sitemap.xml.
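Taken together, the directives above might look like this in a complete file. This is a hypothetical example; the domain and paths are placeholders:

```
User-agent: *
Disallow: /private/
Allow: /public/

Sitemap: https://www.example.com/sitemap.xml
```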
In addition, comments can be added to the robots.txt file, starting with the # symbol. These comments are ignored by search engine robots and are intended for humans. Robots.txt is an important element of website search engine optimization, so SEO specialists pay special attention to it.
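You can check how a crawler would interpret these rules using Python's standard-library `urllib.robotparser` module. The rules and URLs below are hypothetical, matching the earlier examples:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (a comment line is included to show it is ignored)
rules = [
    "# Rules for all robots",
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /public/",
]

parser = RobotFileParser()
parser.parse(rules)  # parse() accepts an iterable of lines

# A disallowed path is blocked; an allowed one is not
print(parser.can_fetch("Googlebot", "https://www.example.com/private/page.html"))  # False
print(parser.can_fetch("Googlebot", "https://www.example.com/public/page.html"))   # True
```

In a real script you would typically call `set_url()` with the live robots.txt address and `read()` instead of `parse()`, letting the parser fetch the file itself.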