Magento Robots.txt File Examples


Magento is one of the most popular eCommerce platforms, used by businesses all over the world (the United States, Australia, the United Kingdom, Germany, etc.). Of course, for our readers that is not news. It gained such popularity thanks to the variety of features available out of the box. And since Magento is an open-source platform, it can be extended with modules you develop yourself or find on Magento Marketplace.

However, when it comes to Magento SEO, this otherwise powerful solution does not ship with a robots.txt file in its default installation. This file tells search engines which pages of your site they should index and which they should ignore. And when it comes time to add a robots.txt file, developers face one more question: what should a Magento robots.txt file include? In today's article, we will try to answer it and provide you with real examples.

Things That Your Robots.txt File Has to Include: Study the Examples

First, let's go over some basic information about robots.txt files.

Three Basic Facts About Robots.txt Files

  • A robots.txt file is a plain text file containing a number of directives that are read by search engines. It can be created with the help of Notepad or any other standard text editor.
  • A robots.txt file has to be placed in the root of the website (as a rule, the same directory where the homepage is located).
  • A common mistake is disallowing the entire site, which prevents search engines from indexing it at all.
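Before looking at real templates, it can help to sanity-check a robots.txt draft programmatically. This is a minimal sketch using Python's standard-library parser with made-up example rules and URLs; it also catches the "disallow the entire site" mistake before the file goes live:

```python
# Sketch: validate a draft robots.txt with Python's standard-library parser.
# The rules and URLs below are illustrative, not a recommended configuration.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /customer/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The homepage stays crawlable; the checkout path is blocked.
print(rp.can_fetch("AnyBot", "https://example.com/"))                # True
print(rp.can_fetch("AnyBot", "https://example.com/checkout/cart/"))  # False
```

If `can_fetch` returns False for your homepage, you have most likely written `Disallow: /` and locked out every crawler.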

Magento Robots.txt Examples From The Top Magento Agencies

Now, we are going to show you several file examples taken from other Magento service providers. But please remember: it is wrong to simply take one of these generic files and use it as the robots.txt file for your specific Magento store. Why? It is easy to explain.

Every Magento store has its own structure, and to make the robots.txt file fit your particular store, you have to adjust its content accordingly. Only then will it work properly.

So now, let's look at robots.txt examples from top companies:

Magento robots.txt template offered by BlueAcorn:

User-agent: *
Disallow: /index.php/
Disallow: /*?
Disallow: /*.js$
Disallow: /*.css$
Disallow: /customer/
Disallow: /checkout/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Allow: /media/catalog/product/
Disallow: /*.php$
Disallow: /skin/
Disallow: /catalog/product/view/

User-agent: Googlebot-Image
Disallow: /
Allow: /media/catalog/product/

Sitemap: http://example.com/sitemap/sitemap.xml

Any robots.txt file has to contain a "User-agent" line identifying which search engines the rules that follow apply to. In the example above we give a green light to all search engines, but note that Disallow: /*? blocks every URL containing query parameters, including the ?p pagination parameter. If you rely on a rel="prev"/"next" pagination implementation, use a narrower pattern instead, such as Disallow: /*?p=*& (seen in the Groove Commerce example below), which permits ?p on its own while disallowing its combinations with other parameters.
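Wildcard rules like these follow Google-style semantics (now standardized in RFC 9309): `*` matches any run of characters and `$` anchors the end of the URL path. Here is a simplified illustrative matcher for a single rule, not a full parser (it ignores Allow/Disallow precedence), showing how the patterns from the template above behave:

```python
# Simplified sketch of RFC 9309 wildcard matching for one robots.txt rule:
# '*' matches any character sequence, a trailing '$' anchors the path end.
import re

def rule_matches(rule: str, path: str) -> bool:
    # Escape regex metacharacters, then restore the robots.txt wildcards.
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # trailing '$' anchors the end
    return re.match(pattern, path) is not None

print(rule_matches("/*?", "/catalog/category/view?id=5"))  # True: any query URL
print(rule_matches("/*.php$", "/index.php"))               # True: ends in .php
print(rule_matches("/*.php$", "/index.php/some-page"))     # False: '$' anchors the end
```

Note that Python's built-in `urllib.robotparser` does plain prefix matching and does not implement `*`/`$`, which is why a separate matcher is sketched here.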

Inchoo's Magento robots.txt file:

# Google Image Crawler Setup
User-agent: Googlebot-Image
Disallow:

# Crawlers Setup
User-agent: *

# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
#Disallow: /js/
#Disallow: /lib/
Disallow: /magento/
#Disallow: /media/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /scripts/
Disallow: /shell/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/

# Paths (clean URLs)
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
#Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/
Disallow: /catalog/product/gallery/

# Files
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
#Disallow: /*.js$
#Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?SID=

 

As illustrated in the example above, Inchoo allows images to be indexed for image search (the empty Disallow: under Googlebot-Image permits everything) while excluding a number of undesirable folders from the index for a typical Magento store setup. However, we would like to draw your attention to the fact that most of the sorting and pagination parameters are not disallowed here; the assumption is that you will handle them with rel="prev"/"next" and by adding a "noindex, follow" meta tag to the remaining sorting parameter pages.
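Also note that the lines prefixed with # in Inchoo's template (such as #Disallow: /js/) are comments and have no effect. A quick check with Python's standard-library parser, using an illustrative two-rule excerpt, confirms this:

```python
# Sketch: '#' lines in robots.txt are comments and are ignored by parsers.
# The excerpt and URLs below are illustrative.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /app/
#Disallow: /js/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("AnyBot", "https://example.com/app/etc/config"))   # False: blocked
print(rp.can_fetch("AnyBot", "https://example.com/js/prototype.js"))  # True: rule commented out
```

Commenting rules out rather than deleting them is a handy way to document which paths you deliberately decided to leave crawlable.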

Next is an example from Astrio's portfolio:

User-agent: *
Disallow: /*?
Disallow: /app/
Disallow: /catalog/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /customer/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /tag/
Disallow: /review/
Disallow: /var/

 

Below is the robots.txt file from Groove Commerce's portfolio:

# Groove Commerce Magento Robots.txt 05/2011
#
# robots.txt
#
# This file is to prevent the crawling and indexing of certain parts
# of your site by web crawlers and spiders run by sites like Yahoo!
# and Google. By telling these "robots" where not to go on your site,
# you save bandwidth and server resources.
#
# This file will be ignored unless it is at the root of your host:
# Used: http://example.com/robots.txt
# Ignored: http://example.com/site/robots.txt
#
# For more information about the robots.txt standard, see:
# http://www.robotstxt.org/wc/robots.html
#
# For syntax checking, see:
# http://www.sxw.org.uk/computing/robots/check.html

# Website Sitemap
Sitemap: http://www.eckraus.com/sitemap.xml

# Crawlers Setup

# Directories
User-agent: *
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /stats/
Disallow: /var/
Disallow: /blog/

# Paths (clean URLs)
User-agent: *
Disallow: /index.php/
Disallow: /catalog/product_compare/
Disallow: /catalog/category/view/
Disallow: /catalog/product/view/
Disallow: /catalogsearch/
Disallow: /checkout/
Disallow: /control/
Disallow: /contacts/
Disallow: /customer/
Disallow: /customize/
Disallow: /newsletter/
Disallow: /poll/
Disallow: /review/
Disallow: /sendfriend/
Disallow: /tag/
Disallow: /wishlist/

# Files
User-agent: *
Disallow: /cron.php
Disallow: /cron.sh
Disallow: /error_log
Disallow: /install.php
Disallow: /LICENSE.html
Disallow: /LICENSE.txt
Disallow: /LICENSE_AFL.txt
Disallow: /STATUS.txt

# Paths (no clean URLs)
User-agent: *
Disallow: /*.js$
Disallow: /*.css$
Disallow: /*.php$
Disallow: /*?p=*&
Disallow: /*?SID=

 

Having analysed a number of robots.txt examples, we can conclude that most of the top Magento service providers take roughly the same approach to robots.txt, while still optimising the file for the features of each specific online store. We also see that several of the examples include references to XML sitemaps; this way, search engines "understand" where to find the sitemap. You just have to add your website address followed by the sitemap path, e.g. /sitemap.xml. Finally, we would like to point out the two main opportunities robots.txt gives us:

  • We can prevent duplicate content issues (which is very good for SEO);
  • We can hide technical details of the site (for example, error logs, SVN files, unwanted directories, etc.).

As a result, we get clean URLs that will be indexed by search engines.
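A Sitemap: line in robots.txt is machine-readable, so crawlers (and your own tooling) can discover it directly. This sketch uses Python's standard-library parser (its site_maps() method requires Python 3.8+) against an illustrative file:

```python
# Sketch: extract Sitemap declarations from a robots.txt (illustrative file).
# RobotFileParser.site_maps() is available in Python 3.8+.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/

Sitemap: http://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.site_maps())  # ['http://example.com/sitemap.xml']
```

If no Sitemap: line is present, site_maps() returns None, which makes it easy to flag stores that forgot to declare one.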

