Are You Using Your robots.txt File Effectively For Your Ecommerce Store?

One of the tools at any website’s disposal is the robots.txt file. It is especially useful for an ecommerce website, where this little text file can help steer search engines toward the content you actually want indexed. We’ll look at what a robots.txt file is, why it’s important, and how to use it to your advantage on your ecommerce website.

What is a robots.txt file?

A robots.txt file is a simple plain text file placed in the “root” folder of your website, so that it can be reached at yourdomain.com/robots.txt. Probably the best resource for all things related to the robots.txt file is The Web Robots Pages website. Every major search engine and bot (e.g. Google, Bing, Yahoo, Ask) looks for this file before it begins to spider/crawl your website. The file tells these crawlers which parts of your site they should not crawl.
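Crawlers do not hunt through your folders for the file. They request it at one fixed location: the root of the host. As a minimal sketch, assuming a hypothetical store at `shop.example.com`, Python’s standard `urllib.parse.urljoin` shows where a crawler would look for any given page:

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler checks for the host serving page_url.

    Joining with the absolute path "/robots.txt" keeps the scheme and host
    but discards the page's own path and query string.
    """
    return urljoin(page_url, "/robots.txt")


# shop.example.com is a hypothetical store used only for illustration.
print(robots_url("https://shop.example.com/category/widgets.html?page=2"))
# → https://shop.example.com/robots.txt
```

This is also why a robots.txt file placed in a subfolder (for example /store/robots.txt) is not picked up: crawlers only request the one at the root.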

For example, if you did not want Google and company crawling the content in your “clients” folder, you would put this entry in your robots.txt file:

User-agent: *
Disallow: /clients/

This tells every spider not to crawl or catalog material in the “clients” folder on your website. As long as the search engines respect your wishes (and almost all major search engines do), this is an effective way to steer them toward the content you want found.
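You can check how a rule-following crawler reads those two lines with Python’s standard-library `urllib.robotparser` module. This is a minimal sketch; the domain `www.example.com` and the two file paths are hypothetical, and `Googlebot` simply stands in for any crawler, since `User-agent: *` applies to all of them.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from above: every user-agent is asked to skip /clients/.
RULES = """\
User-agent: *
Disallow: /clients/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Hypothetical URLs on a hypothetical store domain.
for url in (
    "https://www.example.com/clients/invoice.pdf",
    "https://www.example.com/products/widget.html",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url} -> {'crawl' if allowed else 'skip'}")
# → https://www.example.com/clients/invoice.pdf -> skip
# → https://www.example.com/products/widget.html -> crawl
```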

Why a robots.txt file is important

There are a few reasons why this file is important for any ecommerce website:

  • 1. Limit bandwidth consumption
    We’ve seen the Russian search engine Yandex rack up 300+ GB of bandwidth on ShopSite stores that do not exclude their cart from being crawled. Its spider built up carts holding hundreds of items, and every cart load was 50-100 MB in size. You can see how quickly this adds up!
  • 2. Irrelevant Content
    Sometimes you have material on your site that you’d rather not appear in the search engines, such as log files, temporary text files, or client downloads. Having these indexed can dilute your overall SEO goals and pull attention away from the optimization of your other public pages.
  • 3. Bad Content
    Maybe you have some old pages you are keeping for various reasons. If these pages have outdated information or products, they can confuse people who land on them from a search.
  • 4. Bad Links
    If you have link pages for your users, but you’d rather not have Google index those pages and their outbound links, a robots.txt file is one simple solution.
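To see how fast the Yandex example in the first item adds up, here is the back-of-the-envelope arithmetic, assuming decimal units (1 GB = 1,000 MB) for round numbers:

```python
def cart_loads_needed(total_gb: int, load_mb: int) -> int:
    """Whole cart page loads of load_mb MB needed to transfer total_gb GB (1 GB = 1,000 MB)."""
    return (total_gb * 1_000) // load_mb


# Figures from the Yandex example: 300 GB total, 50-100 MB per cart load.
for load_mb in (50, 100):
    print(f"{load_mb} MB per load: {cart_loads_needed(300, load_mb):,} loads for 300 GB")
# → 50 MB per load: 6,000 loads for 300 GB
# → 100 MB per load: 3,000 loads for 300 GB
```

In other words, a few thousand requests for a bloated cart page is all it takes to reach that figure.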

How to use it for an ecommerce store

There are at least two entries that should be in your robots.txt file if you run an ecommerce website:

  • 1. The cart URL
    You generally do not want search engines adding items to the cart and indexing the cart pages. Problems can include old products or prices being indexed, links that do not work or give unpredictable results, and larger-than-normal bandwidth consumption when spiders build up large carts.
  • 2. The search URL
    If you have a site search feature, you don’t want Google indexing a “no results” page or crawling results for random search terms. Exclude the search URL using your robots.txt file.

For example, for our ShopSite ecommerce clients, we automatically add an entry like this to the robots.txt file of every new store:

User-agent: *
Disallow: /cgi-XYZ/sb/

This tells spiders not to crawl any of ShopSite’s cart or search scripts.
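The entry above can be generated and then checked the same way a compliant crawler would read it. This is a hypothetical helper, not part of ShopSite; /cgi-XYZ/sb/ stays a placeholder for your store’s actual CGI directory, and the script names `cart.cgi` and `search.cgi` are made up purely for the check.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(disallowed_paths):
    """Return robots.txt text asking every crawler to skip each given path prefix."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


# The entry quoted above; "/cgi-XYZ/sb/" is a placeholder for the real CGI directory.
robots_txt = build_robots_txt(["/cgi-XYZ/sb/"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "cart.cgi" and "search.cgi" are hypothetical script names for illustration only.
for path in ("/cgi-XYZ/sb/cart.cgi", "/cgi-XYZ/sb/search.cgi?q=widget", "/products/widget.html"):
    print(path, "->", "crawl" if parser.can_fetch("YandexBot", path) else "skip")
```

Because the rule is a path prefix, one Disallow line covers every script (and every query string) under that directory, which is why a single entry handles both the cart and the search.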

Robots.txt is not actual security

Some people think a robots.txt file is a security feature. In some ways it’s quite the opposite. The file does not physically prevent anyone from accessing URLs or files on your site. It is merely a list of files and directories you would prefer search engines not crawl, and it is up to each spider whether to abide by your suggestions.

It’s like the note you put on your porch for Halloween next to the candy bowl that says “Please take only one candy per person” when you’re not home for the night. It’s the honor system at work.

And just like that candy bowl (which tells people you’re not home), a robots.txt file can tell people exactly what you don’t want them looking at. A few fringe spiders deliberately go after content that others have asked not to be indexed. If it’s sensitive information, put it behind a password-protected directory.
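Because robots.txt has to be publicly readable, anyone can fetch it and list its Disallow lines. The sketch below is a simplified, hand-rolled reader (not any crawler’s actual implementation) that pulls those paths out of a hypothetical file; the /private-backups/ directory is invented for the example. Nothing in the file stops a client from then requesting those paths.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Collect every Disallow value: exactly what a curious visitor can read."""
    paths = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt that (unwisely) advertises a sensitive directory.
EXAMPLE = """\
User-agent: *
Disallow: /cgi-XYZ/sb/
Disallow: /private-backups/   # please don't look here
"""

print(disallowed_paths(EXAMPLE))
# → ['/cgi-XYZ/sb/', '/private-backups/']
```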

Any other uses for a robots.txt file?


4 Comments

  1. Jeff Morgan says:

    Hi Rob,

    I am sure this will help a lot of e-commerce sites to be more optimized for search engines. I was not familiar with this file earlier, but your post gave a pretty good idea of how this works. Very informative, thanks for sharing!

  2. Albert says:

    This is exactly what I was looking for. I need to setup robot.txt for my website and I found this post. You definitely saved my day. I will try this out and see how it goes. Thanks for sharing. :)
