A Guide To Robots.txt on Shopify

By Daryl Rosser
6 min read

{{ebook-chapter}}

After years of waiting, we're finally able to edit the <code>Robots.txt</code> file on our Shopify stores (both standard and Shopify Plus).

Here's how to edit it, when you should customise it, and how this is useful for SEO.

{{potential-cta}}

What is Robots.txt?

<code>Robots.txt</code> is a file containing rules for robots/crawlers accessing your website. An example rule is "Disallow", which asks specific robots (or all of them) not to access a given directory or URL.

This file is always located at:<br/><code>yourwebsite.com/robots.txt</code>

Having rules in your <code>Robots.txt</code> doesn't "force" bots to adhere to them, but most well-behaved bots, including Googlebot, Bingbot, DuckDuckBot, and AhrefsBot, will check this file before crawling.

How to edit Robots.txt on Shopify

  1. Open your Shopify Dashboard
  2. Go to Online Store > Themes
  3. In the Live theme section, click Actions > Edit code
  4. Under the templates section, click "Add a new template"
  5. Change "Create a new template for" to <code>Robots.txt</code>
  6. Click "Create template"

This will create a <code>Robots.txt.liquid</code> file with the following code:
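
(This is the template as generated at the time of writing; it simply loops over Shopify's default rule groups and outputs them unchanged. Check the file created in your own theme in case the default has changed.)

<pre><code>{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}</code></pre>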

This template file directly controls the <code>Robots.txt</code> file that's served, and the default code above outputs all the default rules Shopify uses out of the box.

Note: I'd highly suggest not removing these rules; most are well optimised by Shopify.

Now that we've got the file, we can customise it however we see fit.

Customising Robots.txt.liquid

There are 3 customisations we may want to make to this file:

  • Add a new rule to an existing group
  • Remove a rule from an existing group
  • Add custom rules

A group refers to a set of rules for one or more specific crawlers.

Add a new rule to an existing group

Here is the file modified to include a few default rules we tend to use for clients:
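
(A sketch following Shopify's documented pattern: the default loop stays intact, and extra <code>Disallow</code> lines are appended to the catch-all <code>*</code> group. Treat the exact rules as a starting point and adjust them to your store.)

<pre><code>{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- comment -%} Extra rules, applied to the catch-all (*) group only {%- endcomment -%}
  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/all' }}
    {{ 'Disallow: /collections/vendors*?*q=' }}
    {{ 'Disallow: /collections/types*?*q=' }}
    {{ 'Disallow: /collections/*?*constraint*' }}
    {{ 'Disallow: /collections/*/*' }}
    {{ 'Disallow: /collections/*?*filter*' }}
    {{ 'Disallow: /collections/*?*pf_*' }}
    {{ 'Disallow: /collections/*?*view*' }}
    {{ 'Disallow: /collections/*?*grid_list*' }}
    {{ 'Disallow: /collections/?page=*' }}
    {{ 'Disallow: /blogs/*/tagged' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}</code></pre>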

What this code says is: if the <code>user_agent</code> (the robot's name) is equal to <code>*</code>, which applies to all robots, then disallow the following:

  • <code>/collections/all</code> - This will block the default collection containing a list of all products (NOTE: this will also block any collections with handles like all-shirts, so don't use it if you have any)
  • <code>/collections/vendors*?*q=</code> - This will block the default vendor collections from being crawled
  • <code>/collections/types*?*q=</code> - This will block the default type collections from being crawled
  • <code>/collections/*?*constraint*</code> - This will block another parameter used for vendors and types
  • <code>/collections/*/*</code> - This will block product tags from being crawled (NOTE: be careful with this, it may also prevent products from being crawled if you don't customise internal links)
  • <code>/collections/*?*filter*</code> - This will block common filter parameters from being crawled
  • <code>/collections/*?*pf_*</code> - This will block a common filter parameter from being crawled
  • <code>/collections/*?*view*</code> - This will block a parameter that changes how many products are displayed
  • <code>/collections/*?*grid_list*</code> - This will block a parameter that changes how products are shown
  • <code>/collections/?page=*</code> - This will block the default collection pagination from being crawled (NOTE: it's important there's no wildcard before the question mark, otherwise this would apply to every collection's pagination)
  • <code>/blogs/*/tagged</code> - This will block blog tags from being crawled

The <code>*</code> is a wildcard meaning "anything here", e.g. <code>/collections/anything?constraint=anything</code> will be blocked.

Remove a default rule from an existing group

While not recommended, default rules can be removed from the <code>Robots.txt</code> file if needed.

Here's the default Shopify <code>Robots.txt</code> and rules for your reference:
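
(An abridged excerpt is below. Shopify can change these defaults at any time, so treat <code>yourstore.com/robots.txt</code> as the source of truth for the current, complete list.)

<pre><code>User-agent: *
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /checkouts/
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /search
Sitemap: https://yourstore.com/sitemap.xml

# ...plus further rules, and separate groups for adsbot-google,
# Nutch, AhrefsBot, AhrefsSiteAudit, MJ12bot and Pinterest
</code></pre>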

Let's say we wanted to remove the rule blocking <code>/policies/</code>. Here's example code to do that:
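
(A minimal sketch using the <code>rule.directive</code> and <code>rule.value</code> attributes Shopify exposes on each rule object.)

<pre><code>{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {%- comment -%} Output every rule except Disallow: /policies/ {%- endcomment -%}
    {%- unless rule.directive == 'Disallow' and rule.value == '/policies/' -%}
      {{ rule }}
    {%- endunless -%}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}</code></pre>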

All we're doing is saying: if there's a "Disallow" rule with the value "/policies/", don't output it. Or more accurately, output every rule unless it's this one.

Add custom rules

If you'd like to add rules for a crawler that isn't covered by one of the default groups (*, adsbot-google, Nutch, AhrefsBot, AhrefsSiteAudit, MJ12bot, and Pinterest), you can add them at the bottom of the template file.

For example, if you wanted to block the WayBackMachine you could add the following:
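
(Plain text placed below the default Liquid loop is output as-is, so the two lines below are all that's needed. <code>ia_archiver</code> is the user agent most commonly cited for the Internet Archive / WayBackMachine; verify it against the crawler's current documentation before relying on it.)

<pre><code>User-agent: ia_archiver
Disallow: /
</code></pre>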

Or if you wanted to add an additional sitemap, you could add this:
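
(Again, this goes as plain text at the bottom of the template. The URL below is purely an example, so point it at wherever your additional sitemap actually lives.)

<pre><code>Sitemap: https://yourstore.com/extra-sitemap.xml
</code></pre>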

{{potential-cta}}

How do I know if my Robots.txt file is working?

It can be difficult to tell if your Robots.txt file is properly configured and allowing search engines to crawl your site.

With our Robots.txt testing tool, you can easily test a specific URL against a user agent to see if it's crawlable or blocked by Robots.txt. If you're testing a Shopify store, we'll also suggest rules based on the recommendations above.

Why customise Robots.txt?

If you're not an SEO, you may be wondering why this even matters. Let me explain.

It comes down to both:

  1. Crawl Budget
  2. Thin Content

Crawl Budget

There is a technical SEO concept known as Crawl Budget: the amount of resources search engines allocate to crawling each website.

In short:

Search engines can't crawl every page of the entire web regularly (there are too many!). So they use algorithms to decide how many resources to allocate to each website.

If your website requires more resources than it's allocated, some pages won't be crawled regularly.

For SEO, you want search engines like Google to regularly crawl your website so they're tracking your improvements. If they're not crawling these pages, they have no idea how they've changed or improved, so you won't see any ranking improvements.

Where this matters is when low quality pages are being crawled and important ones are being left out.

So when SEOs discuss "crawl budget", they're specifically referring to how best to utilise the crawl budget we have.

By utilising <code>Robots.txt</code>, we can specifically block bots from crawling certain pages or directories, which reduces the wasted crawl budget.

Before this, the only solution we had was setting pages to <code>noindex</code>, which helps with thin content (next section), but still requires robots to crawl the pages.

Thin Content

Thin content is an SEO term referring to content that adds no value to search engine users.

If you go to <code>YOURSTORE.COM/collections/vendors?q=BRAND</code>, you'll see a default page created for any vendors you've set in your Shopify Dashboard.

This page has no content on it, no description, and can't be customised in any way. Not to mention the ugly URL.

We'd call this "thin content"; with all these downsides, it's unlikely to rank for anything in Google.

The best solution would be to remove or block this page, then manually create a new Shopify Collection to target this vendor/brand name, which can be fully customised.

Before we could edit <code>Robots.txt</code>, our only solution for this was to set these pages to <code>noindex, follow</code>. Essentially, this asks search engines to follow the links on the page, but not to add the page itself to their search results.
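
For context, that noindex approach is usually implemented with a meta robots tag in the theme's <code>&lt;head&gt;</code>. A minimal sketch is below; the <code>request.path</code> condition is purely illustrative, so adapt it to whichever pages you actually want excluded.

<pre><code>{%- comment -%}
  Hypothetical example for theme.liquid: ask crawlers not to index the
  default vendor pages, while still following the links on them.
{%- endcomment -%}
{%- if request.path contains '/collections/vendors' -%}
  &lt;meta name="robots" content="noindex, follow"&gt;
{%- endif -%}</code></pre>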

This worked, but it still led to potentially hundreds of pages being crawled first.

Now we can disallow these from being crawled altogether, which both reduces the thin content and saves crawl budget.

Conclusion

Hopefully that last section didn't lose anyone; this can get quite technical.

Shopify finally trusting us to edit our own <code>Robots.txt</code> file is a huge upgrade for Shopify stores; however, I would urge non-SEOs and non-developers to be cautious when doing so.

It's entirely possible to block your entire website and create serious issues with this functionality.

So by all means customise it (we make modifications for all of our clients), but be careful to do it right.

You can also test rules using Google's robots.txt tester tool inside Google Search Console (GSC).

I hope this guide was helpful, if you need any help with this, feel free to get in touch with us.

For more on Shopify, you can read our extensive Shopify SEO guide.

Looking for more Shopify SEO strategies?

Here are some internal links to other content I've written on this topic:

{{potential-cta}}
