A Guide To Robots.txt on Shopify

After years of waiting, we’re finally able to edit the Robots.txt file on our Shopify stores (both standard and Shopify Plus).

Here’s how to edit it, when you should customise it, and how this is useful for SEO.

Is your Shopify SEO up to scratch? Book a custom video review.

What is Robots.txt?

Robots.txt is a file containing rules for robots/crawlers accessing your website. An example rule is "Disallow", where you mark a specific directory or URL as disallowed so that some or all robots are asked not to access it.

This file is always located at:
yourwebsite.com/robots.txt

Having rules in your Robots.txt doesn’t "force" bots to adhere to them, but most well-behaved bots, including Googlebot, Bingbot, DuckDuckBot, and AhrefsBot, will check this file before crawling.
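To see how a well-behaved bot interprets these rules, here’s a quick sketch using Python’s built-in parser (the rules and URLs are just illustrations — note this parser follows the original robots.txt spec and doesn’t understand Google-style * wildcards):

```python
from urllib.robotparser import RobotFileParser

# Parse a minimal rule set the way a well-behaved bot would
rp = RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /admin
Disallow: /cart
""".splitlines())

print(rp.can_fetch("googlebot", "https://example.com/admin"))             # blocked
print(rp.can_fetch("googlebot", "https://example.com/products/blue-tee")) # allowed
```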

How to edit Robots.txt on Shopify

  1. Open your Shopify Dashboard
  2. Go to Online Store > Themes
  3. In the Live theme section, click Actions > Edit code
  4. Under the templates section, click “Add a new template”
  5. Change “create a new template for” to “robots.txt”
  6. Click “Create template”

This will create a Robots.txt.liquid file with the following code:

# we use Shopify as our ecommerce platform
{%- comment -%}
# Caution! Please read https://help.shopify.com/en/manual/promoting-marketing/seo/editing-robots-txt
{% endcomment %}
{% for group in robots.default_groups %}
  {{- group.user_agent -}}

  {% for rule in group.rules %}
    {{- rule -}}
  {% endfor %}

  {%- if group.sitemap != blank -%}
    {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

This template file directly controls the Robots.txt output, and the default code above generates all the default rules Shopify uses out of the box.

Note: I’d highly suggest not removing these rules; most are already well optimised by Shopify.

Now that we’ve got the file, we can customise it however we see fit.

Customising Robots.txt.liquid

There are 3 customisations we may want to make to this file:

  • Add a new rule to an existing group
  • Remove a rule from an existing group
  • Add custom rules

"Group" refers to a set of rules for a specific crawler (or crawlers).

Add a new rule to an existing group

Here is the file modified to include a few additional rules we tend to use for clients:

{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {{ rule }}
  {%- endfor -%}

  {%- if group.user_agent.value == '*' -%}
    {{ 'Disallow: /collections/all*' }}
    {{ 'Disallow: /*?q=*' }}
    {{ 'Disallow: /collections/*/*' }}
    {{ 'Disallow: /blogs/*/tagged/*' }}
  {%- endif -%}

  {%- if group.sitemap != blank -%}
      {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

What this code says is: if the user_agent (the robot’s name) equals *, which applies to all robots, then disallow the following:

  • /collections/all – This blocks the default collection that lists every product, including its pagination
  • /*?q= – This blocks the default vendor and type collection pages from being crawled
  • /collections/*/* – This blocks product tag pages from being crawled (be careful with this one: it may also block products from being crawled if your internal links point to product URLs under /collections/)
  • /blogs/*/tagged – This blocks blog tag pages from being crawled

The * is a wildcard meaning "anything here", i.e. /anything?q=anything will be blocked.
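To make the wildcard behaviour concrete, here’s a rough sketch (in Python, purely for illustration) of Google-style pattern matching — the patterns are the ones above, the paths are made up:

```python
import re

def robots_pattern_to_regex(pattern: str):
    """Convert a Google-style robots.txt path pattern to a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, from the start of the path.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

disallows = [
    "/collections/all*",
    "/*?q=*",
    "/collections/*/*",
    "/blogs/*/tagged/*",
]

def is_blocked(path: str) -> bool:
    # A path is blocked if any Disallow pattern matches from the start
    return any(robots_pattern_to_regex(p).match(path) for p in disallows)

print(is_blocked("/anything?q=anything"))  # True - matches /*?q=*
print(is_blocked("/collections/all"))      # True - matches /collections/all*
print(is_blocked("/products/blue-tee"))    # False - no pattern matches
```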

Remove a default rule from an existing group

While not recommended, default rules can be removed from the Robots.txt file if needed.

Here’s the default Shopify Robots.txt and rules for your reference:

# we use Shopify as our ecommerce platform

User-agent: *
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /8203042875/checkouts
Disallow: /8203042875/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: YOURWEBSITE.COM/sitemap.xml

# Google adsbot ignores robots.txt unless specifically named!
User-agent: adsbot-google
Disallow: /checkout
Disallow: /carts
Disallow: /orders
Disallow: /8203042875/checkouts
Disallow: /8203042875/orders
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*

User-agent: Nutch
Disallow: /

User-agent: AhrefsBot
Crawl-delay: 10
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /8203042875/checkouts
Disallow: /8203042875/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: YOURWEBSITE.COM/sitemap.xml

User-agent: AhrefsSiteAudit
Crawl-delay: 10
Disallow: /a/downloads/-/*
Disallow: /admin
Disallow: /cart
Disallow: /orders
Disallow: /checkout
Disallow: /8203042875/checkouts
Disallow: /8203042875/orders
Disallow: /carts
Disallow: /account
Disallow: /collections/*sort_by*
Disallow: /*/collections/*sort_by*
Disallow: /collections/*+*
Disallow: /collections/*%2B*
Disallow: /collections/*%2b*
Disallow: /*/collections/*+*
Disallow: /*/collections/*%2B*
Disallow: /*/collections/*%2b*
Disallow: /blogs/*+*
Disallow: /blogs/*%2B*
Disallow: /blogs/*%2b*
Disallow: /*/blogs/*+*
Disallow: /*/blogs/*%2B*
Disallow: /*/blogs/*%2b*
Disallow: /*?*oseid=*
Disallow: /*preview_theme_id*
Disallow: /*preview_script_id*
Disallow: /policies/
Disallow: /*/*?*ls=*&ls=*
Disallow: /*/*?*ls%3D*%3Fls%3D*
Disallow: /*/*?*ls%3d*%3fls%3d*
Disallow: /search
Disallow: /apple-app-site-association
Sitemap: YOURWEBSITE.COM/sitemap.xml

User-agent: MJ12bot
Crawl-Delay: 10

User-agent: Pinterest
Crawl-delay: 1

Let’s say we wanted to remove the rule blocking /policies/, here’s an example code to do that:

{% for group in robots.default_groups %}
  {{- group.user_agent }}

  {%- for rule in group.rules -%}
    {%- unless rule.directive == 'Disallow' and rule.value == '/policies/' -%}
      {{ rule }}
    {%- endunless -%}
  {%- endfor -%}

  {%- if group.sitemap != blank -%}
      {{ group.sitemap }}
  {%- endif -%}
{% endfor %}

All we’re doing is saying: if there’s a "Disallow" rule with the value "/policies/", don’t output it. Or more accurately, output every rule unless it’s this one.
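The same filtering logic, sketched in Python for illustration (the rule list here is hypothetical — in Liquid you’d be checking rule.directive and rule.value):

```python
# Each rule is (directive, value), mirroring rule.directive / rule.value in Liquid
rules = [
    ("Disallow", "/admin"),
    ("Disallow", "/policies/"),
    ("Disallow", "/search"),
]

# Keep every rule unless it's the Disallow for /policies/
kept = [
    (directive, value)
    for directive, value in rules
    if not (directive == "Disallow" and value == "/policies/")
]

print(kept)  # [('Disallow', '/admin'), ('Disallow', '/search')]
```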

Add custom rules

If you’d like to apply rules that don’t apply to a default group (*, adsbot-google, Nutch, AhrefsBot, AhrefsSiteAudit, MJ12bot, and Pinterest) then you can add them at the bottom of the template file.

For example, if you wanted to block the WayBackMachine you could add the following:

User-agent: ia_archiver
Disallow: /
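You can sanity-check a custom group like this with Python’s built-in parser before shipping it (illustrative only — the point is that the rule blocks ia_archiver without affecting other bots):

```python
from urllib.robotparser import RobotFileParser

# Parse just the custom group we plan to add
rp = RobotFileParser()
rp.parse("""\
User-agent: ia_archiver
Disallow: /
""".splitlines())

print(rp.can_fetch("ia_archiver", "https://example.com/any/page"))  # blocked
print(rp.can_fetch("googlebot", "https://example.com/any/page"))    # unaffected
```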

Or if you wanted to add an additional sitemap, you could add this:

Sitemap: [sitemap-url]

Why customise Robots.txt?

If you’re not an SEO, you may be wondering why this even matters. Let me explain.

It comes down to both:

  1. Crawl Budget
  2. Thin Content

Crawl Budget

There is a technical SEO concept known as Crawl Budget: a term describing the amount of resources search engines allocate to crawling each website.

In short:

Search engines can’t crawl every page of the entire web regularly (there are too many!), so they use algorithms to decide how many resources to allocate to each website.

If your website requires more resources than it’s allocated, some pages will be skipped or crawled less regularly.

For SEO, you want search engines like Google to regularly crawl your website so they’re tracking your improvements. If they’re not crawling these pages, they have no idea how they’ve changed or improved, so you won’t see any ranking improvements.

Where this matters is when low quality pages are being crawled and important ones are being left out.

So when SEOs discuss "crawl budget", they’re specifically asking how we can best utilise the crawl budget we have.

By utilising Robots.txt, we can specifically block bots from crawling certain pages or directories, which reduces the wasted crawl budget.

Before this, the only solution we had was setting pages to "noindex", which helps with thin content (see the next section) but still requires robots to crawl the pages.

Thin Content

Thin content is an SEO term referring to content that adds no value to search engine users.

If you go to YOURSTORE.COM/collections/vendors?q=BRAND, you’ll see a default page created for any vendors you’ve set in your Shopify Dashboard.

This page has no content on it, no description, and can’t be customised in any way. Not to mention the ugly URL.

We’d call this "thin content"; with all these downsides, it’s unlikely to rank for anything in Google.

The best solution would be to remove or block this page, then manually create a new Shopify Collection to target this vendor/brand name, which can be fully customised.

Before we could edit Robots.txt, our only solution for this was to set these pages to "noindex, follow". Essentially requesting that search engines follow the links on the page but not add the page itself to their search results.

This worked, but it still led to potentially hundreds of pages being crawled first.

Now we can disallow these from being crawled altogether, which both reduces the thin content and saves crawl budget.

Conclusion

Hopefully that last section didn’t lose anyone; this can get quite technical.

Shopify finally trusting us to edit our own Robots.txt file is a huge upgrade for Shopify stores; however, I’d urge non-SEOs and non-developers to be cautious when doing so.

With this functionality, it’s entirely possible to block your entire website from being crawled and create serious issues.

So by all means customise it (we make modifications for all of our clients), but be careful to do it right.

You can also test rules using Google’s robots.txt Tester tool inside Google Search Console (GSC).

I hope this guide was helpful, if you need any help with this, feel free to get in touch with us.

For more on Shopify, you can read our extensive Shopify SEO guide.
