Robots.txt Testing Tool

Quickly check your pages' crawlability status

Validate your Robots.txt by checking if your URLs are properly allowed or blocked. Running a Shopify store? We'll also suggest some rules to add.

What is Robots.txt?

Robots.txt is a text file that specifies the rules web robots or spiders should follow when crawling your site. It tells them which pages or directories they are allowed or not allowed to crawl.
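For example, a minimal robots.txt, served from the root of your domain (e.g. example.com/robots.txt), might look like the following; the paths are purely illustrative:

  # Rules for all crawlers
  User-agent: *
  # Block the cart and checkout pages, allow everything else
  Disallow: /cart
  Disallow: /checkout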

How do I use Robots.txt Testing Tool?

You can either type in the URL of the site you want to test or paste in your own Robots.txt file. After that, enter the URL you'd like to test and the tool will report whether it's crawlable or blocked by Robots.txt.
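If you'd like to run the same allowed-or-blocked check outside the tool, Python's standard-library urllib.robotparser can answer the same question. This is just a sketch; the site, page, and user agent below are placeholders, and urllib.robotparser follows the original robots.txt specification, so it may treat wildcard rules differently than Google's crawler does.

  # Sketch: check whether a URL is crawlable, using only the standard library.
  # example.com and Googlebot are placeholders.
  import urllib.robotparser

  robots = urllib.robotparser.RobotFileParser()
  robots.set_url("https://example.com/robots.txt")
  robots.read()  # fetch and parse the live robots.txt

  # Alternatively, parse a pasted robots.txt instead of fetching one:
  # robots.parse(pasted_text.splitlines())

  url = "https://example.com/collections/sale"
  print(robots.can_fetch("Googlebot", url))  # True if crawlable, False if blocked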

What is a user agent?

A user agent is a software agent that acts on behalf of a user. In the case of the Robots.txt Testing Tool, it specifies which web crawler you'd like to test against.
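In a robots.txt file itself, rules are grouped by user agent, so you can target one crawler by name while every other crawler falls back to the * group. The rules below are illustrative:

  # Rules that apply only to Googlebot
  User-agent: Googlebot
  Allow: /

  # Rules that apply to every other crawler
  User-agent: *
  Disallow: /private/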

Why is it important to test Robots.txt?

Testing Robots.txt ensures that web crawlers can access the content you want indexed and that important pages or directories aren't blocked by mistake.

Likewise, you want to make sure that low-quality pages with no ranking value are blocked, in order to save crawl budget.
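For example, many sites block internal search results and filtered listing URLs, which tend to add little ranking value; the paths and parameter below are illustrative:

  User-agent: *
  # Internal search result pages
  Disallow: /search
  # URLs containing a sort_by parameter
  Disallow: /*sort_by=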

The tool says my URL is blocked but I don't want it to be. What should I do?

Under normal circumstances, the Robots.txt you provided shouldn't block any important pages. However, if it does, review the rules and modify them as needed so that the content you want indexed stays crawlable by search engines.
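A common culprit is a rule that is broader than intended, because Disallow matches by prefix. Narrowing the pattern (illustrative paths below) unblocks the unrelated pages while still covering the directory you meant to block:

  # Too broad: also blocks /blog-news and /blogging-tips
  Disallow: /blog

  # Narrower: blocks only URLs under the /blog/ directory
  Disallow: /blog/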

What are these suggestions I'm seeing?

If you're testing a Shopify store, the Robots.txt Testing Tool will provide suggestions for additional rules to add to your robots.txt file. These suggestions are based on best practices for optimizing your site for search engines, and can help improve your site's visibility in search results.

Curious what each of these rules does? Check out our write-up on Shopify Robots.txt SEO.

How do I read the Robots.txt file?

  • The user-agent directive specifies the web crawler to which the rules apply; some popular user agents are Googlebot, Googlebot Smartphone, Ahrefs, and DuckDuckGo. (A complete example appears after this list.)
  • "User-agent: *" signifies that all crawlers must follow these rules.
  • The allow/disallow directives specify which pages or directories the crawler is allowed or not allowed to access.
  • The wildcard character (*) can be used to match any string of characters in a URL.
  • The $ character anchors a rule to the end of the URL, so it matches only paths that end with the designated string.
  • Crawl-delay specifies the time (in seconds) a crawler should wait between successive requests to the site.
  • The sitemap directive provides the user agent with the location of your site's sitemap.xml file.
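Putting those directives together, an annotated (and purely illustrative) robots.txt might look like this:

  # Rules for all crawlers
  User-agent: *
  # Block the admin area...
  Disallow: /admin/
  # ...but re-allow one public directory inside it
  Allow: /admin/public/
  # Block any URL containing a sessionid parameter (* matches any characters)
  Disallow: /*sessionid=
  # Block URLs that end in .pdf ($ anchors the match to the end of the URL)
  Disallow: /*.pdf$
  # Ask crawlers to wait 10 seconds between requests
  Crawl-delay: 10

  # Location of the XML sitemap
  Sitemap: https://example.com/sitemap.xml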

Is it possible to edit my Robots.txt file?

Yes, it's usually possible to edit your Robots.txt file. However, it's important to understand the potential implications of making changes. Consult your web host's or content management system's documentation and support first.

For Shopify stores, we have a guide on how to edit the Robots.txt file here.