Robots.txt Generator
Generate a robots.txt file for your website. Control search engine crawling with an easy visual builder.
Configure your crawling rules below and copy or download the generated robots.txt file.
Crawling Rules
Generated robots.txt
```
User-agent: *
Allow: /
Disallow: /admin
Disallow: /api
Disallow: /login
Sitemap: https://example.com/sitemap.xml
```
What Is robots.txt?
The robots.txt file is a plain text file at your website's root that tells search engine crawlers which pages they're allowed to visit and which they should skip. It's the first file a well-behaved crawler checks before accessing any page on your site.
Think of it as a "staff only" sign for web crawlers. It's a polite request, not a security measure: crawlers choose to respect it. Googlebot and Bingbot follow the rules reliably, while malicious bots and scrapers typically ignore it entirely. Never use robots.txt as your only defence for sensitive content.
Directive Reference
| Directive | Syntax | What It Does |
|---|---|---|
| User-agent | User-agent: * | Specifies which crawler the rules apply to (* = all) |
| Allow | Allow: /public/ | Permits crawling of a specific path |
| Disallow | Disallow: /admin/ | Blocks crawling of a specific path |
| Sitemap | Sitemap: https://…/sitemap.xml | Points crawlers to your XML sitemap |
| Crawl-delay | Crawl-delay: 10 | Requests N seconds between requests (not respected by Google) |
What this means for you: Google ignores Crawl-delay — use Search Console's crawl rate settings instead. Bing and Yandex do respect it. The Sitemap directive is the most important after Allow/Disallow.
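Putting the directives together, a file for a typical small site might look like the sketch below. The paths and sitemap URL are placeholders; adjust them to your own site structure.

```
# Rules for all crawlers
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /api/

# Bing and Yandex honour this; Google does not
Crawl-delay: 10

# Sitemap location (an absolute URL is required)
Sitemap: https://example.com/sitemap.xml
```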
Common robots.txt Patterns
| Scenario | robots.txt | Notes |
|---|---|---|
| Allow everything | `User-agent: *`<br>`Allow: /` | Default for most sites — let crawlers see everything |
| Block everything | `User-agent: *`<br>`Disallow: /` | Staging/dev sites only — never do this in production |
| Block admin paths | `Disallow: /admin/`<br>`Disallow: /api/` | Standard security hygiene — don't expose backend routes |
| Block a specific bot | `User-agent: AhrefsBot`<br>`Disallow: /` | Blocks aggressive SEO crawlers that waste bandwidth |
Common Mistakes
Blocking CSS and JS Files
Google needs to render your pages to understand them. Blocking CSS/JS files in robots.txt prevents rendering and can hurt your rankings. Only block truly private resources.
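If a broad block is catching assets by accident, a narrow Allow can carve them back out. This sketch assumes a hypothetical `/private/` directory that happens to contain CSS and JS files; the `*` and `$` wildcards are supported by Google and Bing but are not part of the original standard.

```
User-agent: Googlebot
# Broad block on a private area...
Disallow: /private/
# ...but explicitly allow renderable assets site-wide
Allow: /*.css$
Allow: /*.js$
```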
Using robots.txt for Security
Disallow doesn't hide pages — it just asks crawlers not to visit them. The URLs are still visible in the file. Use authentication and proper access controls for sensitive content.
Forgetting to Update After Redesign
Site redesigns often change URL structures. If your old robots.txt blocks paths that are now important, those pages won't get crawled. Review robots.txt after every major change.
Confusing Disallow with Noindex
Disallow prevents crawling. Noindex prevents indexing. A page blocked by robots.txt can still appear in search results if other sites link to it. Use noindex meta tags to prevent indexing.
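The two controls live in different places: Disallow goes in robots.txt, while noindex goes in the page's HTML. Crucially, the page must remain crawlable for the tag to work at all.

```html
<!-- In the page's <head>. The page must NOT be blocked in robots.txt,
     or the crawler will never fetch the page and see this tag. -->
<meta name="robots" content="noindex">
```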
AI Crawlers You Should Know About
| Bot Name | Company | User-Agent | What It Does |
|---|---|---|---|
| GPTBot | OpenAI | GPTBot | Crawls content for training ChatGPT models |
| Google-Extended | Google | Google-Extended | Training data for Gemini/Bard AI models |
| CCBot | Common Crawl | CCBot | Open dataset used by many AI companies |
| anthropic-ai | Anthropic | anthropic-ai | Crawls for Claude model training |
| ClaudeBot | Anthropic | ClaudeBot | Web browsing for Claude responses |
To block all AI training crawlers, add a separate User-agent block with Disallow: / for each bot listed above. Blocking search engine crawlers (Googlebot, Bingbot) is a separate decision — those affect your search rankings, not AI training. You can block AI training while keeping your search presence.
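Covering every crawler in the table above, the per-bot blocks look like this; Googlebot and Bingbot are untouched, so search indexing continues as normal.

```
# Block AI training crawlers while leaving search bots alone
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /
```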
Related Tools
Sitemap Generator
Generate the sitemap.xml referenced in your robots.txt.
Meta Tag Generator
Add noindex tags to pages that need more than robots.txt blocking.
Schema Markup Generator
Add structured data to pages that crawlers are allowed to access.
Google SERP Preview
Preview how your crawlable pages appear in search results.
User Agent Parser
Identify the bot user agents you want to control in robots.txt.
HTTP Status Code Lookup
Understand status codes in your crawl logs and error reports.
How to use this tool
1. Enter your sitemap URL and toggle common blocking rules (admin, API, login)
2. Add custom disallow paths for any additional routes to block
3. Copy or download the generated robots.txt file and upload it to your site root
Common uses
- Blocking admin and login pages from search engine crawling
- Preventing API endpoints from appearing in search results
- Setting up robots.txt for new website deployments
- Adding sitemap references for improved search engine discovery
Share this tool