GPTBot
| Field | Value |
|---|---|
| Vendor | OpenAI |
| Type | AI training crawler |
| robots.txt token | GPTBot |
| JavaScript rendering | No — HTTP-only fetcher |
| Honors robots.txt | Yes |
| Vendor docs | platform.openai.com/docs/gptbot |
User-Agent strings
```text
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)
```
(GPTBot/1.0 is also still in circulation.)
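Because both 1.0 and 1.1 show up in logs, it can help to bucket requests by version. The sketch below assumes only that the product token has the form `GPTBot/<version>` as in the string above; `gptbot_version` is a hypothetical helper name. A User-Agent header is trivially spoofed, so treat a match as a claim to verify against the IP list, not as proof of origin.

```python
import re
from typing import Optional

# Matches the product token and its version, e.g. "GPTBot/1.1" -> "1.1".
# Case-sensitive, so the lowercase "+https://openai.com/gptbot" URL is ignored.
GPTBOT_RE = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version claimed by a User-Agent string, or None."""
    match = GPTBOT_RE.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "GPTBot/1.1; +https://openai.com/gptbot)")
    print(gptbot_version(ua))
```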
Purpose
Collects public web pages for training future OpenAI foundation models. It is distinct from OAI-SearchBot (search index) and ChatGPT-User (live retrieval). Per OpenAI's documentation, content fetched by GPTBot may be used to improve model knowledge, while content fetched by OAI-SearchBot and ChatGPT-User serves search and user-initiated retrieval features.
Network identity
- Hostname pattern: *.openai.com
- Authoritative IP list: https://openai.com/gptbot.json (regularly updated)
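Since the User-Agent can be forged, the IP list is the authoritative check. The sketch below assumes the JSON file contains a top-level `"prefixes"` array whose entries carry an `"ipv4Prefix"` or `"ipv6Prefix"` CIDR string; that schema is an assumption to confirm against the live file. The sample data uses documentation-only ranges (192.0.2.0/24, 2001:db8::/32), not real OpenAI addresses, and `load_networks` / `is_listed` are hypothetical helper names.

```python
import ipaddress
import json
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Placeholder data in the ASSUMED schema; replace with the fetched gptbot.json body.
SAMPLE_JSON = """
{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
"""


def load_networks(gptbot_json: str) -> List[Network]:
    """Parse published CIDR prefixes (assumed schema) into network objects."""
    data = json.loads(gptbot_json)
    networks: List[Network] = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_listed(ip: str, networks: List[Network]) -> bool:
    """True if `ip` falls inside any published prefix.
    Mixed IPv4/IPv6 comparisons simply return False."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    nets = load_networks(SAMPLE_JSON)
    print(is_listed("192.0.2.10", nets), is_listed("198.51.100.1", nets))
```

Re-download the list periodically rather than hard-coding it, since the source describes it as regularly updated.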
In our audit
GPTBot was blocked at 62 of the 100 e-commerce sites audited, the largest blocked cohort. It was almost always blocked together with ChatGPT-User, ClaudeBot, and PerplexityBot, the set that Cloudflare's "Block AI Bots" managed rule blocks as a group.
GPTBot does not render JavaScript. A site that ships an empty SPA shell, or that pre-renders only for a trusted-crawler allowlist that excludes GPTBot, is effectively invisible to GPTBot's training crawl even if it does not block GPTBot explicitly.
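One rough way to approximate what an HTTP-only fetcher sees is to count the words in the raw HTML while ignoring script, style, and template contents. This is a heuristic sketch using Python's standard `html.parser`, not OpenAI's extraction pipeline; `visible_words` is a hypothetical helper, and what counts as "too few" words is left to you.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collect text present in raw HTML, skipping <script>, <style>
    and <template> contents, which a non-rendering fetcher never shows."""

    SKIP = {"script", "style", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_words(raw_html: str) -> int:
    """Count whitespace-separated words outside skipped elements."""
    parser = VisibleTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


if __name__ == "__main__":
    spa_shell = ('<html><head><script src="/app.js"></script></head>'
                 '<body><div id="root"></div></body></html>')
    rendered = ('<html><body><h1>Blue Widget</h1><p>In stock, ships today.</p>'
                '<script>window.x=1</script></body></html>')
    print(visible_words(spa_shell), visible_words(rendered))
```

Feed it the body of a plain HTTP response (no headless browser): an SPA shell scores near zero, while a server-rendered product page keeps its copy.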
How to allow / block
Opt out of training:
```text
User-agent: GPTBot
Disallow: /
```
Opt out of training but allow live retrieval (recommended for e-commerce sites that want AI shopping visibility but not training-data use):
```text
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
```
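Before deploying, the rules above can be sanity-checked offline with Python's standard `urllib.robotparser`. Its matching is simpler than RFC 9309 and it is not OpenAI's parser, so treat the result as a quick check rather than a guarantee; the helper name `allowed` and the `/products/widget` path are illustrative.

```python
from urllib.robotparser import RobotFileParser

# The "opt out of training, allow live retrieval" rules from this section.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""


def allowed(robots_txt: str, token: str, path: str) -> bool:
    """Return True if `token` may fetch `path` under these rules."""
    parser = RobotFileParser()
    # parse() takes lines and marks the file as read, so no network is needed.
    parser.parse(robots_txt.splitlines())
    # Pass the product token ("GPTBot"), not the full User-Agent string:
    # robotparser matches only on the text before the first "/".
    return parser.can_fetch(token, path)


if __name__ == "__main__":
    for token in ("GPTBot", "ChatGPT-User", "OAI-SearchBot"):
        print(token, allowed(ROBOTS_TXT, token, "/products/widget"))
```

OAI-SearchBot comes out allowed only because no group names it and there is no `User-agent: *` group; it needs its own group if you want to control it.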
Quirks
- Honors robots.txt strictly. If `Disallow: /` is set for GPTBot,
OpenAI's training fleet fetches nothing from the site beyond robots.txt itself.
- Does not execute JavaScript; do not rely on JS to produce content
intended for GPTBot.
- The GPTBot/1.1 UA shipped in 2024 with a slightly different fetch
cadence than 1.0 (smaller batches, more respectful of rate limits).