GPTBot

Vendor	OpenAI
Type	AI training crawler
robots.txt token	`GPTBot`
JavaScript rendering	No — HTTP-only fetcher
Honors robots.txt	Yes
Vendor docs	platform.openai.com/docs/gptbot

User-Agent strings

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

(GPTBot/1.0 is also still in circulation.)
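As a rough illustration of how the string above can be recognised in access logs, the following Python sketch extracts the GPTBot version from a User-Agent header. The function name and regular expression are hypothetical, chosen for this example only. A User-Agent header can be spoofed, so a match here is a hint rather than proof of identity.

```python
import re
from typing import Optional

# Matches the GPTBot product token and its version, e.g. "GPTBot/1.1".
# The pattern is an assumption based on the two versions listed above.
GPTBOT_TOKEN = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version found in a User-Agent header, or None."""
    match = GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
          "compatible; GPTBot/1.1; +https://openai.com/gptbot")
    print(gptbot_version(ua))
```

Pairing this check with the network identity data below gives a stronger signal than the header alone.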

Purpose

Collects public web pages for training future OpenAI foundation models. GPTBot is separate from OAI-SearchBot (search index) and ChatGPT-User (live retrieval): per OpenAI's documentation, content fetched by GPTBot may be used to improve model knowledge, while content fetched by the other two bots serves those product features instead.

Network identity

Hostname pattern: *.openai.com
Authoritative IP list:

https://openai.com/gptbot.json (regularly updated)
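A request's source address can be compared against the published list. The sketch below assumes the JSON document has a top-level `prefixes` array whose entries carry an `ipv4Prefix` or `ipv6Prefix` CIDR string; that layout is an assumption and should be checked against the live file. The function name is hypothetical, and the sample ranges are reserved documentation addresses, not real GPTBot ranges.

```python
import ipaddress


def ip_in_published_ranges(ip: str, ranges_doc: dict) -> bool:
    """Return True if `ip` falls inside any CIDR prefix in `ranges_doc`.

    `ranges_doc` is the already-parsed JSON document. The key names
    ("prefixes", "ipv4Prefix", "ipv6Prefix") are assumed, not confirmed.
    """
    address = ipaddress.ip_address(ip)
    for entry in ranges_doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if cidr is None:
            continue  # skip entries that do not match the assumed layout
        network = ipaddress.ip_network(cidr, strict=False)
        if address.version == network.version and address in network:
            return True
    return False


# Documentation-only ranges (RFC 5737 / RFC 3849), used purely as sample data.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}
```

In practice the document would be fetched periodically and cached, since the list is updated regularly.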

In our audit

GPTBot was blocked at 62 of the 100 e-commerce sites audited, the largest blocked cohort. The block was almost always part of the {GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot} set that the Cloudflare "Block AI Bots" managed rule blocks together.

GPTBot does not render JavaScript. A site that ships an empty SPA shell — or that pre-renders only for the trusted-crawler allowlist — will be effectively invisible to GPTBot's training process even if not explicitly blocked.
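One way to estimate what an HTTP-only fetcher sees is to measure the text present in the raw HTML response before any script runs. The sketch below is a heuristic under stated assumptions: the helper names and the 200-character threshold are hypothetical, and it says nothing about how GPTBot itself processes pages.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text that sits outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def visible_text_length(raw_html: str) -> int:
    """Count characters of text available without executing JavaScript."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return len(" ".join(parser.chunks))


def looks_like_empty_shell(raw_html: str, min_chars: int = 200) -> bool:
    """Flag responses with very little static text (threshold is arbitrary)."""
    return visible_text_length(raw_html) < min_chars
```

Running this against the response body from a plain HTTP GET, with no headless browser involved, approximates the view of a non-rendering crawler.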

How to allow / block

Opt out of training:

User-agent: GPTBot
Disallow: /

Opt out of training but allow live retrieval (recommended for e-commerce sites that want AI shopping visibility but not training-data use):

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
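Before deploying rules like these, it can help to check that they parse the way you expect. The sketch below uses Python's standard `urllib.robotparser`; the helper name `is_allowed` and the example URL are hypothetical. Python's matching logic is its own implementation and may differ in edge cases from how OpenAI's crawlers interpret the file.

```python
from urllib.robotparser import RobotFileParser

# The "opt out of training but allow live retrieval" rules shown above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""


def is_allowed(robots_txt: str, agent_token: str, url: str) -> bool:
    """Return True if `agent_token` may fetch `url` under `robots_txt`.

    Pass the short robots.txt token (e.g. "GPTBot"), not the full
    User-Agent header.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    for token in ("GPTBot", "ChatGPT-User"):
        print(token, is_allowed(ROBOTS_TXT, token, "https://example.com/products/1"))
```

Agents with no matching group and no wildcard group are treated as allowed, so this file leaves every other crawler unaffected.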

Quirks

Honors robots.txt strictly. If Disallow: / is set for GPTBot, OpenAI's training fleet does not fetch the site at all.

Does not execute JavaScript; do not rely on JS to produce content intended for GPTBot.

The GPTBot/1.1 UA shipped in 2024 with a slightly different fetch cadence than 1.0 (smaller batches, more polite to rate limits).
