Bot reference catalog

One-page summaries of 30 common search, AI/LLM, and social-preview crawlers.


GPTBot

Vendor: OpenAI
Type: AI training crawler
robots.txt token: GPTBot
JavaScript rendering: No (HTTP-only fetcher)
Honors robots.txt: Yes
Vendor docs: platform.openai.com/docs/gptbot

User-Agent strings

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot

(GPTBot/1.0 is also still in circulation.)
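As a rough illustration, the version token can be pulled out of a User-Agent header with a regular expression. This is a minimal sketch: the function name `gptbot_version` is hypothetical, and a matching header only shows what the client claims to be, since User-Agent strings are trivially spoofed.

```python
import re
from typing import Optional

# Matches the product token and its version, e.g. "GPTBot/1.1" or "GPTBot/1.0".
GPTBOT_TOKEN = re.compile(r"\bGPTBot/(\d+(?:\.\d+)*)")


def gptbot_version(user_agent: str) -> Optional[str]:
    """Return the GPTBot version claimed in a User-Agent header, or None."""
    match = GPTBOT_TOKEN.search(user_agent)
    return match.group(1) if match else None


SAMPLE_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); "
    "compatible; GPTBot/1.1; +https://openai.com/gptbot"
)

if __name__ == "__main__":
    print(gptbot_version(SAMPLE_UA))
```

Matching on the `GPTBot/` token rather than the full string keeps the check working for both versions in circulation.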

Purpose

Collects public web pages for training future OpenAI foundation models. Distinct from OAI-SearchBot (search index) and ChatGPT-User (live retrieval). Per OpenAI's documentation, content fetched by GPTBot may be used to improve model knowledge, while content fetched by the other two bots is used for distinct product features.

Network identity

Published IP ranges: https://openai.com/gptbot.json (regularly updated)
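A request that claims to be GPTBot can be checked against the published list. The sketch below works on an already-parsed JSON document and assumes a shape of `{"prefixes": [{"ipv4Prefix": "..."}]}`. That shape, the `ipv6Prefix` key, and the function name are assumptions to verify against the live file. The sample ranges are reserved documentation addresses, not real OpenAI ranges.

```python
import ipaddress


def ip_in_published_ranges(ip: str, ranges_doc: dict) -> bool:
    """Return True if `ip` falls inside any CIDR prefix listed in `ranges_doc`.

    Assumed (unverified) document shape:
    {"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}
    """
    addr = ipaddress.ip_address(ip)
    for entry in ranges_doc.get("prefixes", []):
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        # Membership is False when the address and network IP versions differ.
        if cidr and addr in ipaddress.ip_network(cidr, strict=False):
            return True
    return False


# Placeholder data using RFC 5737 / RFC 3849 documentation ranges.
SAMPLE_RANGES = {
    "prefixes": [
        {"ipv4Prefix": "192.0.2.0/24"},
        {"ipv6Prefix": "2001:db8::/32"},
    ]
}

if __name__ == "__main__":
    print(ip_in_published_ranges("192.0.2.10", SAMPLE_RANGES))
```

Because the list is regularly updated, a real deployment would refetch it on a schedule rather than hard-code the ranges.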

In our audit

GPTBot was blocked at 62/100 e-commerce sites — the largest blocked cohort, almost always part of the {GPTBot, ChatGPT-User, ClaudeBot, PerplexityBot} set blocked together by the Cloudflare "Block AI Bots" managed rule.

GPTBot does not render JavaScript. A site that ships an empty SPA shell — or that pre-renders only for the trusted-crawler allowlist — will be effectively invisible to GPTBot's training process even if not explicitly blocked.
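One way to approximate what an HTTP-only fetcher sees is to count the words present in the raw HTML before any script runs. The following is a minimal sketch using only the standard library. The class and function names are hypothetical, and the word count is a rough signal, not a model of how GPTBot actually processes pages.

```python
from html.parser import HTMLParser


class BodyTextCounter(HTMLParser):
    """Collects text that is present in the markup without running JavaScript."""

    # Content inside these elements is not page copy a reader would see.
    SKIP = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html: str) -> int:
    """Count words available in the raw HTML, ignoring scripts and styles."""
    parser = BodyTextCounter()
    parser.feed(html)
    parser.close()
    return sum(len(chunk.split()) for chunk in parser.chunks)


SPA_SHELL = (
    "<html><head><title>Shop</title></head>"
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
SERVER_RENDERED = (
    "<html><body><h1>Blue widget</h1><p>In stock, ships today.</p></body></html>"
)

if __name__ == "__main__":
    print(visible_word_count(SPA_SHELL), visible_word_count(SERVER_RENDERED))
```

A count near zero for the unrendered response suggests the page depends on client-side rendering and would offer little content to a crawler that does not execute JavaScript.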

How to allow / block

Opt out of training:

User-agent: GPTBot
Disallow: /

Opt out of training but allow live retrieval (recommended for e-commerce sites that want AI shopping visibility but not training-data use):

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
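The rules above can be sanity-checked before deployment with Python's standard-library robots.txt parser. This is a sketch under the assumption that the stdlib parser's matching is close enough for a smoke test. It is not OpenAI's parser, and the helper name and example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/products/1"
    for agent in ("GPTBot", "ChatGPT-User"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

Agents with no matching group and no `*` group are allowed by default, so this file leaves every other crawler unaffected.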

Quirks

OpenAI's training fleet does not fetch the site at all.

intended for GPTBot.

cadence than 1.0 (smaller batches, more polite to rate limits).

Sources