Bot reference catalog

Thirty common search, AI/LLM, and social-preview crawlers — official User-Agent strings, IP verification protocols, JavaScript rendering capability, and robots.txt control tokens.

Companion to the May 2026 cross-bot audit.

Click any bot for a one-page reference covering UA strings, network identity, verification, robots.txt directives, and quirks.

Search

Applebot

Apple · renders JS: yes · robots: yes

Baiduspider

Baidu · renders JS: minimal · robots: yes

Bingbot

Microsoft · renders JS: yes · robots: yes

Brave Search Crawler

Brave Software · renders JS: yes · robots: yes

DuckDuckBot

DuckDuckGo · renders JS: no · robots: yes

Googlebot

Google · renders JS: yes · robots: yes

Naver Yeti

Naver · renders JS: partial · robots: yes

PetalBot

Huawei · renders JS: yes · robots: yes

Sogou Web Spider

Sogou (Tencent) · renders JS: minimal · robots: yes

Yahoo Slurp

Yahoo / Verizon Media · renders JS: partial · robots: yes

YandexBot

Yandex · renders JS: partial · robots: yes

AI training

Applebot-Extended

Apple · renders JS: no · robots: yes

Bytespider

ByteDance · renders JS: no · robots: partial

CCBot

Common Crawl · renders JS: no · robots: yes

ClaudeBot

Anthropic · renders JS: no · robots: yes

GPTBot

OpenAI · renders JS: no · robots: yes

Google-Extended

Google · renders JS: no · robots: yes

Meta-ExternalAgent

Meta · renders JS: no · robots: yes

AI search-index

Claude-SearchBot

Anthropic · renders JS: no · robots: yes

OAI-SearchBot

OpenAI · renders JS: no · robots: yes

PerplexityBot

Perplexity AI · renders JS: no · robots: partial

AI live-fetch

Amazonbot

Amazon · renders JS: yes · robots: yes

ChatGPT-User

OpenAI · renders JS: sometimes · robots: yes

Claude-User

Anthropic · renders JS: yes · robots: yes

Meta-ExternalFetcher

Meta · renders JS: yes · robots: yes

Perplexity-User

Perplexity AI · renders JS: yes · robots: yes

Social preview

LinkedInBot

LinkedIn (Microsoft) · renders JS: no · robots: partial

Slackbot

Slack (Salesforce) · renders JS: no · robots: no

Twitterbot

X (formerly Twitter) · renders JS: no · robots: partial

facebookexternalhit

Meta · renders JS: no · robots: partial

The training / search / live taxonomy

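The single most important point in this catalog: major AI vendors operate up to three separate bots (training, search index, and live retrieval), each with its own robots.txt token. Most "Block AI Bots" WAF presets do not distinguish between the tiers and treat them all as one. A site that wants visibility in AI shopping and search results but does not want its content used as training data should block the training-tier UAs and allow the search-index and live-retrieval tiers. Where a vendor combines tiers under one token (Perplexity and Meta in the table below), blocking the training token also blocks search indexing for that vendor.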

| Vendor | Training | Search index | Live retrieval |
|---|---|---|---|
| OpenAI | GPTBot | OAI-SearchBot | ChatGPT-User |
| Anthropic | ClaudeBot | Claude-SearchBot | Claude-User |
| Perplexity | PerplexityBot | (combined) | Perplexity-User |
| Meta | Meta-ExternalAgent | (combined) | Meta-ExternalFetcher |
| Apple | Applebot-Extended (opt-out) | Applebot | Applebot |
| Google | Google-Extended (opt-out) | Googlebot | n/a |
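
As a rough illustration of the "block training, allow search and live retrieval" policy described above, the sketch below generates robots.txt groups from the table and checks the result with Python's standard `urllib.robotparser`. The token lists are copied from the table. The `TIERS` dictionary, the `build_robots_txt` helper, and the `example.com` URL are hypothetical names introduced only for this example. It assumes each bot honors robots.txt, which the catalog marks as only "partial" for some crawlers, so treat it as a starting point and not as a complete policy.

```python
from urllib.robotparser import RobotFileParser

# Tokens copied from the taxonomy table. "(combined)" and "n/a" cells are
# omitted, so blocking PerplexityBot or Meta-ExternalAgent also blocks that
# vendor's search indexing. The tier names are illustrative labels.
TIERS = {
    "training": [
        "GPTBot", "ClaudeBot", "PerplexityBot", "Meta-ExternalAgent",
        "Applebot-Extended", "Google-Extended",
    ],
    "search_index": ["OAI-SearchBot", "Claude-SearchBot", "Applebot", "Googlebot"],
    "live_retrieval": [
        "ChatGPT-User", "Claude-User", "Perplexity-User", "Meta-ExternalFetcher",
    ],
}


def build_robots_txt(blocked_tiers=("training",)):
    """Return robots.txt text with one group per token (hypothetical helper)."""
    lines = []
    for tier, tokens in TIERS.items():
        rule = "Disallow: /" if tier in blocked_tiers else "Allow: /"
        for token in tokens:
            # A blank line closes each group.
            lines += [f"User-agent: {token}", rule, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    robots_txt = build_robots_txt()
    print(robots_txt)

    # Sanity-check the generated rules with the standard-library parser.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for agent in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
        allowed = parser.can_fetch(agent, "https://example.com/products/1")
        print(agent, "allowed" if allowed else "blocked")
```

Emitting one group per token keeps each bot's rule independent, so a single vendor or tier can be switched later without touching the others. A WAF rule set would need the same three-way split, matched against the User-Agent header and not against robots.txt tokens.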