Bot reference catalog

One-page summaries of 30 common search, AI/LLM, and social-preview crawlers.


Bytespider

Vendor: ByteDance
Type: AI training crawler (also feeds TikTok / Toutiao / Doubao search)
robots.txt token: Bytespider
JavaScript rendering: No
Honors robots.txt: Partial (historically aggressive, some compliance added in 2024)
Vendor contact: spider-feedback@bytedance.com

User-Agent string

Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)

Purpose

ByteDance's general-purpose crawler. Originally built for Toutiao's news index and TikTok's content discovery; now also collects training data for ByteDance's Doubao/Lark LLM and other AI products.

Quirks

Aggressive crawl rate: Bytespider has been reported generating millions of requests per day from a small IP cluster (historically 110.249.x.x and 111.225.x.x; these ranges change).

Unreliable robots.txt compliance: sites that wanted to block it have ended up enforcing the block at the server level rather than relying on the robots.txt directive.
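
To see whether the traffic pattern described above shows up on your own site, you can tally Bytespider requests by IP prefix. The following is a minimal Python sketch, not a vendor tool. It assumes an access log in the common "combined" format, where the client IP is the first field and the User-Agent appears somewhere on the line. The function name bytespider_prefixes and the sample log lines (which use documentation-only IP ranges) are made up for illustration.

```python
from collections import Counter

# Hypothetical sample lines in combined log format (documentation IP ranges).
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '192.0.2.11 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET /b HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '203.0.113.5 - - [01/Jan/2024:00:00:04 +0000] "GET /c HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1)"',
]


def bytespider_prefixes(log_lines):
    """Count Bytespider requests per IPv4 prefix (first two octets)."""
    counts = Counter()
    for line in log_lines:
        # Case-insensitive match on the robots.txt token anywhere in the line.
        if "bytespider" not in line.lower():
            continue
        # Assumption: the client IP is the first space-separated field.
        ip = line.split(" ", 1)[0]
        octets = ip.split(".")
        if len(octets) == 4:  # skip IPv6 or malformed addresses
            counts[".".join(octets[:2]) + ".x.x"] += 1
    return counts


if __name__ == "__main__":
    for prefix, hits in bytespider_prefixes(SAMPLE_LOG).most_common():
        print(prefix, hits)
```

A handful of prefixes accounting for most of the hits would match the "small IP cluster" pattern. Because the ranges change, treat the result as a diagnostic rather than as a blocklist.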

How to allow / block

User-agent: Bytespider
Disallow: /
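
Before deploying the rule, you can check that it parses as intended. The sketch below uses Python's standard urllib.robotparser module. The helper name is_allowed and the example.com URL are illustrative assumptions. Note that this only shows what a compliant crawler would do; it says nothing about whether Bytespider actually obeys the rule.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt, agent_token, url):
    """Return True if a compliant crawler using agent_token may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # Pass the robots.txt token, not the full User-Agent header:
    # the parser only looks at the part before the first "/".
    return parser.can_fetch(agent_token, url)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "Bytespider", "https://example.com/any/page"))
    print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/any/page"))
```

Other crawlers are unaffected because the file contains no rule group that applies to them.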

If the Disallow rule is ignored, add a server-level rule:

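# nginx example: place inside a server or location block
# ~* is a case-insensitive regex match on the User-Agent header
if ($http_user_agent ~* "Bytespider|Bytedance") {
    return 403;
}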
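
Where the block has to live in the application rather than in nginx, the same case-insensitive match can be approximated in code. This is a rough Python sketch of a WSGI middleware, assuming a WSGI-based application. The names is_blocked and block_bytespider are hypothetical, and the pattern simply mirrors the nginx rule above.

```python
import re

# Same pattern as the nginx rule, matched case-insensitively.
BLOCKED_UA = re.compile(r"Bytespider|Bytedance", re.IGNORECASE)


def is_blocked(user_agent):
    """Return True if the User-Agent header matches the blocked pattern."""
    return bool(user_agent) and BLOCKED_UA.search(user_agent) is not None


def block_bytespider(app):
    """Wrap a WSGI app so that matching requests receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Like the nginx rule, this relies on the crawler identifying itself honestly, so requests sent with a spoofed User-Agent will pass through.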

Sources