Bytespider
| Vendor | ByteDance |
| Type | AI training crawler (also feeds TikTok / Toutiao / Doubao search) |
| robots.txt token | Bytespider |
| JavaScript rendering | No |
| Honors robots.txt | Partial: historically aggressive, with some compliance added in 2024 |
| Vendor contact | spider-feedback@bytedance.com |
User-Agent string
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)
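A minimal sketch of recognising this string in application code, assuming a case-insensitive substring check on the `Bytespider` token is enough for your purposes. The function name `is_bytespider` is hypothetical. Any client can send this header, so a match identifies a claimed crawler, not a verified one.

```python
def is_bytespider(user_agent: str) -> bool:
    """Return True if the User-Agent header claims to be Bytespider."""
    # Case-insensitive substring match on the robots.txt token.
    return "bytespider" in (user_agent or "").lower()


# The User-Agent string documented above.
ua = (
    "Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)"
)
print(is_bytespider(ua))
```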
Purpose
ByteDance's general-purpose crawler. Originally built for Toutiao's news index and TikTok's content discovery; now also collects training data for ByteDance's Doubao/Lark LLM and other AI products.
Quirks
- Aggressive crawl patterns. Multiple webmasters report Bytespider generating millions of requests per day from a small IP cluster (historically 110.249.x.x and 111.225.x.x; these ranges change).
- Robots.txt compliance has historically been inconsistent. Many sites that wanted to block it have ended up enforcing the block at the server level rather than relying on the robots.txt directive.
- Does not render JavaScript.
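To gauge how heavily Bytespider hits your own site, one option is to count its requests per client IP in the access log. The sketch below assumes the common "combined" log format (client IP first, User-Agent as the last quoted field). The helper name `count_bytespider_hits` and the sample lines, which use documentation-only IP addresses, are made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" access log format, with the client IP as the first
# field and the User-Agent as the final double-quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<ua>[^"]*)"\s*$')


def count_bytespider_hits(lines):
    """Count requests per client IP whose User-Agent mentions Bytespider."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "bytespider" in match.group("ua").lower():
            hits[match.group("ip")] += 1
    return hits


# Illustrative log lines only; the IPs are reserved documentation addresses.
sample = [
    '192.0.2.10 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '192.0.2.10 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Bytespider; spider-feedback@bytedance.com)"',
    '203.0.113.7 - - [01/Jan/2024:00:00:02 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

for ip, count in count_bytespider_hits(sample).most_common():
    print(ip, count)
```

In practice you would pass an open log file instead of `sample`; the function only needs an iterable of lines.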
How to allow / block
User-agent: Bytespider
Disallow: /
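Before deploying, the rule above can be sanity-checked with Python's standard `urllib.robotparser`. This is a sketch that only confirms the directive parses as intended; it says nothing about whether the crawler actually obeys it. `example.com` is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Bytespider
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() takes the crawler token (not the full User-Agent string) and a URL.
blocked = not parser.can_fetch("Bytespider", "https://example.com/any/page")
others_allowed = parser.can_fetch("Googlebot", "https://example.com/any/page")
print(blocked, others_allowed)
```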
If Disallow: is ignored, add a server-level rule:
# nginx example (place inside a server or location block)
if ($http_user_agent ~* "Bytespider|Bytedance") {
    return 403;
}
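If the site is served by a Python application without nginx in front, the same check could be applied in the application itself. The following is a rough sketch as WSGI middleware, not a vendor-recommended method. The names `block_bytespider` and `BLOCKED_UA` are hypothetical, and the pattern simply mirrors the nginx rule above.

```python
import re

# Same alternation as the nginx rule, matched case-insensitively like `~*`.
BLOCKED_UA = re.compile(r"Bytespider|Bytedance", re.IGNORECASE)


def block_bytespider(app):
    """Wrap a WSGI app so matching User-Agents receive 403 Forbidden."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_UA.search(user_agent):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing app is then a one-liner, for example `application = block_bytespider(application)`. Blocking at the web server or CDN is usually cheaper, since rejected requests never reach the application.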