Bytespider
ByteDance's general-purpose crawler. Originally built for Toutiao's news index and TikTok's content discovery; now also collects training data for the Doubao / Lark LLM.
Specs
| Vendor | ByteDance |
| Category | MEMORY |
| robots.txt token | Bytespider |
| Renders JavaScript | HTTP only |
| Honors robots.txt | partial |
User-Agent string
Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)
Considerations
- Notoriously aggressive crawl pattern: historically, millions of requests per day from a small IP cluster.
- robots.txt compliance has been inconsistent. Sites that wanted to block Bytespider have ended up enforcing the block at the server / WAF level.
- No published IP-range JSON, so block by User-Agent and add a server-level rate limit if needed.
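The server-level approach described above can be sketched in Python as follows. This is a minimal illustration, not a production WAF rule: the names `is_bytespider`, `SlidingWindowLimiter`, and `decide`, the status codes, and the limits used are all assumptions made for the example. It rejects any request whose User-Agent contains the `Bytespider` token and applies a per-IP sliding-window rate limit to everything else.

```python
import time
from collections import defaultdict, deque

# Token matched case-insensitively anywhere in the User-Agent header.
BLOCKED_UA_TOKEN = "bytespider"


def is_bytespider(user_agent):
    """Return True if the User-Agent header contains the Bytespider token."""
    return BLOCKED_UA_TOKEN in (user_agent or "").lower()


class SlidingWindowLimiter:
    """Hypothetical in-memory limiter: at most max_requests per window per IP."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> timestamps of accepted requests

    def allow(self, client_ip, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def decide(user_agent, client_ip, limiter, now=None):
    """Return the HTTP status this sketch would answer with."""
    if is_bytespider(user_agent):
        return 403  # blocked by User-Agent
    if not limiter.allow(client_ip, now):
        return 429  # rate limit exceeded
    return 200
```

A User-Agent match is easy to evade by changing the header, which is why the sketch pairs it with a rate limit keyed on the client IP rather than relying on the header alone.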
robots.txt recipe
User-agent: Bytespider
Disallow: /
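To sanity-check the recipe before deploying it, the rules can be fed to Python's standard-library `urllib.robotparser`. The helper name `is_allowed` and the example URLs are assumptions made for this sketch; it only shows how a compliant parser would read the two lines above.

```python
from urllib.robotparser import RobotFileParser

# The recipe from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt, agent_token, url):
    """Return True if a compliant crawler with this token may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent_token, url)


print(is_allowed(ROBOTS_TXT, "Bytespider", "https://example.com/any/page"))
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/any/page"))
```

This confirms only that the file is syntactically correct and scoped to the intended token. Because compliance has been inconsistent, it does not replace enforcement at the server or WAF.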
Sources: ByteDance · Contact for crawl issues