MEMORY · ByteDance

Bytespider

ByteDance's general-purpose crawler. Originally built for Toutiao's news index and TikTok's content discovery; now also collects training data for the Doubao / Lark LLM.

Specs

Vendor: ByteDance
Category: MEMORY
robots.txt token: Bytespider
Renders JavaScript: HTTP only
Honors robots.txt: partial

User-Agent string

Mozilla/5.0 (Linux; Android 5.0) AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36 (compatible; Bytespider; spider-feedback@bytedance.com)

Considerations

  • Notoriously aggressive crawl pattern: historically, millions of requests per day from a small IP cluster.
  • robots.txt compliance has been inconsistent. Sites that wanted to block Bytespider have ended up enforcing at the server / WAF level.
  • No published IP-range JSON, so block by User-Agent and add a server-level rate limit if needed.
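
Because robots.txt alone may not be enough, the server-level approach described above can be sketched in Python. This is a minimal illustration, not a drop-in WAF rule: the names is_bytespider, SlidingWindowLimiter and decide are hypothetical, and the choice of status codes (403 for the crawler, 429 when over the limit) and of limits is an assumption to adapt to your own stack.

```python
import re
import time
from collections import defaultdict, deque

# Case-insensitive match on the robots.txt token listed under Specs.
BYTESPIDER_RE = re.compile(r"bytespider", re.IGNORECASE)


def is_bytespider(user_agent):
    """Return True if the User-Agent header names Bytespider."""
    return bool(user_agent) and BYTESPIDER_RE.search(user_agent) is not None


class SlidingWindowLimiter:
    """Allow at most max_requests per window_seconds for each key (e.g. client IP)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def decide(user_agent, client_ip, limiter, now=None):
    """Return an HTTP status: 403 for Bytespider, 429 when over the limit, else 200."""
    if is_bytespider(user_agent):
        return 403
    if not limiter.allow(client_ip, now):
        return 429
    return 200
```

The User-Agent check catches requests that identify themselves honestly; the per-IP limiter is the fallback for traffic that does not. In production the same two rules are usually expressed in the web server or WAF configuration rather than in application code.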

robots.txt recipe

User-agent: Bytespider
Disallow: /
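
Before deploying, the recipe can be sanity-checked with Python's standard urllib.robotparser. The sketch below only confirms how a compliant parser reads the rule; it says nothing about whether the crawler actually obeys it. The helper name is_allowed and the example.com URL are placeholders chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# The recipe from this page, verbatim.
RECIPE = """\
User-agent: Bytespider
Disallow: /
"""


def is_allowed(robots_txt, user_agent_token, url):
    """Return True if robots_txt permits user_agent_token to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent_token, url)


if __name__ == "__main__":
    url = "https://example.com/page"  # placeholder URL
    print("Bytespider allowed:", is_allowed(RECIPE, "Bytespider", url))
    print("Googlebot allowed:", is_allowed(RECIPE, "Googlebot", url))
```

Pass the robots.txt token (Bytespider), not the full User-Agent header: the standard-library parser matches on the product token before the first slash, so the complete Mozilla/5.0 string would not match the rule.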

Sources: ByteDance · Contact for crawl issues
