62 of 100 — what we learned auditing the world's largest stores

For most of May we ran three increasingly careful audits of the homepages of the world's 100 largest e-commerce sites. Each site was hit with up to ten different User-Agent headers — real Chrome desktop and mobile, Googlebot, Bingbot, GPTBot, OAI-SearchBot, ChatGPT-User, ClaudeBot, PerplexityBot, Applebot — first via direct HTTP, then for fifteen high-signal cases via a real headless Chromium browser with screenshots saved. The full data is public. This post is the headline.

The picture is bleak in a way that is also somehow comforting: the failure mode is concentrated, mechanical, and reversible with one configuration change.

62 / 100 block at least one AI crawler outright. 14 are fully AI-ready. 13 are inconclusive from our datacenter origin.

The 62-site cluster

Sixty-two of the one hundred sites returned exactly the same fingerprint to exactly the same four user-agents: GPTBot, ChatGPT-User, ClaudeBot, and PerplexityBot all got an HTTP 403 with the standard Cloudflare challenge body (the response carried the cf-mitigated header, the 150 KB challenge HTML, and a redirect to a JavaScript-challenge page). Sixty-two independent retailers did not arrive at the same policy through parallel evolution. They turned on the same one-click rule.

The most likely source is Cloudflare's "Block AI Bots / Scrapers" managed ruleset. AWS WAF and Akamai Bot Manager ship similar presets. The shared fingerprint, the shared body size, the shared header, and the cross-CDN consistency all point at managed rules that bundle the four bots together.
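To make the clustering argument concrete, here is a minimal sketch of how probe results can be grouped by block fingerprint. The site names and status codes are made-up illustrations, not audit data, and `block_fingerprint` and `cluster_by_fingerprint` are hypothetical helper names rather than the audit's actual tooling.

```python
from collections import Counter

def block_fingerprint(statuses: dict[str, int]) -> frozenset[str]:
    """Return the set of user-agent tokens that received an HTTP 403."""
    return frozenset(ua for ua, status in statuses.items() if status == 403)

def cluster_by_fingerprint(results: dict[str, dict[str, int]]) -> Counter:
    """Count how many sites share each non-empty block fingerprint."""
    counts = Counter(block_fingerprint(s) for s in results.values())
    counts.pop(frozenset(), None)  # sites that blocked nothing are not a cluster
    return counts

# Illustrative input only: site -> {user-agent token -> HTTP status}.
blocked = {"Chrome": 200, "GPTBot": 403, "ChatGPT-User": 403,
           "ClaudeBot": 403, "PerplexityBot": 403}
open_site = {ua: 200 for ua in blocked}
results = {
    "shop-a.example": dict(blocked),
    "shop-b.example": dict(blocked),
    "shop-c.example": open_site,
}

clusters = cluster_by_fingerprint(results)
for fingerprint, site_count in clusters.items():
    print(sorted(fingerprint), site_count)
```

A large count on a single fingerprint is the signal described above: many unrelated sites blocking an identical set of user-agents suggests a shared preset rather than independent decisions.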

The cost of that toggle is precise:

ChatGPT-User is the bot that fetches a page right now because a user is asking ChatGPT a question. It is OpenAI's live-retrieval bot, not the training bot. Blocking it removes the retailer from ChatGPT shopping answers.
Claude-User does the same for Claude. Note that v2 tested ClaudeBot — the training bot — but the WAF preset typically blocks both.
Perplexity-User does the same for Perplexity.

These bots have user-driven traffic profiles: every request maps to a customer expressing intent. Blocking them is not "stopping AI training"; it is "becoming invisible at the moment a customer is shopping".

The "ClaudeBot-only" cluster — an HTTP artifact, not a vendetta

In the HTTP-only probe, six sites looked like they blocked exactly one AI bot — ClaudeBot — while letting the rest through: ebay.com, ebay.de and ebay.co.uk (eBay-branded), kleinanzeigen.de (formerly "eBay Kleinanzeigen" but divested to Adevinta in 2021 and now privately held by a Permira/Blackstone consortium, no longer eBay-controlled), nike.com, and canadiantire.ca. But the signal is far weaker than the headline. For the eBay properties and kleinanzeigen.de the apparent ClaudeBot "block" in our v2 pass was a stalled/empty response (a request timeout, status 0) rather than a robots.txt deny or a clean HTTP 403, while the other AI user-agents happened to return 200 on the same pass. nike.com returned a genuine 403 to the ClaudeBot UA, and canadiantire.ca's edge returns 403 to datacenter fetches regardless of user-agent — both WAF-layer behaviours, not Anthropic-specific rules.

The robots.txt files confirm there is no anti-Anthropic singling-out. eBay places ClaudeBot in a single shared block alongside GPTBot, PerplexityBot, anthropic-ai, Bytespider, CCBot, AmazonBot, meta-externalagent and Applebot-Extended, all under one Disallow: / — OpenAI's and Perplexity's training crawlers get the identical ban. kleinanzeigen.de's robots.txt names no Claude or anthropic rule at all; it restricts only GPTBot, ChatGPT-User, OAI-SearchBot and PerplexityBot. nike.com's robots.txt has no Claude rule whatsoever. Whatever the v2 HTTP probe surfaced, it was not a deliberate, Anthropic-only policy — and, as the v3 results below show, it dissolved entirely once we rendered with a real browser.

Then we ran v3. Real headless Chromium. Full JavaScript execution. Same UA matrix.

At every one of the five sites from this cluster that we re-tested, zero of six bot UAs successfully rendered any content. The 200 OK that v2 saw was a fast, near-empty response: eBay's challenge layer responds with a 200 and a tiny shell to JS-incapable bots, but the meaningful gating happens deeper in the stack. When a real browser arrives with any bot UA (Googlebot, Bingbot, Applebot, ChatGPT-User, Claude-User, Perplexity-User), the page fails to materialize. The "ClaudeBot-only" pattern was the visible spike of a much wider anti-bot posture, one that catches search crawlers as well as AI bots.

This is the most important methodological finding of the audit: an HTTP-only probe sees the WAF response; a JS-rendering probe sees what the application would actually serve. The two layers can flatly disagree, and headlines drawn from the first layer can mislead.
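One way to keep the HTTP-layer signals from being over-read is to label each probe outcome before drawing conclusions. The sketch below separates the cases this audit ran into: a stalled request recorded as status 0, a WAF 403, and a 200 that carries only a near-empty shell. The `ProbeResult` type and the 2 KB `THIN_BODY_BYTES` cutoff are assumptions for illustration, not values from the audit.

```python
from dataclasses import dataclass

# Assumed cutoff for a "near-empty" shell; tune it per site.
THIN_BODY_BYTES = 2048

@dataclass
class ProbeResult:
    status: int      # 0 means the request timed out or returned nothing
    body_bytes: int  # size of the response body in bytes

def classify(result: ProbeResult) -> str:
    """Label an HTTP-only probe outcome without over-interpreting it."""
    if result.status == 0:
        return "timeout"      # stalled or empty response: not evidence of a block
    if result.status == 403:
        return "waf_block"    # explicit deny at the edge
    if result.status == 200 and result.body_bytes < THIN_BODY_BYTES:
        return "thin_200"     # success code, but likely a shell: re-test in a browser
    if result.status == 200:
        return "ok"
    return "other"

print(classify(ProbeResult(status=200, body_bytes=350)))
```

Anything labeled `timeout` or `thin_200` is a candidate for a follow-up pass with a real browser rather than a finding in its own right.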

Dynamic rendering is alive — and at major brands

The deprecated technique Google asked us to stop using in 2024 is alive and well in 2026 at five sites in our sample:

The dynamic-rendering cohort. Amazon UK is the headline: 30× more pre-rendered bytes to declared search bots than to a real user.

The most interesting nuance from v3: Amazon UK's discrimination isn't humans-vs-bots, it's trusted-bots-vs-AI-bots. Googlebot, Bingbot, and Applebot all receive the fully pre-rendered 28 KB page. ChatGPT-User, Claude-User, and Perplexity-User receive a 200–400-byte stub. That split implies an internal allowlist of "crawlers that get the indexed version": the older search engines are on it, the newer AI crawlers are not. Coupang and shopping.yahoo.co.jp run the same UA-only allowlist pattern at the front edge: real Chrome from our datacenter gets 403, while a declared Googlebot UA from the same IP gets 200. That's the unsafe pattern, because the check trusts a header anyone can set: a scraper that sends the right User-Agent gets through; a legitimate user on a VPN does not.
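A safer alternative to a UA-only allowlist is to confirm the crawler's identity from its IP address. The sketch below follows the reverse-then-forward DNS check that Google documents for Googlebot: resolve the IP to a hostname, check the domain, then resolve the hostname back and confirm it maps to the same IP. The function name and the injectable `reverse` and `forward` parameters are illustrative assumptions, and the suffix list is deliberately minimal.

```python
import socket

# Minimal suffix list for illustration; check the crawler operator's
# documentation for the complete, current set of domains.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_verified_googlebot(ip, reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex):
    """Return True only if reverse and forward DNS both agree on the IP.

    `reverse` and `forward` are injectable so the check can be exercised
    without network access. The default forward lookup is IPv4-only.
    """
    try:
        host = reverse(ip)[0].lower().rstrip(".")
        if not host.endswith(GOOGLE_SUFFIXES):
            return False              # claims to be Googlebot, wrong domain
        return ip in forward(host)[2]  # hostname must map back to the same IP
    except OSError:                    # covers socket.herror and socket.gaierror
        return False

# Offline demonstration with stub resolvers and a documentation-range IP.
stub_reverse = lambda ip: ("crawl-1.googlebot.com", [], [ip])
stub_forward = lambda host: (host, [], ["192.0.2.10"])
print(is_verified_googlebot("192.0.2.10", stub_reverse, stub_forward))
```

With a check like this at the edge, the User-Agent header becomes a claim to verify rather than a credential, so a spoofed header alone no longer gets a scraper through.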

The well-architected reference cases

Fourteen sites are fully AI-ready by our scoring: walmart.com, rakuten.co.jp, target.com, trendyol.com, craigslist.org, alibaba.com, samsung.com, shein.com, apple.com, nordstrom.com, ulta.com, newegg.com, otto.de, decathlon.com.

What they have in common is more interesting than what differs. Every one of them serves the same SSR-generated HTML to every user-agent — same SHA-256, same word distribution, same structured data, same canonical. There is no special pleading for any crawler. They are the implementation of Google's 2024 recommendation, and they are also (not coincidentally) the sites whose products will keep appearing in ChatGPT and Perplexity shopping answers through 2026 and 2027.
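The "same SHA-256" property is straightforward to check once the response bodies are in hand. This is a minimal sketch, assuming the bodies have already been fetched once per user-agent; `body_digests` and `serves_identical_html` are hypothetical names, and the sample HTML is illustrative.

```python
import hashlib

def body_digests(bodies: dict[str, bytes]) -> dict[str, str]:
    """Map each user-agent token to the SHA-256 hex digest of its response body."""
    return {ua: hashlib.sha256(body).hexdigest() for ua, body in bodies.items()}

def serves_identical_html(bodies: dict[str, bytes]) -> bool:
    """True when every user-agent received byte-identical HTML."""
    return len(set(body_digests(bodies).values())) == 1

# Illustrative bodies only, not audit data.
page = b"<html><body><h1>Product</h1></body></html>"
same = {"Chrome": page, "Googlebot": page, "ChatGPT-User": page}
mixed = {"Chrome": page, "Googlebot": page, "ChatGPT-User": b"<html></html>"}

print(serves_identical_html(same), serves_identical_html(mixed))
```

Strict byte equality is a demanding test: pages that embed per-request values such as nonces or timestamps may need those stripped before hashing, which this sketch does not attempt.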

What to do with all this

The actionable list for the other eighty-six sites isn't long.

If you're in the 62-site WAF cluster. Read the rule you turned on. Cloudflare's "Block AI Bots" bundles training and live-retrieval bots together. They are not the same thing. Block GPTBot, ClaudeBot, PerplexityBot, CCBot, Bytespider, Meta-ExternalAgent if your policy is "no training data". Allow ChatGPT-User, Claude-User, Perplexity-User, OAI-SearchBot, Claude-SearchBot — those are the bots that bring you customers. The bot catalog documents every one.

If you're still dynamic-rendering. Look at amazon.co.uk's pattern and audit it against Google's 2024 guidance honestly. If the bot version is materially different from the user version, you are taking a risk Google will eventually catch. If the bot version is the SSR-rendered output of the same components a real user would hydrate, you are inside the guidance — even if Google would prefer you migrate to native SSR. We agree with Google. We also operate the bridge.

If you're starting fresh. SSR from day one. Next.js, Nuxt, Angular Universal, SvelteKit. Use the one closest to your existing stack. There is no third path that survives an SEO audit in 2026.
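For the WAF-cluster advice above, the training versus live-retrieval split can be sketched as a small policy function. The bot names are the ones listed in this post; the `decide` function, its return labels, and the simplified User-Agent strings in the example are assumptions for illustration, not the syntax of any CDN's rule engine.

```python
# Bot tokens as named in this post. Operators rename and add crawlers,
# so treat both sets as a starting point to review, not a fixed list.
TRAINING_BOTS = {"GPTBot", "ClaudeBot", "PerplexityBot",
                 "CCBot", "Bytespider", "Meta-ExternalAgent"}
RETRIEVAL_BOTS = {"ChatGPT-User", "Claude-User", "Perplexity-User",
                  "OAI-SearchBot", "Claude-SearchBot"}

def decide(user_agent: str) -> str:
    """Return 'allow', 'block', or 'default' for a User-Agent string."""
    ua = user_agent.lower()
    # Check retrieval bots first so a live-retrieval request is never
    # swept up by a broader training-bot match.
    if any(token.lower() in ua for token in RETRIEVAL_BOTS):
        return "allow"
    if any(token.lower() in ua for token in TRAINING_BOTS):
        return "block"
    return "default"  # ordinary browsers and unlisted bots: normal handling

# Simplified, illustrative User-Agent strings.
print(decide("Mozilla/5.0 (compatible; ChatGPT-User/1.0)"))
print(decide("Mozilla/5.0 (compatible; GPTBot/1.0)"))
```

Because this matches on the User-Agent header alone, it expresses the policy but does not authenticate the caller; pairing it with an IP-based identity check avoids the spoofing problem described earlier in the post.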

For everything else there is PrerenderProxy. Get in touch and we will benchmark your site against the 100-site cohort.

Full methodology, raw HTML, screenshots, and per-site cards: audit/2026-05-ecommerce-100. Comments welcome — we re-run this audit in November.