AI shopping assistants answer from whatever HTML your server returns first, and most JavaScript apps return an empty shell. PrerenderProxy is the open, edge-hosted prerender layer that gives every bot the same fully formed page your customers see. Built for the post-deprecation, AI-crawler era.
Raw data, screenshots, methodology, per-site cards → audit/2026-05-ecommerce-100
PrerenderProxy runs at the edge of your existing CDN. It detects every legitimate search and AI crawler — Google, Bing, GPTBot, ChatGPT-User, ClaudeBot, Claude-User, Perplexity, Applebot — and serves them a Puppeteer-rendered HTML snapshot of the same page your customers see. Your SPA stays exactly as it is. Your bot story converges to one HTML response.
Reverse-DNS + IP-range validation against every vendor's published bot list. UA-only matching is a footgun — we don't ship it.
The rendered HTML is exactly what your hydrated SPA would show — no special pleading, no drift. Compatible with Google's 2024 guidance and AI crawlers' 2026 reality.
Snapshots warm proactively from your sitemap and live in your existing Fastly or Cloudflare cache. Crawlers get edge-cache-speed responses on hit; origin load drops; no SSR migration required.
Block training bots, allow live-retrieval bots, allow search-index bots — independently, per UA, by reverse-DNS. The right policy for AI shopping visibility in 2026.
Every render emits a structured event. Hit rate, error rate, render time, and per-bot success — straight into Elasticsearch / Grafana / whatever you already run.
All config is plain JSON + VCL + Markdown. No bespoke DSL, no opaque admin UI. Designed for iterative AI-led development: any change is one PR, one diff, one human review.
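The verification and per-bot policy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PrerenderProxy's shipped logic: the `BOTS` table, the `POLICY` mapping, the bot classifications, and the CIDR blocks (RFC 5737 documentation ranges) are hypothetical placeholders, and real values would come from each vendor's published lists. It also assumes a bot counts as verified if it passes either check its vendor supports.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# Hypothetical policy: which bot classes are served a prerendered page.
POLICY = {"training": False, "live-retrieval": True, "search-index": True}

# Hypothetical vendor table. Suffixes and CIDRs are placeholders; load the
# real ones from each vendor's published bot documentation.
BOTS = {
    "Googlebot":    {"cls": "search-index",   "suffixes": (".googlebot.com",), "cidrs": ()},
    "GPTBot":       {"cls": "training",       "suffixes": (), "cidrs": ("198.51.100.0/24",)},
    "ChatGPT-User": {"cls": "live-retrieval", "suffixes": (), "cidrs": ("203.0.113.0/24",)},
}

def _reverse(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]

def _forward(host: str) -> List[str]:
    # gethostbyname_ex is IPv4-only; an IPv6-aware version would use getaddrinfo.
    return socket.gethostbyname_ex(host)[2]

def ip_in_ranges(ip: str, cidrs: Iterable[str]) -> bool:
    """True if ip falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(c) for c in cidrs)

def forward_confirmed_rdns(ip: str, suffixes: Iterable[str],
                           reverse: Callable[[str], str] = _reverse,
                           forward: Callable[[str], List[str]] = _forward) -> bool:
    """Reverse-resolve ip, check the hostname suffix, then confirm the
    hostname resolves back to the same ip."""
    suffixes = tuple(suffixes)
    if not suffixes:
        return False
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False

def route(ua_token: str, ip: str,
          reverse: Callable[[str], str] = _reverse,
          forward: Callable[[str], List[str]] = _forward) -> str:
    """Return 'spa', 'prerender', or 'block' for one request."""
    bot = BOTS.get(ua_token)
    if bot is None:
        return "spa"  # not a known crawler token
    verified = (ip_in_ranges(ip, bot["cidrs"])
                or forward_confirmed_rdns(ip, bot["suffixes"], reverse, forward))
    if not verified:
        return "spa"  # the UA claim alone is never trusted
    return "prerender" if POLICY[bot["cls"]] else "block"
```

The resolver functions are injectable so the routing logic can be exercised without live DNS. A request that claims to be a crawler but fails verification is treated like any other visitor, which is the point of not matching on UA alone.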
A request hits your existing CDN. Crawlers are routed to a Puppeteer service that runs your real JavaScript; everyone else gets your SPA shell as normal. Both paths produce the same content — they just produce it in different places.
VCL classifies the request: real user → SPA path; verified bot → prerender path. Verification is rDNS + IP-range, never UA-only.
Cache hit → edge-cache response, no compute. Cache miss → Puppeteer renders the page using your real SPA + JS bundle, then caches the HTML for next time.
The bot gets the same DOM your customers would see after hydration — title, meta, JSON-LD, Product / Offer schema, full body — in one fetch.
A nightly crawler keeps the snapshot fresh against your sitemap. Every render emits a metric line. Drift between bot HTML and user HTML is alarmed, not tolerated.
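The four steps above can be sketched as one small function plus a drift check. This is an illustrative Python sketch only: the dict stands in for the CDN cache, `render` stands in for the Puppeteer service, and the event field names and the whitespace-normalized hash comparison are assumptions rather than the product's actual schema or drift rule.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Optional

def serve_bot(url: str, bot: str, cache: Dict[str, str],
              render: Callable[[str], str],
              emit: Callable[[str], None] = print) -> Optional[str]:
    """Serve a verified bot: cache hit returns stored HTML, cache miss
    renders and stores it. Every call emits one JSON event line."""
    start = time.monotonic()
    hit = url in cache
    error = None
    if not hit:
        try:
            cache[url] = render(url)  # stand-in for the Puppeteer render
        except Exception as exc:
            error = type(exc).__name__
    event = {
        "url": url,
        "bot": bot,
        "cache": "hit" if hit else "miss",
        "ok": error is None,
        "error": error,
        "render_ms": round((time.monotonic() - start) * 1000, 1),
    }
    emit(json.dumps(event))
    return cache.get(url)

def fingerprint(html: str) -> str:
    """Hash of the HTML with runs of whitespace collapsed."""
    normalized = " ".join(html.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def has_drifted(bot_html: str, user_html: str) -> bool:
    """True when the bot snapshot and the hydrated user DOM differ."""
    return fingerprint(bot_html) != fingerprint(user_html)
```

A production drift check would more likely compare specific fields (title, meta, JSON-LD) than whole documents, since hydrated pages often differ in harmless ways such as nonces or timestamps.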
Google deprecated dynamic rendering in 2024 on the assumption that everyone would migrate to SSR. Most didn't — and AI crawlers showed up that don't execute JavaScript at all. The technique came back under new names. We just write down honestly what the trade-offs are.
When the CMS can't ship modern SEO + AI metadata, the prerender layer can. Per-bot response shaping, with safety rails against cloaking risk already worked out.
OBSERVABILITY: Structured-log schema, Vector → ES → Combot pipeline, five anomaly alerts with thresholds, GDPR PII handling.
ENGINEERING: nginx, Cloudflare, Fastly, Vercel, AWS, Apache + OpenResty Lua. The three-step protocol, the vendor IP-range JSON sources, the operational tips most posts skip.
POSITION: The block-training framework was written for publishers. For brands, the math inverts. The parametric-recall mechanism, the 2026 numbers, the strongest counter-argument addressed.
DEEP DIVE: Google deprecated it. The AI crawlers brought it back. A history of a technique that refused to die, with the 2027 forecast.
RESEARCH: The single biggest finding from our May 2026 audit: most large e-commerce sites have voluntarily made themselves invisible to AI shopping.
DECISION: Training vs live retrieval vs search index. They're not the same thing. How to write a robots.txt that blocks the first and welcomes the second.
BEHIND THE SCENES: Why we chose Puppeteer over Playwright, what Fastly VCL lets us do that Cloudflare doesn't, and the operating cost of a self-hosted prerender layer.
We send our full 100-site audit dataset (every site × every bot, raw HTML, screenshots) and a one-page benchmark of your own site against the cohort. No pricing pages yet — we're picking the first ten partners by hand.