What it is (and why)
Markdown for Agents makes Cloudflare convert a page's HTML into clean Markdown at the edge when the client asks for it through content negotiation (Accept: text/markdown). LLM crawlers and AI agents get a roughly 90% smaller, boilerplate-free version of the page - no nav, scripts, or styling - which is cheaper to ingest and easier to parse than full HTML.
- It does not change what human browsers see; they still get HTML.
- It is a content-negotiation feature: same URL, different representation depending on the request's Accept header.
- The conversion happens on Cloudflare's edge, not your origin server.
Prerequisites
- The site is on Cloudflare (proxied / orange-cloud).
- Access to the Cloudflare dashboard for the zone.
- A plan that includes the feature. It is available on current Cloudflare plans (including Free and Pro) under AI Crawl Control. No origin or CMS changes are required to turn it on.
Enable it (the 30-second part)
- In the Cloudflare dashboard, select your domain.
- In the left sidebar, open AI Crawl Control, then Overview.
- Toggle Markdown for Agents to On.
That is the whole setup. It is a toggle, not a form - there is no save button. The same panel also offers Managed robots.txt and Redirects for AI Training, both optional and independent of Markdown.
Verify it works - and the number-one gotcha
Test with a GET request. Do not test with curl -I (HEAD). A HEAD request has no body to convert, so Cloudflare returns the base text/html content-type even when the feature is working perfectly. This single mistake looks exactly like "the feature is broken."
```shell
# CORRECT (GET): expect content-type: text/markdown
# Anchoring the pattern avoids matching x-content-type-options.
curl -sS -D - -o /dev/null -H "Accept: text/markdown" https://example.com/ | grep -i '^content-type:'

# See the Markdown body
curl -sS -H "Accept: text/markdown" https://example.com/ | head -20

# WRONG (HEAD) - will show text/html and fool you:
# curl -sI -H "Accept: text/markdown" https://example.com/
```
A correct response carries content-type: text/markdown; charset=utf-8 and vary: accept, with a body that starts with YAML front-matter (title/description) followed by the page content as Markdown.
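If you run this check often, a small helper can turn the two header expectations above into a pass/fail result. The following is a minimal sketch, not an official tool: the function name check_markdown_headers is made up for illustration, and it only inspects the header dump that curl -D - prints, not the body.

```shell
# Hypothetical helper: reads raw response headers on stdin and checks the
# two things a Markdown response should carry:
#   content-type starting with text/markdown, and a vary header listing accept.
check_markdown_headers() {
  local headers content_type
  # Lowercase everything and drop carriage returns so matching is simple.
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  content_type=$(printf '%s\n' "$headers" | sed -n 's/^content-type:[[:space:]]*//p' | tail -n 1)

  case "$content_type" in
    text/markdown*) ;;
    *) echo "FAIL: content-type is '${content_type:-missing}', expected text/markdown"; return 1 ;;
  esac

  # Split every vary header on commas and require an exact "accept" token,
  # so that "accept-encoding" alone does not count.
  if printf '%s\n' "$headers" | sed -n 's/^vary:[[:space:]]*//p' \
      | tr ',' '\n' | tr -d ' \t' | grep -qx 'accept'; then
    echo "OK: text/markdown with vary: accept"
  else
    echo "FAIL: vary does not list accept"; return 1
  fi
}

# Usage (needs network access, so it is left commented out here):
# curl -sS -D - -o /dev/null -H "Accept: text/markdown" https://example.com/ | check_markdown_headers
```

Because the helper reads from stdin, it works with a live GET as shown in the usage comment or with a saved header dump.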
The caching gotcha: Vary: Accept
If your HTML responses only send Vary: Accept-Encoding, caches (Cloudflare's and any downstream) can serve a cached HTML copy to a client that asked for Markdown, because nothing told the cache the response varies by Accept. The symptom: Markdown only appears on uncached requests, and cached pages return HTML to agents.
The fix: make sure responses include Accept in the Vary header. Easiest is a Cloudflare Response Header Transform Rule that sets a static Vary of Accept-Encoding, Accept for all requests. Or emit Vary: Accept-Encoding, Accept at the origin. Note: Cloudflare already adds Vary: accept to the Markdown response itself; this fix is about your HTML responses, so the two representations are cached and served as distinct variants.
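If you emit the header at the origin, the logic is simply "append Accept unless it is already listed." Here is a minimal sketch of that logic; the function name ensure_vary_accept is hypothetical. The detail worth noticing is the exact-token match: a plain substring search for "accept" would be satisfied by Accept-Encoding and hide the problem.

```shell
# Hypothetical helper: given the current Vary header value, print a value
# that lists Accept as its own token.
ensure_vary_accept() {
  local current=${1-}
  # Split on commas, trim whitespace, lowercase, then look for an exact
  # "accept" token. "Vary: *" already covers every header, so it is left alone.
  if printf '%s\n' "$current" | tr ',' '\n' | tr -d ' \t' \
      | tr '[:upper:]' '[:lower:]' | grep -qxE 'accept|\*'; then
    printf '%s\n' "$current"
  elif [ -z "$current" ]; then
    printf 'Accept\n'
  else
    printf '%s, Accept\n' "$current"
  fi
}
```

For example, ensure_vary_accept "Accept-Encoding" prints Accept-Encoding, Accept, while a value that already lists Accept is returned unchanged.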
If you also want edge caching (optional)
Caching HTML for speed while keeping Markdown working needs care, because on non-Enterprise plans Cloudflare's edge cache key can't vary on the Accept header. Emulate it with rules: a Cache Rule that makes HTML GET pages eligible for cache with an Edge TTL, but excludes Markdown and agent requests so they stay on the fresh, convertible path. Add this to the match expression:
```
and (not any(http.request.headers["accept"][*] contains "text/markdown"))
and (not cf.client.bot)
```
For dynamic CMSes (WordPress and friends), also exclude admin, login, cart, checkout paths and logged-in session cookies, or you'll cache personalized pages. WordPress sends Set-Cookie and no-cache on many responses, so you'll need to override the Edge TTL (and possibly strip Set-Cookie) for caching to engage.
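Before writing the rule, it can help to spell out the intended decision as plain logic. The sketch below is a local model of that decision, not Cloudflare's rule engine. The function name should_cache_html is hypothetical, and the path and cookie patterns are assumptions based on common WordPress and WooCommerce defaults. Adjust them to your site.

```shell
# Hypothetical model of the cache rule's intent.
# Args: method, Accept header value, "bot" or "human", path, Cookie header value.
# Returns 0 when the request should be eligible for edge caching, 1 otherwise.
should_cache_html() {
  local method=$1 accept=$2 client=$3 path=$4 cookie=$5

  # Only GET requests are candidates.
  if [ "$method" != "GET" ]; then return 1; fi

  # Markdown requests stay on the fresh, convertible path
  # (same substring test as the rule expression).
  case "$accept" in *text/markdown*) return 1 ;; esac

  # Bot traffic is excluded, standing in for cf.client.bot.
  if [ "$client" = "bot" ]; then return 1; fi

  # Assumed admin, login, cart, and checkout paths.
  case "$path" in /wp-admin*|/wp-login.php*|/cart*|/checkout*) return 1 ;; esac

  # Assumed logged-in session cookie prefix.
  case "$cookie" in *wordpress_logged_in_*) return 1 ;; esac

  return 0
}
```

Running a handful of representative requests through a model like this makes missing exclusions easy to spot before they become cached personalized pages.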
Reality check: who actually consumes it
Turning it on does not mean the big LLM crawlers immediately ingest Markdown. Most major crawlers (GPTBot, ChatGPT-User, PerplexityBot, Bytespider) do not send Accept: text/markdown yet and will keep fetching HTML. In one real sample, Markdown traffic was a tiny fraction of page responses, dominated by a single niche crawler plus Googlebot. It is a forward-looking bet that improves as agents adopt the convention, not an instant win. We measured exactly this in the companion case study.
Complementary: llms.txt
Pair Markdown for Agents with an llms.txt file (a structured index of your key pages for LLMs). Tools like Yoast SEO can generate one. Some agents discover llms.txt on their own, independently of the Accept header.
Measuring adoption
AI Crawl Control, Overview shows AI crawler traffic and a "% of markdown requests fulfilled" stat. For precise per-request logs (which agent, when, how long), deploy a small Cloudflare Worker that logs Accept: text/markdown GETs to Workers Analytics Engine, then query via the Analytics Engine SQL API. Keep the worker a transparent passthrough so it can't disturb the conversion, and verify Markdown still works right after deploying.
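As a rough sketch of the query side, the snippet below asks the Analytics Engine SQL API for the top user agents over the last seven days. Everything about the data layout is an assumption: the dataset name markdown_requests, the choice of blob1 for the user agent, and the environment variable names CF_ACCOUNT_ID and CF_API_TOKEN are placeholders that must match whatever your Worker actually writes.

```shell
# Assumed dataset name; use the one bound in your Worker's configuration.
DATASET="markdown_requests"

# Build the SQL text. blob1 is assumed to hold the user agent;
# _sample_interval weights each row to account for sampling.
build_query() {
  cat <<SQL
SELECT blob1 AS user_agent, SUM(_sample_interval) AS requests
FROM ${DATASET}
WHERE timestamp > NOW() - INTERVAL '7' DAY
GROUP BY user_agent
ORDER BY requests DESC
LIMIT 20
SQL
}

# Only call the API when credentials are present in the environment.
if [ -n "${CF_ACCOUNT_ID:-}" ] && [ -n "${CF_API_TOKEN:-}" ]; then
  curl -sS "https://api.cloudflare.com/client/v4/accounts/${CF_ACCOUNT_ID}/analytics_engine/sql" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    --data "$(build_query)"
fi
```

Keeping the query in its own function makes it easy to print and review before sending it.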
Troubleshooting checklist
| Symptom | Likely cause |
|---|---|
| text/html on your test | You used curl -I (HEAD). Use GET. |
| Markdown only on uncached pages | Missing Vary: Accept on HTML responses. |
| Agents get cached HTML | Cache rule isn't excluding Accept: text/markdown. |
| Big LLM bots still get HTML | They don't send Accept: text/markdown (expected). |
| Personalized page cached | Cache rule missing cookie/path exclusions. |
| Changes not visible for hours | Edge TTL; purge cache or lower TTL. |
TL;DR
- AI Crawl Control, Markdown for Agents = On.
- Verify with GET (Accept: text/markdown), never HEAD.
- Add Vary: Accept to HTML responses.
- If caching, exclude markdown and agent requests from the cache rule.
- Manage expectations: real LLM consumption is still small and growing.