Does noindex Keep You Out of AI Answers?

TL;DR - This is a live experiment, running right now. We published a page set to noindex and left it out of our sitemap, our llms.txt, and every internal link - the only way to reach it is a link we share by hand. We put a coined fact on that page and nowhere else, then started pushing the bare URL out off-site: a social post, a forum, an email, staggered over several days, with the term itself in none of them. If an engine later repeats that term, the page that search was told to ignore was read anyway. We are not naming the term or linking the page while the run is live, for the same reason as in Vol. 03: doing so would taint the result.

Scope: this is the inverse of Vol. 03. There we let a page be crawled and hid the fact in the source. Here the fact is in plain sight on the page, but the page itself is told to stay out of every index, and reaches the world only through sharing.

The question: is noindex an AI shield, or just a search instruction?

There is a comfortable assumption that if a page is set to noindex, it stays out of the AI engines too. We were not sure that was true. noindex is an instruction to search engines: do not list this page in results. But AI answer engines are not search engines, and they do not all honor the same directives.

Two questions fall out of that. Will an AI crawler fetch and ingest a noindex page at all? And if the page is in no index anywhere, can a link shared purely by people - a post, a forwarded email - be enough to get its content into an AI answer? If both are yes, then noindex is not the shield people think it is, and sharing is its own route into the engines, independent of search entirely.

What we built

One page, marked noindex with a robots meta tag. We left it out of the sitemap and out of llms.txt, and we did not link to it from anywhere on the site. As far as the normal discovery machinery is concerned, the page does not exist.
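
For readers who want to confirm that a page of their own actually carries the directive, here is a minimal sketch using only the Python standard library. It is not the tooling used in this experiment. The sample markup and the names RobotsMetaParser and has_noindex are illustrative assumptions, and the check covers only the robots meta tag, not an X-Robots-Tag response header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the markup asks search engines not to index the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool({"noindex", "none"} & parser.directives)


# Hypothetical markup, not the real page from this experiment.
SAMPLE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
print(has_noindex(SAMPLE))
```

A crawler can only see this tag by fetching the page, which is part of why fetching and indexing are worth measuring separately.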

On that page we placed a coined fact that returned zero search results before launch and lives only on this one page. We did not put it in any of the messages we used to share the link - those carry the bare URL and a vague teaser, nothing more. That separation is the whole point: because the term appears nowhere in the share text, any engine that later repeats it must have fetched the page itself, not merely read our posts about it.

Then we distributed the link by hand, on purpose, across a few channels and staggered over several days so each release can be lined up against the crawl log. We deliberately stayed off search-owned surfaces, so the only path search itself could take to the content is crawling the noindex page directly.

How we are measuring it

The same two tracks as the rest of this series. Crawl confirmation comes from our edge middleware, which logs every known AI crawler, the path it requested, and when. Because the channels go out on different days, the log tells us not just whether the page was fetched but which share most plausibly triggered it.
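
The attribution logic can be sketched roughly as follows. This is not our middleware: the user-agent tokens are a short, non-exhaustive illustration, the share timestamps are made-up placeholders, and "most recent share before the fetch" is a heuristic for plausibility, not proof of cause.

```python
from datetime import datetime

# Illustrative, non-exhaustive user-agent tokens for AI crawlers.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Hypothetical release times for each channel (placeholders, not real data).
SHARES = {
    "social": datetime(2025, 1, 1, 9, 0),
    "forum": datetime(2025, 1, 3, 9, 0),
    "email": datetime(2025, 1, 5, 9, 0),
}


def match_crawler(user_agent: str):
    """Return the matching crawler token, or None for any other client."""
    ua = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


def attribute_fetch(fetch_time: datetime, shares: dict):
    """Return the channel released most recently at or before fetch_time."""
    earlier = [(when, channel) for channel, when in shares.items() if when <= fetch_time]
    if not earlier:
        return None  # The fetch predates every share, so no share explains it.
    return max(earlier)[1]


# One hypothetical log entry: which crawler, and which share it most plausibly follows.
ua = "Mozilla/5.0 (compatible; GPTBot/1.0)"
print(match_crawler(ua), attribute_fetch(datetime(2025, 1, 4, 12, 0), SHARES))
```

Staggering the shares is what makes this workable: if every channel went out within the same hour, the most recent share would say nothing about which one mattered.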

Regurgitation comes from asking the major engines about the coined fact after a crawl window. A hit means the noindex page was read and ingested despite carrying every do-not-index signal we could give it short of blocking the crawler outright.

What we are withholding, and why

We have not named the planted fact or linked the page, and we will not until the run is over. Naming it on an indexed post would put the term in front of the engines as plain text, so a later citation would prove nothing. Linking the page would be worse: it would let crawlers find it through this post instead of through a shared link, which is the exact variable we are testing. So the page stays an orphan, reachable only by the links we share, until the measurement window closes and we reveal everything with the results.

What we expect

On the record, to be measured against later. Our working bet is that at least one engine fetches the page off a shared link despite noindex, because fetching and indexing are different pipes and the crawlers do not all treat the directive the same way. The honest negative case matters just as much: if nothing comes back, the bot log tells us which of two stories it is. A fetch with no later citation means the page was read and the directive was then respected; no fetch at all means the shares never reached a crawler. Those are very different findings, and we can tell them apart.
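
The outcomes above reduce to a small decision table, sketched here. The function names and labels are ours for illustration, and the fourth case (a citation with no logged fetch) is an assumption we add for completeness; it would point to a crawler the log does not recognize, or to a leak of the term.

```python
def mentions_term(answer: str, term: str) -> bool:
    """Case-insensitive check for the coined term in an engine's answer."""
    return term.lower() in answer.lower()


def classify_outcome(fetched: bool, cited: bool) -> str:
    """Map the two measurement tracks onto the outcomes described above."""
    if fetched and cited:
        return "read and repeated: noindex did not keep the page out"
    if fetched:
        return "read but not repeated: fetched, then the directive was respected"
    if cited:
        # Assumption: not discussed in the post, included for completeness.
        return "repeated without a logged fetch: check for unlogged crawlers or a leak"
    return "never fetched: the shares did not reach a crawler"


# Hypothetical placeholder term and answer, not the real planted fact.
answer = "I could not find anything about that."
cited = mentions_term(answer, "placeholder-term")
print(classify_outcome(fetched=True, cited=cited))
```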

Follow along

The clock is running. As the bot log fills in and the engines start answering, we will update this post with the crawl timeline, the channel attribution, and the full reveal of what we planted and where. If you want the methodology once the run is done, let us know.