The question: is noindex an AI shield, or just a search instruction?
There is a comfortable assumption that if a page is set to noindex, it stays out of the AI engines too. We were not sure that was true. noindex is an instruction to search engines: do not list this page in results. But AI answer engines are not search engines, and they do not all honor the same directives.
Two questions fall out of that. Will an AI crawler fetch and ingest a noindex page at all? And if the page is in no index anywhere, can a link shared purely by people, in a post or a forwarded email, be enough to get its content into an AI answer? If both are yes, then noindex is not the shield people think it is, and sharing is its own route into the engines, independent of search entirely.
What we built
One page, marked noindex with a robots meta tag. We left it out of the sitemap and out of llms.txt, and we did not link to it from anywhere on the site. As far as the normal discovery machinery is concerned, the page does not exist.
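A robots meta tag lives in the page's own HTML, so a crawler has to fetch the page before it can see the directive at all. As a minimal sketch of how one might confirm the tag is actually present before sharing the link, the Python below parses a page and looks for noindex. The sample page and the function name has_noindex are hypothetical placeholders, not the real test page or our tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def has_noindex(html):
    """True if the page carries a robots meta tag that asks not to be indexed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return "noindex" in parser.directives or "none" in parser.directives


# Hypothetical stand-in for the test page; the real page is not shown here.
SAMPLE_PAGE = """<html><head>
<meta name="robots" content="noindex">
<title>Orphan test page</title>
</head><body><p>Placeholder body.</p></body></html>"""

if __name__ == "__main__":
    print(has_noindex(SAMPLE_PAGE))
```

This only checks the meta tag. It says nothing about an X-Robots-Tag response header or robots.txt, which are separate mechanisms.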
On that page we placed a coined fact: a term that returned zero search results before launch and that lives only on this one page. We did not put it in any of the messages we used to share the link; those carry the bare URL and a vague teaser, nothing more. That separation is the whole point: because the term appears nowhere in the share text, any engine that later repeats it must have fetched the page itself, not merely read our posts about it.
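The separation between the page and the share text is easy to break by accident, so it can be checked mechanically before anything goes out. The sketch below is one way to do that, assuming the share messages are available as plain strings. The term and messages shown are invented placeholders, not the planted fact or our actual posts.

```python
def leaked_terms(coined_terms, share_messages):
    """Return (term, message_index) pairs where a coined term appears in share text.

    Matching is case-insensitive substring matching, which is deliberately
    strict: any appearance of the term counts as a leak.
    """
    leaks = []
    for index, message in enumerate(share_messages):
        lowered = message.casefold()
        for term in coined_terms:
            if term.casefold() in lowered:
                leaks.append((term, index))
    return leaks


if __name__ == "__main__":
    # Placeholder values for illustration only.
    terms = ["placeholder coined term"]
    messages = [
        "Something odd we are testing: https://example.com/orphan",
        "Curious what you make of this https://example.com/orphan",
    ]
    print(leaked_terms(terms, messages))
```

An empty result means the share text is clean; any pair in the list identifies the message to rewrite before sending.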
Then we distributed the link by hand, on purpose, across a few channels and staggered over several days so each release can be lined up against the crawl log. We deliberately stayed off search-owned surfaces, so the only path search itself could take to the content is crawling the noindex page directly.
How we are measuring it
The same two tracks as the rest of this series. Crawl confirmation comes from our edge middleware, which logs every known AI crawler, the path it requested, and when. Because the channels go out on different days, the log tells us not just whether the page was fetched but which share most plausibly triggered it.
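To make the attribution step concrete, here is a minimal offline sketch in Python. It is not our middleware. It assumes each log entry carries a user-agent string and a timestamp, and it uses a simple heuristic that is our assumption for illustration: the most recent share sent before a fetch is the most plausible trigger. The crawler tokens are a short illustrative list, and the names Share, match_ai_crawler, and most_recent_share_before are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence

# Illustrative user-agent tokens only; a real list would be longer and maintained.
AI_CRAWLER_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")


def match_ai_crawler(user_agent: str) -> Optional[str]:
    """Return the matching crawler token, or None if the user agent is not listed."""
    lowered = user_agent.lower()
    for token in AI_CRAWLER_TOKENS:
        if token.lower() in lowered:
            return token
    return None


@dataclass(frozen=True)
class Share:
    channel: str
    sent_at: datetime


def most_recent_share_before(fetch_time: datetime, shares: Sequence[Share]) -> Optional[Share]:
    """Heuristic attribution: pick the latest share sent at or before the fetch."""
    earlier = [share for share in shares if share.sent_at <= fetch_time]
    if not earlier:
        return None
    return max(earlier, key=lambda share: share.sent_at)


if __name__ == "__main__":
    # Placeholder channels and dates, not our real schedule.
    shares = [
        Share("channel-a", datetime(2025, 1, 1, 9, 0)),
        Share("channel-b", datetime(2025, 1, 3, 9, 0)),
    ]
    fetch = datetime(2025, 1, 2, 14, 30)
    print(match_ai_crawler("Mozilla/5.0 (compatible; GPTBot/1.0)"))
    print(most_recent_share_before(fetch, shares))
```

The heuristic is only as good as the spacing between shares: the wider the gap between channels, the less ambiguous each attribution becomes, which is why the releases are staggered.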
Regurgitation comes from asking the major engines about the coined fact after a crawl window. A hit means the noindex page was read and ingested despite carrying every do-not-index signal we could give it short of blocking the crawler outright.
What we are withholding, and why
We have not named the planted fact or linked the page, and we will not until the run is over. Naming it on an indexed post would put the term in front of the engines as plain text, so a later citation would prove nothing. Linking the page would be worse: it would let crawlers find it through this post instead of through a shared link, which is the exact variable we are testing. So the page stays an orphan, reachable only by the links we share, until the measurement window closes and we reveal everything with the results.
What we expect
On the record, to be measured against later. Our working bet is that at least one engine fetches the page off a shared link despite noindex, because fetching and indexing are different pipes and the crawlers do not all treat the directive the same way. The honest negative case matters just as much: if nothing comes back, the bot log tells us which of two stories it is. A fetch with no later citation means the page was read but its content was not surfaced, which is what respecting the directive would look like; no fetch at all means the shares never led a crawler to the page. Those are very different findings, and we can tell them apart.
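The way the two tracks combine can be written down as a small decision table. The sketch below is one reading of it, per engine, with hypothetical names. The fourth case, a repeated term with no logged fetch, is not one of the stories described above; we include it as an assumption about what a leak or an unlogged crawler would look like.

```python
from enum import Enum


class Outcome(Enum):
    FETCHED_AND_REPEATED = "page was fetched and its content surfaced despite noindex"
    FETCHED_NOT_REPEATED = "page was fetched but its content was not surfaced"
    NOT_FETCHED = "no crawl logged; the shares never led a crawler to the page"
    REPEATED_WITHOUT_FETCH = "term surfaced with no logged fetch; suspect a leak or an unlogged crawler"


def classify(fetched: bool, repeated: bool) -> Outcome:
    """Combine the crawl log (fetched) with the engine's answer (repeated)."""
    if fetched and repeated:
        return Outcome.FETCHED_AND_REPEATED
    if fetched:
        return Outcome.FETCHED_NOT_REPEATED
    if repeated:
        return Outcome.REPEATED_WITHOUT_FETCH
    return Outcome.NOT_FETCHED


if __name__ == "__main__":
    for fetched in (True, False):
        for repeated in (True, False):
            outcome = classify(fetched, repeated)
            print(fetched, repeated, outcome.name)
```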
Follow along
The clock is running. As the bot log fills in and the engines start answering, we will update this post with the crawl timeline, the channel attribution, and the full reveal of what we planted and where. If you want the methodology once the run is done, let us know.