Cloudflare has just expanded what robots.txt can do. With its new Content Signals Policy, publishers can explicitly signal how AI bots should (or shouldn’t) use their content, not just whether they can crawl it.
Cloudflare’s Content Signals Policy accomplishes that by letting site owners express usage preferences via three new signals:
search: permission for content to appear in search indexes and support standard search features
ai-input: whether AI models can treat your content as input for generating responses or summaries
ai-train: whether your content can be used to train or fine-tune AI models
Signals are declared like:
User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /

Cloudflare is also treating these signals as a reservation of rights: in legal terms, a formal expression of how you want your content to be used.
As of now, more than 3.8 million domains using Cloudflare's managed robots.txt service already default to search=yes, ai-train=no.
How It Impacts the Landscape
The Good
a) Stronger Control Over Content Usage
You can now separate your permissions for search visibility from AI usage and training. That helps in reclaiming control over how your content is consumed.
b) Legal / Rights Posture Improved
These signals strengthen your stance in asserting usage rights. Disregarding them may incur legal risk, especially as AI systems mature.
The Bad
c) Unclear Compliance, Especially From Big Players
Not all AI companies have committed to respecting these signals. Google, in particular, faces scrutiny because its crawler currently powers both search indexing and AI Overviews, making separation difficult.
d) Potential Trade-Offs
By disallowing AI input or training, you might be excluded from AI-powered answer platforms or summaries that otherwise drive awareness.
What You Can Do to Stay in Control
Define your default signals
Start with safe defaults: search=yes, ai-train=no; leave ai-input unset until you decide.
Update or implement your robots.txt
Add the Content-Signal directive. If you're on Cloudflare’s managed robots.txt, they’ll inject defaults for you.
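If you want to check what a robots.txt file actually declares, the Content-Signal line can be parsed in a few lines of Python. This is a minimal sketch: the grammar assumed here (comma-separated key=value pairs after the directive name) is inferred from the example earlier in this article, not from an official specification.

```python
# Sketch: extract Content-Signal preferences from a robots.txt body.
# The comma-separated key=value format is an assumption based on
# Cloudflare's published example, not a formal grammar.

def parse_content_signals(robots_txt: str) -> dict:
    """Return {signal: value} from the first Content-Signal line found."""
    for line in robots_txt.splitlines():
        name, _, value = line.partition(":")
        if name.strip().lower() == "content-signal":
            return {
                k.strip(): v.strip()
                for k, _, v in (pair.partition("=") for pair in value.split(","))
                if k.strip()
            }
    return {}

sample = """User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /"""

print(parse_content_signals(sample))  # {'search': 'yes', 'ai-train': 'no'}
```

Pointing a script like this at your own robots.txt (fetched however you prefer) makes it easy to confirm that your deployed signals match your intent.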
Augment with defensive tools
Use WAFs, bot management, rate-limiting, and crawl control to back up your signals if bots ignore them.
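As one such defensive layer, a server-level block can refuse requests from a crawler that ignores your signals. Here is a minimal nginx sketch; "BadBot" is a placeholder user-agent string, and Cloudflare users would typically express the same rule through the WAF or bot management instead:

```nginx
# Sketch: deny a crawler by user-agent (placeholder name, case-insensitive match).
if ($http_user_agent ~* "BadBot") {
    return 403;
}
```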
Track bot behavior & anomalies
Review logs and monitor for “rogue” crawlers. If they violate your signals, you’ll need to block or throttle them.
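A quick way to start that monitoring is to tally hits from known AI crawlers in your access logs. The sketch below assumes the common combined log format (user-agent as the last quoted field); the bot names listed are illustrative examples, so maintain your own list.

```python
# Sketch: count requests from known AI crawlers in combined-format access logs.
# AI_BOTS is an illustrative, incomplete list -- keep your own up to date.
import re
from collections import Counter

AI_BOTS = ["GPTBot", "CCBot", "ClaudeBot", "Google-Extended"]

def tally_ai_bots(log_lines):
    hits = Counter()
    for line in log_lines:
        # The user-agent is the last double-quoted field in the combined format.
        quoted = re.findall(r'"([^"]*)"', line)
        ua = quoted[-1] if quoted else ""
        for bot in AI_BOTS:
            if bot in ua:
                hits[bot] += 1
    return hits

sample_log = [
    '1.2.3.4 - - [01/Oct/2025:12:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '5.6.7.8 - - [01/Oct/2025:12:00:05 +0000] "GET /post HTTP/1.1" 200 1024 "-" "CCBot/2.0"',
]

print(tally_ai_bots(sample_log))
```

A spike from one bot, or traffic from a crawler you've signaled away, is your cue to block or throttle at the edge.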
Revisit & update over time
As AI platforms evolve and possibly adopt respect for these signals, revisit which permissions to grant or deny.
Cloudflare’s update is a clear signal that control has become a core part of optimisation. It’s no longer just about being found; it’s about setting boundaries on how your content is used. Take a moment to refine your robots.txt, outline what’s fair game for AI systems, and ensure your crawl permissions match your visibility goals.
In today’s AI-led landscape, who accesses your content matters just as much as where you rank or get cited.