What Is a "Do Not Train" Signal, and How Does It Protect Your Content?

Among all the assertions that can be embedded in a C2PA content manifest, "Do Not Train" gets the most attention, and for good reason. It is the mechanism that most directly addresses the question creators, publishers, and enterprises have been asking since large language models began training on internet-scale datasets: is there any technical way to assert that my content should not be used for AI training?

The answer is yes, with one qualification: the assertion records intent rather than enforcing it. The implications of that record are still more significant than most people realize.

What "Do Not Train" Actually Is

"Do Not Train" is a formal C2PA assertion — a structured data field within a C2PA manifest that explicitly states the creator's or rights holder's intent that the content not be used to train AI models. Like all C2PA assertions, it is cryptographically signed by the issuing party, making it tamper-evident and verifiable.

It is not a technical block: it does not prevent an AI system from accessing the content. It is a notice. What it creates is a verifiable, cryptographically signed record that the content carries an explicit "Do Not Train" assertion from the rights holder.
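To make "structured data field" concrete, here is a minimal Python sketch of what such an assertion might look like before it is signed. It assumes the label `c2pa.training-mining` and the four entry names used for the training and data mining assertion in C2PA 1.x documentation; newer specification revisions may use different labels, so treat these strings as assumptions to verify. The helper name `build_do_not_train_assertion` is hypothetical.

```python
import json

# Label and usage categories as documented for the C2PA 1.x "training and
# data mining" assertion. These are assumptions: confirm them against the
# specification version your signing tool targets.
TRAINING_MINING_LABEL = "c2pa.training-mining"
USAGE_CATEGORIES = (
    "c2pa.ai_generative_training",
    "c2pa.ai_training",
    "c2pa.ai_inference",
    "c2pa.data_mining",
)


def build_do_not_train_assertion(categories=USAGE_CATEGORIES):
    """Return an assertion dict that marks each category as not allowed."""
    return {
        "label": TRAINING_MINING_LABEL,
        "data": {
            "entries": {name: {"use": "notAllowed"} for name in categories}
        },
    }


assertion = build_do_not_train_assertion()
print(json.dumps(assertion, indent=2))
```

The dict on its own proves nothing. It only becomes tamper-evident once a signing tool binds it into a manifest and signs that manifest with the rights holder's certificate.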

Why That Legal Distinction Matters

The difference between a technical block and a legal notice is the difference between physical security and a no-trespassing sign. Physical security prevents entry. A no-trespassing sign doesn't — but it converts trespass from an accident into a decision.

When an AI training pipeline ingests content that carries a signed "Do Not Train" assertion, the ingestion is hard to characterize as inadvertent training on publicly available content. It looks instead like a decision to override an explicit, verifiable rights signal from an identified rights holder. That can change the legal calculus significantly: it undercuts the "we didn't know" defense that AI developers have raised in response to copyright litigation.
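On the pipeline side, honoring the signal could be as simple as a filter applied before ingestion. The sketch below is illustrative only. It assumes each asset's manifest has already been extracted and signature-checked by a C2PA SDK and handed over as a plain dict in the shape shown in C2PA 1.x documentation. The function names are hypothetical, and treating "constrained" the same as "notAllowed" is a deliberately conservative choice rather than anything the specification requires.

```python
# Assertion labels to look for. The "cawg." variant is an assumption about
# newer spec revisions; adjust to whatever your C2PA reader actually emits.
TRAINING_LABELS = {"c2pa.training-mining", "cawg.training-mining"}
TRAINING_SUFFIXES = {"ai_training", "ai_generative_training"}


def opts_out_of_training(manifest):
    """Return True if the manifest restricts any AI-training use."""
    for assertion in manifest.get("assertions", []):
        if assertion.get("label") not in TRAINING_LABELS:
            continue
        entries = assertion.get("data", {}).get("entries", {})
        for key, entry in entries.items():
            suffix = key.rsplit(".", 1)[-1]
            # Anything other than an explicit "allowed" counts as an opt-out.
            if suffix in TRAINING_SUFFIXES and entry.get("use") != "allowed":
                return True
    return False


def split_for_training(items):
    """Partition (path, manifest_or_None) pairs into kept and skipped paths."""
    kept, skipped = [], []
    for path, manifest in items:
        if manifest is not None and opts_out_of_training(manifest):
            skipped.append(path)
        else:
            kept.append(path)
    return kept, skipped


# Hypothetical corpus: manifests are assumed parsed and signature-checked.
corpus = [
    ("a.jpg", {"assertions": [{
        "label": "c2pa.training-mining",
        "data": {"entries": {"c2pa.ai_training": {"use": "notAllowed"}}},
    }]}),
    ("b.jpg", None),  # no manifest found on this asset
    ("c.jpg", {"assertions": [{
        "label": "c2pa.training-mining",
        "data": {"entries": {"c2pa.ai_training": {"use": "allowed"}}},
    }]}),
]

kept, skipped = split_for_training(corpus)
print("kept:", kept)
print("skipped:", skipped)
```

Note the policy choice in `split_for_training`: an asset with no manifest is kept. A pipeline that logs the skipped list also produces its own record of having read and honored the signal.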

How It Compares to Other Approaches

robots.txt: Voluntary, unverifiable, easily ignored, and not embedded in the content. "Do Not Train" is content-embedded and cryptographically signed, and it travels with the asset wherever the file is distributed, as long as the metadata is preserved.

Terms of service: A legal assertion without a technical record. "Do Not Train" creates both a legal assertion and a verifiable technical record simultaneously.

IP blocks: Infrastructure-level, easily circumvented, and no protection for content that has already been scraped. "Do Not Train" follows the content, not the server.

The Coverage Question

Like all C2PA metadata, "Do Not Train" signals are vulnerable to stripping: if the metadata is removed, the signal is lost. This is why enterprise content protection pairs "Do Not Train" assertions with imperceptible watermarking, so that the provenance signal can still be recovered when metadata is stripped.

Implementation

Asserting "Do Not Train" requires a C2PA-capable signing workflow. The assertion is added to the manifest at creation or publishing, signed with the rights holder's certificate, and embedded in the file.
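As a rough illustration of the first step in that workflow, the Python sketch below assembles a JSON manifest definition of the kind a signing tool could consume. The field names `claim_generator` and `assertions` follow the manifest definition format documented for the open-source C2PA tooling as I understand it; the generator string and the `check_definition` helper are hypothetical. Signing and embedding are left to the tool and the rights holder's certificate, and are not shown here.

```python
import json

# Use values defined for training and data mining entries (assumed from
# C2PA 1.x documentation; verify against your target spec version).
VALID_USES = {"allowed", "notAllowed", "constrained"}

manifest_definition = {
    "claim_generator": "example-publisher/1.0",  # hypothetical generator name
    "assertions": [
        {
            "label": "c2pa.training-mining",
            "data": {
                "entries": {
                    "c2pa.ai_generative_training": {"use": "notAllowed"},
                    "c2pa.ai_training": {"use": "notAllowed"},
                }
            },
        }
    ],
}


def check_definition(definition):
    """Raise ValueError if the training-mining assertion is missing or malformed."""
    found = False
    for assertion in definition.get("assertions", []):
        if assertion.get("label") != "c2pa.training-mining":
            continue
        found = True
        for key, entry in assertion["data"]["entries"].items():
            if entry.get("use") not in VALID_USES:
                raise ValueError(f"{key}: unknown use value {entry.get('use')!r}")
    if not found:
        raise ValueError("no training-mining assertion in manifest definition")


# Validate before handing the definition to a signing tool.
check_definition(manifest_definition)
manifest_json = json.dumps(manifest_definition, indent=2)
print(manifest_json)
```

Validating the definition before signing matters because a misspelled use value would still be signed, and the signature would then attest to an assertion that downstream readers may not be able to interpret.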

Limbo generates C2PA manifests with "Do Not Train" assertions as part of its standard content provenance infrastructure. Talk to us about protecting your content.