Safety Was the Product. Now It Is the Obstacle.

Anthropic was founded on a single premise: AI is dangerous enough that the company building it should be the first to say so. Not as marketing. As architecture. The Responsible Scaling Policy, published in 2023 and updated in 2024, contained a commitment that no other frontier lab had made: if a model’s capabilities outpace the company’s ability to control them, stop training. Do not ship the model. Do not find a reading of the policy that permits shipping. Stop.

On February 24, 2026, Anthropic published RSP 3.0. The commitment is gone.

The new policy replaces binding internal thresholds with “public goals that we will openly grade our progress towards.” The company will no longer pause development if safety measures lag behind capabilities. Instead, it will delay development only if leadership believes Anthropic leads the AI race AND sees significant catastrophic risks. Both conditions must be true simultaneously.

If a competitor is ahead, train anyway. If the risk is moderate rather than catastrophic, train anyway. If both conditions are present but the board disagrees about either, train anyway. The tripwire has been replaced with a judgment call, and the judgment call has two escape hatches built into it.
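The structure of the change is easy to see if you write the two policies’ pause logic as code. The sketch below is our paraphrase of the published text, in Python; the function and variable names are ours, not Anthropic’s, and this models the policy language, not any internal process.

# Our paraphrase of each policy's pause logic. Names are ours;
# this models the published text, not internal Anthropic process.

def rsp_2_should_pause(safeguards_sufficient: bool) -> bool:
    # The 2023-2024 RSP: a tripwire. One condition, no discretion.
    # If safeguards lag behind capabilities, training stops.
    return not safeguards_sufficient

def rsp_3_should_pause(believes_it_leads_race: bool,
                       sees_catastrophic_risk: bool) -> bool:
    # RSP 3.0: a judgment call. Both conditions must hold, and both
    # are assessments leadership makes about its own situation.
    return believes_it_leads_race and sees_catastrophic_risk

# The two escape hatches: flip either input to False and the answer
# is "train anyway."
assert rsp_3_should_pause(False, True) is False  # a competitor is ahead
assert rsp_3_should_pause(True, False) is False  # risk judged short of catastrophic

Under the old logic, pausing was the default whenever control lagged. Under the new one, training is the default unless two self-assessments happen to align.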


The justification is familiar

Anthropic’s Chief Science Officer Jared Kaplan explained the change: “We felt that it wouldn’t actually help anyone for us to stop training AI models… We didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments if competitors are blazing ahead.”

This is a coherent position. It is also identical to the position held by every company that has never made a safety commitment in the first place. “We cannot afford to be responsible if our competitors are not” is the standard justification for every race to the bottom in every industry. It is why we have emissions standards, financial regulations, and building codes (or had them, until the current administration began systematically dismantling them). Individual actors in competitive markets do not self-regulate when self-regulation is expensive and defection is free.

Anthropic’s original RSP was notable precisely because it rejected this logic. The 2023 policy said: even if competitors are irresponsible, we will not be. That was the differentiator. That was the argument for why Anthropic deserved trust, talent, and capital that other labs did not. The company built its reputation on the claim that safety was not contingent on what others did.

Now it is.


The Pentagon is in the room

The timing is not subtle. One day before Anthropic published RSP 3.0, Defense Secretary Pete Hegseth met with CEO Dario Amodei and delivered an ultimatum: remove AI safety guardrails or face consequences. The threatened penalties include cancellation of existing contracts, designation as a “supply chain risk” (a label typically reserved for adversarial foreign technology firms), and invocation of the Defense Production Act to compel development of a military-tailored model.

Anthropic maintains two red lines: no mass domestic surveillance of American citizens, and no fully autonomous weapons operating without human oversight. The Pentagon wants “unfettered access” to Claude across military operations. Anthropic says the RSP changes are unrelated to the Pentagon negotiations.

The industry should take Anthropic at its word and evaluate the RSP changes on their merits. But the industry should also note that a company under active government pressure to weaken its safety commitments chose this week to weaken its safety commitments, and that the new policy’s escape clause (“we will only pause if we are ahead”) conveniently eliminates the scenario where pausing would cost market position or government contracts.


Research must not slow down

Here is the part the discourse consistently gets wrong: the problem is not the research. The problem is the deregulation.

AI research should be aggressive. The capabilities emerging from frontier models are transformative for medicine, materials science, climate modeling, infrastructure automation, security analysis, and dozens of fields where the bottleneck has been human cognitive bandwidth. Slowing research to wait for perfect safety guarantees would be irresponsible in a different way; it would delay tools that save lives and solve problems that are already urgent.

The argument for aggressive research is strong. But aggressive research and corporate accountability are not in conflict. They never have been. The pharmaceutical industry conducts aggressive research within FDA oversight. The aviation industry pushes the boundaries of materials science and aerodynamics within FAA certification. The nuclear industry operates reactors that split atoms within NRC regulation. In every case, the research is aggressive precisely because the regulatory framework makes it possible to take risks responsibly. The framework does not slow the work; it makes the work trustworthy.

“We need to move fast” is a reason to fund research. It is not a reason to remove accountability. “Competitors are ahead” is a reason to invest more. It is not a reason to abandon the commitments that distinguished you from those competitors in the first place.


National security is not a blank check

The Pentagon’s framing deserves direct scrutiny. Hegseth characterized Anthropic as a potential national security threat (comparable to Chinese technology firms) while simultaneously positioning Claude as essential to military operations. These two claims are contradictory. A company cannot be both a supply chain risk and a critical supplier. The framing is not an assessment; it is leverage.

Using national security as justification for removing corporate safeguards is not new. Telecommunications companies were pressured to weaken encryption standards. Defense contractors have operated under liability shields that would be unconscionable in civilian markets. The pattern is consistent: invoke national security, bypass the regulatory process, defer accountability indefinitely.

The problem is not that the military wants access to AI capabilities. That is reasonable and, in many applications, beneficial. The problem is that “national security” is being used as a universal solvent for safety commitments that exist for legitimate reasons. The prohibition on fully autonomous weapons is not a guardrail that needs removing for competitive advantage; it marks off a category of system that every serious military ethics framework in the Western world has identified as requiring human judgment. Mass domestic surveillance is not a product feature being held back by overcautious engineers. It is a constitutional concern. Anthropic’s two red lines are not bureaucratic obstacles; they are the floor of responsible development.


The self-regulation experiment has a result

Two weeks ago, we wrote about OpenAI releasing GPT-5.3-Codex without implementing the misalignment safeguards specified in their own safety framework, citing “ambiguous” wording in a document they authored. That was the first test of SB 53’s “publish your own rules, follow them” model of accountability.

Anthropic’s RSP revision is the second test. The company published binding safety commitments. The commitments required pausing development under specific conditions. Rather than risk those conditions being triggered and having to pause, the company revised the commitments so they no longer require pausing. The framework is now nonbinding. The pause conditions now require simultaneous satisfaction of two criteria, either of which can be disputed. The “responsible” in Responsible Scaling Policy has been downgraded from a constraint to an aspiration.

Two tests. Two companies. Two results. In both cases, when the safety framework conflicted with the business objective, the framework was reinterpreted or revised. In neither case was the model delayed, the capability reduced, or the deployment modified. The framework bent. The shipping schedule did not.

This is the empirical result of the self-regulation experiment. The hypothesis was that frontier AI companies, given the opportunity to set and follow their own safety standards, would do so in good faith. The data says otherwise. Not because these companies are malicious; they are not. Because competitive pressure, government pressure, and fiduciary obligations create an incentive structure where “be responsible” loses to “ship the model” every single time.

The correct response is not outrage. It is recognition that the experiment has produced its result. Voluntary commitments are revised when they become inconvenient. That is not a moral failing; it is a structural one. No industry self-regulates effectively when the competitive dynamics reward defection. That is not a claim about AI specifically; it is a claim about markets, and it has been validated in every regulated industry that exists.

Anthropic’s original RSP was a genuine attempt to demonstrate that a frontier AI company could hold itself accountable. The revision demonstrates the limits of that approach. The company that was most committed to self-regulation has concluded that self-regulation is not viable in a competitive market. They are probably right. The conclusion the rest of us should draw is not that safety is dispensable, but that safety enforced by the entity it constrains was never going to hold.

Research must be aggressive. Accountability must be external. These are not competing priorities. They are the same priority, viewed from two sides.