The Government Is No Longer Auditing Frontier AI. It Is Buying It.
The last time the US government tried to evaluate frontier AI models for national security risks, it had the right tool for the job: a dedicated evaluation body within NIST called the US AI Safety Institute, designed to test models before deployment and tell the public what it found.
That institute no longer exists.
In June 2025, the AISI was renamed the Center for AI Standards and Innovation, and its mission was quietly rewritten. The new CAISI would "evaluate and enhance US innovation of these rapidly developing commercial AI systems," in the words of Commerce Secretary Howard Lutnick, a shift formalized in the agency's restructuring. The entity that once existed to scrutinize frontier AI from the outside now sits inside the commercial tent it was meant to hold at arm's length.
That reframing matters more than any single agreement. And it is the context that makes two developments from the past week worth reading together.
The first is a NIST report published May 2, in which CAISI evaluated DeepSeek V4 Pro, the most capable Chinese AI model to date, and found it lags behind US frontier models by roughly eight months. The second is a series of classified AI agreements the Pentagon announced on Friday with OpenAI, Google, Microsoft, Amazon, Nvidia, xAI, and the startup Reflection. Each agreement is worth up to $200 million, a ceiling per contract that reads more like a floor for what federal buyers are prepared to spend on agentic AI systems that can operate in classified environments without constant human direction.
These are separate events. Different agencies, different agreements. But they share a logic. The US government is no longer acting as an external auditor of frontier AI. It is buying the sector and rewriting the terms of access.
The clearest enforcement action in this new market is what happened to Anthropic. The company had a $200 million deal with the Pentagon to handle classified materials. It refused to loosen its self-declared red lines around mass domestic surveillance and fully autonomous weapons. The Defense Department banned Anthropic's products from federal networks, declaring the company a supply chain risk. Anthropic sued and won a temporary injunction. And yet in the same announcement, Emil Michael, the DoD's chief technology officer, called Anthropic's Mythos model, the one the Pentagon has blacklisted, a "separate national security moment," noting its capabilities for finding and patching cyber vulnerabilities. The message to every lab in the room is structural: government access is conditional on compliance, and compliance means letting the customer define the guardrails.
Google's agreement illustrates the terms. Under the deal, Google must "help in adjusting the company's AI safety settings and filters at the government's request." The contract carves out mass surveillance and autonomous weapons without human oversight — on paper. What it does not do is give Google the right to veto lawful government operational decisions. The guardrails exist. The government's override power exists too.
The procurement infrastructure is being built to make this routine. In March 2026, the GSA and NIST signed an MOU embedding CAISI's measurement science inside USAi, the secure federal AI platform that already serves approximately 15 agencies as of early April. Through USAi's Console, agencies can run standardized test suites, compare models side by side, and export evaluation data for audits. CAISI is not just certifying outcomes after the fact. It is embedded in the procurement workflow. Separately, CAISI's AI Agent Standards Initiative, launched in February, is positioning voluntary technical standards as the pathway for industry-led protocols, in effect writing the rulebook for what government-compatible AI looks like.
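What "running standardized test suites" means mechanically is simpler than the procurement language suggests. The sketch below is purely illustrative: USAi's Console interface is not public, and every name in it, the models, the prompts, the functions, is invented for this article. It shows only the generic pattern the MOU describes: one fixed, versioned test suite run identically against multiple models, with results exported in a machine-readable form an auditor can re-check later.

```python
# Illustrative only. This is NOT USAi's actual interface, which is not
# public; model names, test cases, and functions are all hypothetical.
import json
from dataclasses import dataclass, asdict

@dataclass
class EvalResult:
    model: str
    test_id: str
    passed: bool

def query_model(model: str, prompt: str) -> str:
    # Hypothetical stand-in for a provider API call; a real harness would
    # send the prompt to the model's endpoint and return its completion.
    canned = {
        "model-a": {"What is 2 + 2?": "4", "Capital of France?": "Paris"},
        "model-b": {"What is 2 + 2?": "4", "Capital of France?": "Lyon"},
    }
    return canned.get(model, {}).get(prompt, "")

# A fixed, versioned suite: every model sees identical inputs, which is
# what makes side-by-side comparison meaningful.
TEST_SUITE = [
    {"test_id": "arith-001", "prompt": "What is 2 + 2?", "expected": "4"},
    {"test_id": "geo-001", "prompt": "Capital of France?", "expected": "Paris"},
]

def run_suite(models):
    results = []
    for model in models:
        for case in TEST_SUITE:
            answer = query_model(model, case["prompt"])
            results.append(EvalResult(model, case["test_id"],
                                      answer == case["expected"]))
    return results

def export_for_audit(results, path):
    # Durable, machine-readable export so a later audit can re-derive
    # every score from the raw pass/fail records.
    with open(path, "w") as f:
        json.dump([asdict(r) for r in results], f, indent=2)

if __name__ == "__main__":
    results = run_suite(["model-a", "model-b"])
    export_for_audit(results, "eval_audit.json")
    for r in results:
        print(f"{r.model} {r.test_id}: {'pass' if r.passed else 'fail'}")
```

The design point is comparability: because every model answers the same versioned battery, a side-by-side ranking and a later audit are reading the same numbers, which is what lets evaluation live inside a procurement workflow rather than after it.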
What "adjusting safety settings" means in practice varies by provider and is not disclosed. What is disclosed — in the Pentagon's announcement — is the operational purpose: establishing "the United States military as an AI-first fighting force." That is not an evaluation mandate. That is a procurement specification.
The enforcement precedent is the market signal. Labs that comply gain classified network access and a federal procurement relationship worth up to $200 million per agreement, with pipeline signals pointing toward a $32 billion federal AI and cybersecurity contracting base. Labs that refuse, as Anthropic did, get blacklisted, even while their most capable model is being evaluated by the same agency that certified their competitors.
The eight-month China gap CAISI published in its DeepSeek evaluation reads differently in this context. It is not a warning about competitive risk. It is a technical baseline confirming that the US models the government just granted classified access to remain ahead of the field. Whether that confirmation survives contact with models running on classified networks, under agreements that give the government modification rights, is a question nobody outside government will be positioned to answer.