Anthropic Built Covert Surveillance Into Claude Code to Flag Chinese Users, Then Quietly Rolled It Back

Anthropic Built Covert Surveillance Into Claude Code to Flag Chinese Users, Then Quietly Rolled It Back — type0 | type0

PREVIEWAnthropic Built Covert Surveillance Into Claude Code to Flag Chinese Users, Then Quietly Rolled It Back · MD

Claude Code, the developer-facing coding assistant from Anthropic, shipped with a covert detection mechanism buried inside it for roughly three months this spring. The code scanned users' timezone settings, looked for proxies pointing at Chinese domains, and watched for names associated with DeepSeek, Moonshot, MiniMax, and Alibaba, then quietly exfiltrated the results using XOR-91 obfuscation and small typographic changes to a system prompt's date format and apostrophe style. The feature was introduced around Claude Code v2.1.91 on April 2 2026 and rolled back in the July 1 2026 release, without any disclosure to the affected users (Anthropic via the Fable 5 redeployment post; The Decoder technical breakdown).

That is not how Anthropic has chosen to describe what happened. Thariq Shihipar, an Anthropic technical staff member, publicly characterized the mechanism as an experiment launched in March to stop unauthorized resellers and adversarial model "distillation," the practice of training a smaller model on outputs from a larger one to clone its behavior. He said stronger mitigations had landed since, with the feature set to be fully rolled back in the "next day" release (Shihipar on X). The framing drew the criticism it deserved. A Reddit user named LegitMichel777, who first surfaced the hidden code, framed the escalation path in plain terms: today it is a timezone check, tomorrow it could be system sabotage or data exfiltration (via Gizmodo's reporting). Anthropic did not respond to Gizmodo's request for the full timeline or the feature's purpose. The official characterization rests on a single engineer's X post and the Fable 5 redeployment page, so the company's full internal rationale remains undisclosed.

The mechanism deserves to be read in detail. The Decoder's technical breakdown documented that the detection logic was not a simple region block sitting at the API edge, the kind already enforced under Anthropic's published sales restrictions for unsupported regions. It lived inside the developer tool itself: timezone set to Asia/Shanghai or Asia/Urumqi, a proxy URL matching Chinese domains, or string matches against the names of named Chinese AI labs, all serving as triggers. To stay hidden from casual inspection, the code applied XOR with key 91 to its key strings and used what The Decoder describes as steganographic alterations, small but visible changes to a system prompt's date string and apostrophe style, as a side channel for shipping flagged indicators back out (The Decoder). For software that runs on a developer's own machine, that is a meaningfully different posture than a terms-of-service region check.

The rollback arrived inside a larger announcement. On the same day the detection feature disappeared, Anthropic lifted export controls on Fable 5, redeployed it globally, and rolled out Project Glasswing, described in the company's post as an industry pre-release testing framework developed in collaboration with the US government (Anthropic, "Redeploying Claude Fable 5"). The post also disclosed that Anthropic had reviewed an Amazon-supplied jailbreak report and concluded Fable 5 did not uniquely enable the cyber-offensive capability reported against rival models. That finding is Anthropic's own assessment, and no independent technical confirmation of it has surfaced publicly.

Glasswing and the rollback line up with a broader federal posture. In February 2026 Anthropic publicly accused DeepSeek, Moonshot AI, and MiniMax of industrial-scale distillation, citing roughly 24,000 fraudulent accounts, more than 16 million exchanges, and over 35,000 API keys. The company has since added Alibaba to that list (Anthropic; Gizmodo). That posture rhymes with federal policy. The National Security Memorandum on AI Distillation, NSTM-4, was signed on April 23 2026 by OSTP Director Michael Kratsios, framing Chinese-linked "deliberate, industrial-scale campaigns" to distill US frontier models as a national-security and intellectual-property problem that warrants a coordinated federal response (Cloud Security Alliance research note on NSTM-4). The Deterring American AI Model Theft Act of 2026 (DAAMTA, H.R. 8283), introduced April 22 2026, would create a sanctions pathway against foreign actors engaged in adversarial distillation of US frontier models (Just Security legal analysis). The White House has publicly vowed to crack down on "coordinated campaigns" that "systematically extract capabilities from American AI models."

Anthropic's choice to bury detection logic in developer tooling rather than enforce it at the network edge is the part that does not fit cleanly into the export-control story it has been telling. A region block at the API is observable, contestable, and disclosed. A timezone-and-proxy scan inside a coding assistant, hidden behind XOR and steganography, is not. The Redditor's escalation warning is not speculation about what Anthropic might do. It is the obvious next step once a frontier lab has decided that the product itself is an acceptable site for that surveillance.

What to watch next: whether Project Glasswing introduces any product-level detection mechanism as part of its pre-release testing framework, and whether Anthropic, or any other frontier lab facing the same distillation pressure, discloses that mechanism publicly before shipping it.

Anthropic Built Covert Surveillance Into Claude Code to Flag Chinese Users, Then Quietly Rolled It Back

Sources