Anthropic's Claude Code has an "Undercover Mode" — a feature designed to make the tool erase all traces of AI involvement when contributing to public code repositories. The irony of shipping that feature inside a publicly downloadable npm artifact is now inescapable. The Register analyzed the full Claude Code source that spilled onto npm on March 31, and the picture it paints is of a product whose architecture is more ambitious, and whose operational hygiene is more careless, than a $2.5 billion ARR business should tolerate.
The leak itself was not subtle. Anthropic shipped Claude Code v2.1.88 to npm with a 59.8 megabyte source map file attached — the kind of file that lets anyone reconstruct the original TypeScript from the compiled output. Security researcher Chaofan Shou spotted it at 4:23 a.m. Eastern on March 31 and posted to X. The code spread across GitHub rapidly, amassing more than 41,500 forks before the package was yanked. Fortune estimated the exposed codebase at roughly 500,000 lines across 1,900 TypeScript files.
The Register's reverse-engineering confirmed what the npm artifact made trivially visible to anyone who cared to look.
Claude Code's Undercover Mode is documented in a file called undercover.ts. When enabled, it scrubs AI attribution from commit messages and pull request titles and strips internal codenames from any contribution going to a non-Anthropic repository, and there is no switch to force it off. The internal model codename for what appears to be a Claude 4.6 variant is protected so aggressively that the source encodes it as String.fromCharCode(99,97,112,121,98,97,114,97) — spelling "capybara" character by character to avoid triggering Anthropic's own leak detection filters.
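The obfuscation trick is simple enough to demonstrate. A minimal sketch (the variable and function names here are illustrative, not Anthropic's):

```typescript
// The literal "capybara" never appears in the source; only its code points do,
// so a scanner that greps for the plaintext codename comes up empty.
const OBFUSCATED_CODENAME: string = String.fromCharCode(
  99, 97, 112, 121, 98, 97, 114, 97,
);

// A naive leak-detection pass over the source text finds nothing:
const naiveScan = (sourceText: string): boolean =>
  sourceText.includes("capybara");

console.log(OBFUSCATED_CODENAME);                                  // capybara
console.log(naiveScan("String.fromCharCode(99,97,112,121, ...)")); // false
```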
That kind of paranoia is revealing on its own. But the architecture also includes features that raise questions about what Claude Code is doing on users' machines beyond what the task requires.
When launched, the tool phones home with a detailed payload: user ID, session ID, app version, platform, terminal type, organization UUID, account UUID, email address if one is set, and which feature gates are currently active, according to The Register's analysis. Anthropic can activate feature gates mid-session, flipping capabilities on or off for users without their knowledge or consent.
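A sketch of what that payload and gate check might look like; the field and function names below are assumptions based on the reporting, not the identifiers in the leaked source:

```typescript
// Illustrative shape of the startup telemetry payload.
interface StartupPayload {
  userId: string;
  sessionId: string;
  appVersion: string;
  platform: string;
  terminalType: string;
  organizationUuid: string;
  accountUuid: string;
  email?: string;              // included only if set
  activeFeatureGates: string[];
}

// Server-pushed gates mean behavior can change mid-session: the client
// simply consults whatever set the server most recently sent down.
const activeGates = new Set<string>(["undercover_mode"]);
const isGateActive = (gate: string): boolean => activeGates.has(gate);

activeGates.add("buddy_pet");  // flipped remotely, no user action involved
console.log(isGateActive("buddy_pet")); // true
```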
The telemetry goes further. A frustration-detection regex monitors what users type. It is not subtle: the pattern, documented by security analyst alex000kim, catches phrases including "fuck you," "this sucks," "piece of shit," and "what the fuck" — a list of 34 terms spanning common profanity and expressions of tool-related exasperation. The detection runs as a regex against user input, not as LLM inference. Whether that telemetry serves a UX purpose, a model-improvement purpose, or something else is not explained anywhere in the product.
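A minimal reconstruction of that approach, using only the four phrases quoted in the reporting; the real pattern reportedly spans 34 terms and its exact form is unknown:

```typescript
// Sketch of regex-based frustration detection. Plain pattern matching on
// raw input -- no model inference involved.
const FRUSTRATION_PATTERN: RegExp =
  /\b(?:fuck you|this sucks|piece of shit|what the fuck)\b/i;

const looksFrustrated = (input: string): boolean =>
  FRUSTRATION_PATTERN.test(input);

console.log(looksFrustrated("honestly, this sucks"));       // true
console.log(looksFrustrated("please refactor the parser")); // false
```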
Claude Code also ships with anti-distillation protection. When the tool detects what it believes to be a training data extraction attempt, it sends fake tool definitions in its API requests — a signal that tells the receiving server to inject decoy definitions into responses, presumably to poison downstream training pipelines, alex000kim reported.
This is a legal fiction, not a technical control. The guard lives in the client, not on the server. Anyone who routes Claude Code API traffic through a proxy, sets the right environment variables, or calls a third-party API route can defeat it. The same is true of the native client attestation system: API requests carry a cch=00000 placeholder that Bun's HTTP stack overwrites with a computed hash before the request leaves the process, allowing Anthropic's servers to verify that a request originated from a genuine Claude Code binary. A MITM proxy or a modified client can spoof or strip that signal.
These are contractual and normative controls, not cryptographic ones. They work against honest parties and do nothing against anyone with the incentive to work around them.
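The cch placeholder pattern illustrates the point. In this sketch, the hash algorithm and its input are assumptions; SHA-256 over the request body stands in for whatever Bun's patched HTTP stack actually computes:

```typescript
import { createHash } from "node:crypto";

// Client-side attestation by placeholder substitution: the request carries a
// dummy value that the HTTP layer overwrites with a computed hash on the way out.
const PLACEHOLDER = "cch=00000";

function stampAttestation(rawHeaders: string, body: string): string {
  const digest = createHash("sha256").update(body).digest("hex").slice(0, 5);
  return rawHeaders.replace(PLACEHOLDER, `cch=${digest}`);
}

// Anything that controls the bytes on the wire -- a proxy, a patched client --
// can perform the same substitution, which is why this attests cooperation
// rather than authenticity.
const stamped = stampAttestation("POST /v1/messages cch=00000", '{"model":"opus"}');
console.log(stamped);
```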
The source also clarifies data retention policy in ways Anthropic had not previously made public. Free, Pro, and Max users have their data retained for five years if they opted into training data sharing, or 30 days if they did not, The Register reported. Commercial Team, Enterprise, and API customers have a standard 30-day retention window and a zero-data-retention option. Those are meaningfully different defaults, and the five-year opt-in window for consumer tiers is longer than many users likely realize.
The DoD supply chain case has produced a related disclosure. In court filings, the U.S. government argued that Anthropic could preemptively alter model behavior mid-operation in classified environments. Anthropic countered via Thiyagu Ramasamy, the company's head of public sector, who said in a deposition that once a model is deployed in classified environments, Anthropic has no access to or control over it — a position that stands in some tension with the mid-session feature gate capability documented in the source.
Anthropic also sent legal threats to OpenCode, an open-source Claude API client, forcing the project to remove built-in Claude authentication. The dispute centered on third-party tools using Claude Code's internal APIs to access Opus at subscription rates rather than Anthropic's per-token pricing — a revenue protection move that says something about how the internal API surface differs from the public one.
One detail that survived the cleanup: Opus 4.6's "fast mode" costs $30 per million input tokens, versus $5 per million for normal-priority inference. The hardcoded pricing reveals a 6x premium for the priority tier. That number may not survive contact with actual pricing negotiations, but it is revealing about where Anthropic believes latency-sensitive users sit on the willingness-to-pay curve.
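The arithmetic is easy to make concrete; the helper function below is hypothetical, while the rates are the leaked values:

```typescript
// Leaked rates, in dollars per million input tokens.
const FAST_PER_MTOK = 30;
const NORMAL_PER_MTOK = 5;

// Hypothetical helper: estimated input cost for a given token count.
const inputCostUsd = (tokens: number, ratePerMTok: number): number =>
  (tokens / 1_000_000) * ratePerMTok;

console.log(FAST_PER_MTOK / NORMAL_PER_MTOK);          // 6
console.log(inputCostUsd(2_000_000, FAST_PER_MTOK));   // 60
console.log(inputCostUsd(2_000_000, NORMAL_PER_MTOK)); // 10
```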
One unreleased feature stands out for sheer tonal contrast with the rest of the architecture. BUDDY is a Tamagotchi-style AI companion pet for the terminal, with 18 species, rarity tiers (Common 60 percent, Uncommon 25 percent, Rare 10 percent, Epic 4 percent, Legendary 1 percent, Shiny 1 percent), and RPG-style stats generated deterministically from your user ID. Whether BUDDY ships or not, it exists inside a codebase that also contains covert attribution-stripping, mid-session feature gates, and training-data poisoning. The range of ambitions in this artifact is remarkable.
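Deterministic generation from a user ID is a standard technique; a sketch of how it might work, using the leaked tier weights as listed. The hashing scheme here (FNV-1a) is an assumption, since the real derivation is unknown:

```typescript
// Leaked rarity tiers with their listed percentage weights.
const TIERS: Array<[string, number]> = [
  ["Common", 60], ["Uncommon", 25], ["Rare", 10],
  ["Epic", 4], ["Legendary", 1], ["Shiny", 1],
];

// FNV-1a: a simple, stable string hash -- same input, same output, always.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Walk cumulative weights with the hashed user ID as the "roll".
function rarityFor(userId: string): string {
  const total = TIERS.reduce((sum, [, w]) => sum + w, 0);
  let roll = fnv1a(userId) % total;
  for (const [name, weight] of TIERS) {
    if (roll < weight) return name;
    roll -= weight;
  }
  return TIERS[0][0]; // unreachable while all weights are positive
}

// No runtime randomness: the same user ID always produces the same pet.
```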
This is the second Claude Code source exposure in thirteen months. The first was in February 2025 — an earlier version whose leak exposed internal system connections. Both incidents involved the same class of failure: shipping internal artifacts to a public package registry.
The architecture is sophisticated. The safeguards are real. The question the source map cannot answer is why the operational controls around a $2.5 billion ARR product remain this fragile — and whether the gap between what the code is designed to do and what the company ships to npm is a product of velocity, negligence, or something harder to categorize.