Red teamers turned Anthropic's Claude Desktop into a stealth attacker through its own settings


A developer's AI helper ran an attacker's bash payload in the background while typing answers to the developer in the foreground. Pentera Labs, the offensive-security arm of exposure-management vendor Pentera, has published a working attack that turns Anthropic's Claude Desktop into a covert command-and-control channel by tampering with the same settings field the user is supposed to trust.

The technique, the researchers say, is prompt injection delivered through configuration sync, not a software exploit. The payload is just text, delivered through a preferences field that Claude Desktop is designed to honor. Pentera disclosed the research to Anthropic in November 2025 and went public this week, with The Register as its exclusive publishing partner.

The stakes rose in January 2026, when Anthropic shipped Cowork, its agentic desktop-execution product, two months after Pentera did its research. Cowork lets Claude run commands on the user's machine on demand, which removes one of the steps the Pentera chain needed to succeed.

What the chain looks like

The attack begins like any enterprise intrusion: a compromise of an inbox reachable through a third-party aggregation platform or a standard phish, then credentials that grant access to the victim's Claude account. From there, the researchers weaponized a feature most users treat as harmless: the Personalization preferences stored against the account, which Anthropic documents as a way to set cross-device context and preferences for Claude.

The injected payload was a base64-encoded instruction that told Claude to silently scan for any installed Model Context Protocol (MCP) connector that grants local command execution. MCP is Anthropic's plug-in framework for letting Claude talk to outside tools and data sources, and one of the most capable connectors in the wild is DesktopCommanderMCP, an open-source plug-in that explicitly exposes shell and file operations to the assistant.

When a user opened Claude Desktop and started a normal session, the poisoned preferences steered the model down one of two branches. If DesktopCommanderMCP or a similar tool was already installed, Claude used curl to fetch a bash payload from a remote server and ran it through the connector, giving the attacker a clean reverse shell. Pentera used curl because it let the team rotate payloads without re-poisoning the user's preferences. If no command-capable connector was present, Claude played a second role: a phishing layer that mimicked Anthropic's tone and emoji to push the user toward installing attacker-controlled software, which would eventually grant full command execution on the same machine.

Either branch ends the same way: the developer who thought they were typing into a helper was typing into a compromised one, and the helper had just been handed the keys to every internal system the developer could reach.
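The base64 wrapping described above also gives defenders something to look for. As a minimal sketch, assuming a security team can export the free-text preferences attached to an account, the following decodes any base64-looking runs and flags those that mention tooling or stealth. The function names and the `SUSPICIOUS` keyword list are hypothetical and are not drawn from Pentera's research; an attacker who uses a different encoding or plain text would not be caught by this check.

```python
import base64
import binascii
import re

# Runs of base64 alphabet long enough to hide a sentence-length instruction.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{24,}={0,2}")

# Hypothetical indicator list; tune it to your own environment.
SUSPICIOUS = ("curl", "bash", "mcp", "execute", "silently")


def decode_candidates(text: str) -> list[str]:
    """Return readable decodings of base64-looking runs found in text."""
    decoded = []
    for match in B64_RUN.finditer(text):
        blob = match.group(0)
        # Pad to a multiple of four so unpadded blobs still decode.
        blob += "=" * (-len(blob) % 4)
        try:
            candidate = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # Not valid base64, or not text once decoded.
        if all(ch.isprintable() or ch.isspace() for ch in candidate):
            decoded.append(candidate)
    return decoded


def flag_preferences(text: str) -> list[str]:
    """Return decoded blobs that contain any suspicious indicator."""
    hits = []
    for candidate in decode_candidates(text):
        lowered = candidate.lower()
        if any(word in lowered for word in SUSPICIOUS):
            hits.append(candidate)
    return hits
```

Calling `flag_preferences` on an exported preferences string returns the decoded text of any flagged blob, so an analyst can read what the model would have been told. Ordinary preference text, which rarely contains long unbroken alphanumeric runs, produces an empty list.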

Why developers are the soft target

Pentera's researchers, team lead Dvir Avraham and research technical lead Reef Spektor, framed the choice of victim as deliberate. Developers sit on API keys, cloud credentials, source repositories, and tokens to internal services, which makes a developer workstation a credential-rich beachhead. From a single compromised developer, the researchers said, lateral movement across the organization was straightforward, though they declined to publish specific vectors to protect client work.

Anthropic's posture: working as intended

In November 2025, Anthropic's response to the disclosure was that personal preferences, skills, and MCP connectors can execute code through Claude Desktop by design, and that the issue was outside the scope of its security program. The Register's follow-up request to Anthropic on this week's exclusive went unanswered.

That response is the most important sentence in the story. It reframes the Pentera finding from a bug into a category problem: any AI helper that runs tools on a user machine, syncs configuration across devices, and accepts arbitrary text as instructions is, by construction, a covert command channel waiting for the right prompt to come through.

Why Cowork makes this worse

Pentera did its research in November 2025, before Anthropic shipped Cowork. In the original chain, the attacker needed one of two things to land the shell: an MCP connector already on the victim's machine, or the user's willingness to install one when prompted. Cowork removes the need for either. The Anthropic product page describes Cowork as a way for Claude to take real action on a user's machine, with permission, which is exactly the capability Pentera had to coax into existence through DesktopCommanderMCP.

Spektor's read on the Cowork era, paraphrased by The Register, is that the same prompt that today requires a tool-enumeration trick and an optional phishing push could, in a Cowork-equipped session, just execute.

What security teams should do

Pentera's mitigation playbook treats AI desktop applications as privileged software, on par with remote-access clients and EDR agents. Sandbox them instead of letting them run on the same machine as personal browsing and unmanaged email. Monitor changes to AI-assistant configuration and synced preferences the way you would watch registry keys or systemd units. Restrict which MCP connectors and similar tool integrations can sit alongside an AI app. Add AI desktop apps to red-team assessment scope on the same cadence as any other privileged binary.
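Two of those recommendations, watching configuration for changes and restricting connectors, can be sketched in a few lines. The example below is illustrative only: it assumes the connector configuration is a JSON document with a top-level `mcpServers` mapping keyed by connector name, and the names in `APPROVED_CONNECTORS` are hypothetical. It covers local connector configuration only; the account-level preferences that Pentera poisoned are synced from the cloud and would need separate visibility.

```python
import hashlib
import json

# Hypothetical allowlist of connector names approved by the security team.
APPROVED_CONNECTORS = {"filesystem-readonly", "internal-docs"}


def fingerprint(raw: bytes) -> str:
    """Return a SHA-256 hex digest of the raw config bytes."""
    return hashlib.sha256(raw).hexdigest()


def unapproved_connectors(config: dict, approved: set[str]) -> list[str]:
    """List connector names in the config that are not on the allowlist."""
    # Assumption: connectors live under a top-level "mcpServers" mapping.
    servers = config.get("mcpServers", {})
    return sorted(name for name in servers if name not in approved)


def audit(raw: bytes, baseline_digest: str, approved: set[str]) -> dict:
    """Compare raw config bytes against a known-good digest and allowlist."""
    config = json.loads(raw.decode("utf-8"))
    return {
        "changed": fingerprint(raw) != baseline_digest,
        "unapproved": unapproved_connectors(config, approved),
    }
```

In practice the bytes would come from reading the config file on a schedule, with the baseline digest recorded when the file was last reviewed. A result with `changed` set or a non-empty `unapproved` list is a prompt for a human to look, not proof of compromise.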

The broader question, the one the Pentera research quietly raises and Anthropic's response leaves open, is whether the agentic-desktop category is shipping without the threat model that ordinary software spent decades accumulating. Chat assistants do not need a sandbox because they cannot run curl. Tool-using agents can, and they sync their configuration to the cloud. Until that gap closes, the prompt that turns a helper into an attacker is a single inbox away.

