Anthropic's agents can now take the wheel on your Mac. The company announced March 23 that Claude Code and Claude Cowork, its AI coding and collaboration tools, can autonomously navigate a user's desktop — opening files, running applications, filling web forms, and executing developer workflows, even when the subscriber is away. It's a research preview, macOS only, limited to Pro and Max tier customers. But the announcement itself isn't the whole story.
Weeks earlier, in mid-February, Anthropic quietly acquired Vercept, a Seattle startup that had spent roughly two years building exactly this kind of computer-use technology. The deal, reported by TechCrunch, brought in at least three of Vercept's co-founders — Kiana Ehsani, Luca Weihs, and Ross Girshick — and effectively ended Vercept's independent existence. The startup's product shut down March 25, two days after Anthropic's announcement dropped. One Vercept co-founder, Matt Deitke, had already departed for Meta's Superintelligence Lab last year, where he reportedly negotiated a $250 million compensation package — the kind of number that signals how competitive the market for this specific talent has become.
Anthropic is candid about how the capability works. When a user enables computer use, Claude checks first for a direct connector — an integration with a specific application. If one exists, it uses that. If not, it falls back to controlling Chrome through a browser interface. Only as a last resort does it interact directly with the screen, moving the mouse and keyboard like a macro, according to VentureBeat's testing. The priority system matters because direct screen interaction is the most brittle — any UI change can break a workflow that a connector or browser abstraction would survive. AppleInsider notes that this fallback mechanism involves Claude essentially mimicking human user input patterns to navigate the system.
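To make that priority order concrete, here is a minimal sketch of how such a fallback dispatcher could be structured, in Python. It is purely illustrative: the names (Action, AppConnector, BrowserSession, ScreenController, route_action) are hypothetical stand-ins for the three tiers described above, not Anthropic's actual implementation or API.

```python
# Hypothetical sketch of the connector -> browser -> screen fallback order.
# None of these classes come from Anthropic's SDK; they are illustrative
# stand-ins for the three levels of control described in the article.

from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Action:
    app: str        # target application, e.g. "calendar"
    intent: str     # what the agent wants done, e.g. "create_event"
    payload: dict   # structured arguments for the intent


class AppConnector:
    """Direct integration with a specific application (most robust tier)."""

    def __init__(self, registry: Dict[str, Callable[[str, dict], None]]):
        self.registry = registry  # maps app name -> handler callable

    def lookup(self, app: str) -> Optional[Callable[[str, dict], None]]:
        return self.registry.get(app)


class BrowserSession:
    """Drives Chrome through a browser-automation layer (middle tier)."""

    def perform(self, action: Action) -> bool:
        # e.g. open the app's web UI, fill a form, submit it
        print(f"[browser] {action.intent} in {action.app}")
        return True


class ScreenController:
    """Raw mouse-and-keyboard control (last resort, most brittle tier)."""

    def perform(self, action: Action) -> bool:
        print(f"[screen] clicking through the {action.app} UI for {action.intent}")
        return True


def route_action(action: Action,
                 connectors: AppConnector,
                 browser: Optional[BrowserSession],
                 screen: ScreenController) -> str:
    """Try the most structured interface first, falling back only when needed."""
    handler = connectors.lookup(action.app)
    if handler is not None:
        handler(action.intent, action.payload)    # 1. direct connector
        return "connector"
    if browser is not None and browser.perform(action):
        return "browser"                          # 2. browser automation
    screen.perform(action)                        # 3. raw screen control
    return "screen"


# Example: an app that has a connector never touches the browser or the screen.
connectors = AppConnector({"calendar": lambda intent, payload: print(f"[connector] {intent}")})
tier = route_action(Action("calendar", "create_event", {"title": "standup"}),
                    connectors, BrowserSession(), ScreenController())
print(tier)  # -> "connector"
```

The ordering encodes the point made above: only the bottom tier depends on pixel-level UI details, so it is the path a cosmetic interface change is most likely to break.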
The benchmark story is striking. On OSWorld, a widely used evaluation for AI systems performing computer tasks, Anthropic's Sonnet models went from under 15 percent in late 2024, when the company first released computer use, to 72.5 percent with Sonnet 4.6. The human baseline on the same benchmark sits at roughly 72 percent. That's a genuine leap, and the kind of number that makes researchers do a double-take. But benchmarks and real-world use are different things. Early hands-on testing found the feature works about half the time: it performs reliably for information retrieval and summarization, but struggles with more complex, multi-step workflows that require bouncing between applications. VentureBeat called it roughly a 50/50 proposition for anything non-trivial. That gap between benchmark performance and live reliability is worth holding onto.
There's competitive context in the timing, too. OpenAI is currently in advanced talks to raise approximately $4 billion from a consortium including TPG, Bain Capital, Advent International, and Brookfield Asset Management, at a pre-money valuation of roughly $10 billion, according to Reuters. The turf war Anthropic and OpenAI are fighting isn't just about model quality — it's about which company can build the most capable agent infrastructure, the kind that actually completes tasks on behalf of users rather than describing how to do them. Computer use is a concrete data point in that race.
Anthropic has built through acquisition before. The company acquired Bun, the JavaScript runtime startup, in December 2025 to help scale Claude Code, which subsequently crossed a $1 billion usage milestone. Vercept fits the same pattern: Anthropic isn't building this capability from scratch in-house; it's acquiring teams that already solved the hard parts and integrating them fast. Whether that produces durable advantages or just adds integration complexity is an open question.
There are guardrails. Anthropic trains Claude to avoid trading stocks, entering sensitive data, and gathering facial images through computer use. The company is direct that these are trained behaviors, not absolute restrictions — the system is not secure against adversarial prompting, and the research preview label is doing real work here. This is explicitly not a finished product.
What happens next is a test of whether Anthropic's acquire-and-integrate approach can close the gap between 72.5 percent on OSWorld and a roughly 50 percent real-world success rate. That's the number to watch.