Big Tech Arrived at Drug Discovery Together — Then Immediately Split
The way scientists learn what biology actually wants is by watching it say no.
That is the thing the AI platforms are not set up to hear. A compound that fails a binding assay, a dose-response curve that goes flat, an animal model that stops responding — these are not just negative results. They are the primary data feed for learning how the system actually works. And a new class of AI tools optimized to surface the next best candidate may be, inadvertently, training the entire discovery ecosystem to stop listening.
Last week an MSKCC team working with Amazon Web Services produced the most concrete demonstration of why that tension matters. Using AWS Bio Discovery, the group designed nearly 300,000 nanobody candidates and narrowed them to the top 100,000 for wet-lab testing — compressing a process that typically takes up to a year into a matter of weeks. The result, reported by GEN News Monday, is a proof-of-concept for exactly what the platform vendors say they are building: a closed loop where computation feeds the lab and the lab feeds the next round of computation.
Three days ago, OpenAI added a new layer to that loop. The company released a life sciences research plugin for its Codex platform — an orchestration tool that spans human genetics, functional genomics, and clinical evidence, routing multi-step research questions to subagents that work in parallel across independent data lanes. Amgen, Moderna, and the Allen Institute are already running it against their pipelines. The plugin does not just retrieve information. It designs the experimental workflow.
That is the new shape of the problem. When AI is embedded inside the loop that designs experiments, selects candidates, and routes results, the commercial pressure to show rapid positive selection is not neutral. It is oriented toward the candidates that look good enough to show customers. The failures — the assays that did not hold, the models that went flat — are what the loop is actually built to route around, not to learn from. And that is the under-examined cost underneath a coordinated Big Tech move into life-sciences AI that became impossible to ignore when three hyperscalers arrived at the same corner of biology within two weeks this April.
Amazon, OpenAI, and Anthropic all made their most deliberate life-sciences moves yet between April 3 and April 16. What has not been fully reckoned with is what the divergence itself reveals — and what it might be costing the science underneath.
The infrastructure bet runs through Amazon. AWS already serves nineteen of the world's twenty largest pharmaceutical companies; Bio Discovery wraps those existing relationships into a unified application where computational predictions and wet-lab validation flow into each other without manual handoffs. A researcher designs candidates in silico, sends shortlisted molecules to Twist Bioscience or Ginkgo Bioworks for synthesis, and the results return to refine the next cycle. Dan Sheeran, vice president and general manager of healthcare and life science at AWS, described the bottleneck the platform is designed to attack: computational biologists who bridge AI and biology are in short supply, and the tooling does not support how teams actually need to work together.
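That design-synthesize-refine cycle is easier to see as code. The sketch below is a schematic in Python, not AWS's API: every function name is hypothetical, the "assay" is a toy score, and a real platform would condition each design round on prior results. What it does capture is the shape of the loop the vendors describe — compute-side triage feeding a wet-lab step, with every tested candidate, hit or miss, flowing back as data for the next round.

```python
import random

random.seed(0)  # deterministic toy run

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def design_candidates(prior_results, n):
    # Hypothetical stand-in for in-silico generation. A real system
    # would condition on prior_results; this toy emits random
    # 12-residue "sequences".
    return ["".join(random.choice(AMINO_ACIDS) for _ in range(12))
            for _ in range(n)]

def predicted_score(candidate):
    # Stand-in for the model's ranking used to shortlist candidates.
    return candidate.count("W") + candidate.count("Y")

def wet_lab_assay(candidate):
    # Stand-in for synthesis at a vendor plus a binding assay.
    # Low scores are the "failures" the article argues the loop
    # must keep as data, not route around.
    return sum(candidate.count(r) for r in "WYF") / len(candidate)

def discovery_loop(rounds=3, n_designed=1000, n_tested=50):
    results = []
    for _ in range(rounds):
        candidates = design_candidates(results, n_designed)
        # Compute-side triage: only the top-ranked shortlist is synthesized.
        shortlist = sorted(candidates, key=predicted_score,
                           reverse=True)[:n_tested]
        # Wet-lab side: every tested candidate, hit or miss, becomes
        # the input to the next design round.
        results = [(c, wet_lab_assay(c)) for c in shortlist]
    return results

results = discovery_loop()
```

The design choice the article is probing lives in the last line of the loop: keep all of `results`, failures included, or keep only the hits. A loop that discards the misses still produces candidates; it just stops learning why the previous ones failed.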
The reasoning bet runs through OpenAI. GPT-Rosalind scored 0.751 on BixBench, a bioinformatics benchmark, beating GPT-5.4, Google Gemini 3.1 Pro, and xAI's Grok 4.2. On LABBench2, a broader research task benchmark, it outperformed GPT-5.4 on six of eleven tasks, with the largest margin on CloningQA — end-to-end design of DNA constructs and enzyme reagents. When Dyno Therapeutics tested it directly against human experts in the Codex app, the model's best-of-ten submissions ranked above the 95th percentile of AI-bio researchers on an RNA prediction task and around the 84th percentile on sequence generation. The new Codex plugin extends that reasoning into workflow design.
The talent bet runs through Anthropic. Coefficient Bio had been operating in stealth for eight months with fewer than ten people when Anthropic bought it for $400 million. The co-founders, Samuel Stanton and Nathan Frey, both came out of Genentech's Prescient Design team, where they worked on computational drug discovery — exactly the hybrid biology-and-ML expertise that neither pure model capability nor cloud infrastructure can replicate. Eric Kauderer-Abrams, Anthropic's head of biology and life sciences, put it plainly: the goal is to give scientists what software engineers have, a partner they can delegate to.
Anthropic's own Claude for Life Sciences scores 0.83 on Protocol QA, a benchmark testing understanding of laboratory protocols, against a human baseline of 0.79 and Sonnet 4's 0.74. The platform connects to Benchling, PubMed, 10x Genomics, BioRender, and other scientific tools — giving it breadth across the researcher workflow that pure inference power does not capture. And last week Anthropic published BioMysteryBench, a benchmark designed to test whether Claude can analyze real-world bioinformatics problems end-to-end rather than answering isolated questions. The model solved roughly 30 percent of previously unsolved scientific problems — a number Anthropic's own researchers describe as the model stumbling onto solutions rather than systematically reproducing them. That distinction matters when the problem you're trying to solve is an experimental failure.
Enke Bashllari, founder and managing director at Arkitekt Ventures, made the observation on LinkedIn that has circulated through the industry since the three launches landed close together: the three companies are playing different games. OpenAI is selling the sharpest reasoning engine with limited access. AWS is building infrastructure and lab integration. Anthropic is betting on breadth of workflow and making acquisitions to close the specialization gap. For startups, Bashllari wrote, the question is not which platform wins. It is which layer you build on.
Chris Leiter, founder and general partner at Atria Ventures, frames the opportunity around what he calls bioconsumerism — the idea that medicine is the use case that justifies the entire AI buildout. Public skepticism, he argues, starts to erode when the output is a drug that reaches a patient five years early, or a diagnostic that catches a cancer no doctor would have seen.
That outcome is possible. But the ninety percent clinical-trial failure rate that makes drug discovery so brutal tells another version of the same story. Even if every in silico prediction were perfect, the experimental biology that follows — synthesis, animal models, Phase I safety — is where the attrition lives. A model that generates three hundred thousand candidates in a week does not change that reality. It moves the bottleneck.
The failure rate in clinical trials is not a bug in the system — it is the system. Each failure carries information about how the biology actually behaves, information that the next round of candidates depends on. A platform optimized to maximize candidate quality may be optimized, inadvertently, for the paths that produce publishable successes, and away from the messy experimental failures that generate the most durable scientific knowledge. The Codex plugin and the MSKCC preprint and the Coefficient Bio acquisition are all real. The question is what they are teaching the loop to pay attention to.
None of these three companies is wrong to be here. The biology is real, the commercial opportunity is real, and the scientific need is real. The interesting question is not which company wins but who controls the point of integration — the moment where the prediction meets the experiment and something real is learned about whether it works.
That is where the leverage will settle. And nobody has claimed it yet.