AI Found Drug Candidates. Now Comes the Hard Part.
Google DeepMind and a nonprofit called FutureHouse both published in Nature this week. Their AI systems do not discover drugs. They find candidates. There is a difference, and it is becoming the most important distinction in biomedical research.
The two papers describe multi-agent AI systems, FutureHouse's Robin and DeepMind's Co-Scientist, that generate, debate, and rank hypotheses for drug discovery. Both produced candidates that survived human expert filtering. Robin found a glaucoma drug worth testing for age-related macular degeneration. Co-Scientist surfaced thirty candidates for a type of leukemia; oncologists narrowed them to five, and three of those showed positive results in vitro. These are real results, peer-reviewed, published this week in Nature.
The speed claim is where the two organizations start making interpretive leaps. FutureHouse says Robin cut research time two-hundredfold, a figure no independent researcher has verified. DeepMind does not offer a comparable number for Co-Scientist. Both systems face the same hard limit: they generate candidates at AI speed but do not run the experiments that determine whether those candidates work. Humans do those assays. That validation step (growing cells, running animal studies, reading outputs) is where the timeline and cost live in drug discovery. The AI has collapsed the cheap part. The expensive part remains, as Nature's own editorial notes.
This is the shift worth tracking. For most of the past decade, the bottleneck in AI drug discovery was hypothesis generation — finding a plausible mechanism, a testable target, a candidate worth trying. What these papers suggest is that the bottleneck has moved. Generating candidates is now the fast, cheap part. Testing them is the constraint. And the constraint is physical: it requires lab space, trained hands, biological samples, time. AI cannot yet do that work faster by adding more compute.
The implications for research economics are where this gets interesting. Drug development has historically been prohibitively expensive for diseases affecting small patient populations — rare disorders, niche oncology indications, conditions prevalent primarily in low-income regions. The cost of finding candidates was part of that calculus. If AI compresses candidate identification from years to days, the math on rare disease research changes. A compound already approved for one condition, with a known safety profile, becomes a repurposing candidate for another at a fraction of the traditional cost. This is not a solved problem — the validation bottleneck remains — but the economics of the front end have shifted in a way that makes previously marginal research programs viable.
The two papers represent different institutional bets on the same underlying capability. FutureHouse, a nonprofit that built Robin to automate the full loop from literature search through candidate proposal, completed the entire process — from conceptualization to paper submission — in just two and a half months with a small team. DeepMind built Co-Scientist as a partner tool available through its Hypothesis Generation platform, working alongside scientists rather than replacing the workflow. Co-Scientist uses tournament-style ranking with Elo ratings borrowed from chess ranking systems to prioritize among generated hypotheses. Robin's approach is more integrated, with a single system orchestrating the full discovery cycle. Both architectures assume human scientists are in the loop for validation; neither automates the slow, expensive part.
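For readers unfamiliar with Elo, the idea behind tournament-style ranking can be sketched in a few lines. This is the standard chess Elo update applied to pairwise comparisons between hypotheses, not Co-Scientist's actual implementation, which this article does not describe. The function names, the starting rating of 1200, the K-factor of 32, and the match outcomes are all illustrative assumptions.

```python
# Minimal sketch of Elo-based ranking over pairwise "debates".
# Assumption: each debate yields a clear winner and loser; how a real
# system judges a debate is outside the scope of this sketch.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_tournament(hypotheses, matches, start=1200.0, k=32.0):
    """Apply one Elo update per (winner, loser) pair and return the ratings."""
    ratings = {h: start for h in hypotheses}
    for winner, loser in matches:
        # The winner gains more when the win was unexpected.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


def ranked(ratings):
    """Hypotheses ordered from highest to lowest rating."""
    return sorted(ratings, key=ratings.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical hypotheses and debate outcomes, for illustration only.
    names = ["A", "B", "C"]
    outcomes = [("A", "B"), ("A", "C"), ("B", "C")]
    final = run_tournament(names, outcomes)
    for name in ranked(final):
        print(name, round(final[name], 1))
```

Because each update moves the same number of points from loser to winner, the total rating pool stays constant, so the ratings only express relative standing among the hypotheses that were compared. They say nothing about whether the top-ranked hypothesis holds up in the lab.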
Neither paper compares its output against the established computational methods for drug repurposing, which have decades of development behind them. Independent analysts at ResultSense noted the gap. Whether multi-agent language model systems outperform those specialized tools is a question the papers do not address.
The Nature editorial accompanying both papers frames them as a significant step forward while maintaining careful distance from replacement narratives. "These projects represent a significant step forwards," it reads, "but for all the wow factor, it is crucial to bear in mind that the AI systems were not working independently." That is accurate. What the editorial does not say is that the step forward is uneven — hypothesis generation has been transformed; the part that determines whether candidates are real has not.
Robin is scheduled to have its code released publicly. Co-Scientist is being distributed to research teams through DeepMind's platform. Both are real systems with real results in peer-reviewed literature. The gap between those results and a drug that reaches patients is as large as it ever was — but the start of the process, the part where researchers figure out what to test, has gotten substantially faster.