Two AI Systems Found Drug Candidates. Only One Explained Why.
FutureHouse's Robin and DeepMind's Co-Scientist were both described in Nature last week. Both produced validated candidates. Only one showed why the tool design matters as much as the output.
Two AI systems described in Nature last week each made a case for automating drug discovery. The more interesting finding was not the candidates themselves but what each search architecture did with the literature, and what that reveals about where tool choice becomes load-bearing.
The first system, called Robin, comes from a San Francisco startup called FutureHouse. In a paper published May 19 (doi:10.1038/s41586-026-10652-y), Robin identified two compounds as candidates for dry age-related macular degeneration, the leading cause of blindness in the developed world. One was ripasudil, a glaucoma drug that no one had previously proposed for dAMD. The other was KL001, a circadian clock modulator. Robin proposed both, then autonomously designed and ran experiments in a lab dish to confirm them. A follow-up RNA-seq experiment Robin suggested revealed upregulation of ABCA1, a lipid efflux pump in retinal pigment epithelium cells — a proposed novel target that emerged from the AI's own analysis of its own data.
The mechanism Robin proposed is not obvious. Ripasudil is a ROCK inhibitor used for glaucoma; Robin proposed enhancing retinal pigment epithelium phagocytosis as a therapeutic strategy for dAMD, a mechanism that had not previously been explored for that indication. That link, between a glaucoma drug and a different eye disease, working through a cellular pathway most research had not pursued, is the kind of non-obvious repurposing signal that drug discovery depends on and that human researchers might take significant time to triangulate.
Robin is described in its Nature paper as the first multi-agent system to fully automate both hypothesis generation and data analysis for experimental biology. The system chains specialized agents: one searches the literature, another designs experiments, a third called Finch interprets flow cytometry and RNA-seq data. All hypotheses, experimental directions, and figures in the main text were produced by Robin. Joint supervisors on the paper include Andrew D. White, Michaela M. Hinks, and Samuel G. Rodriques.
Here is the part that matters for anyone building or buying AI science tools. FutureHouse tested its own literature agent — called Crow — against a general-purpose LLM, OpenAI's o4-mini, on the same drug retargeting task. The hallucination rate on Crow was zero. On o4-mini it was 45 percent. Every drug that o4-mini uniquely suggested failed in the wet lab. The benchmark is FutureHouse's own evaluation, not independently replicated, and the comparison conditions — model version, temperature, prompt engineering — are not fully specified in the paper, which is a legitimate methodological limitation that readers should weigh. The takeaway is not that o4-mini is bad. It is that purpose-built tooling is load-bearing in this domain.
The second system, Co-Scientist, comes from Google DeepMind and was published in the same issue of Nature (doi:10.1038/s41586-026-10644-y). Co-Scientist is built on Gemini and uses a different multi-agent architecture: a Generation agent proposes hypotheses, a Proximity agent clusters them to ensure search space coverage, a Reflection agent plays adversarial peer reviewer, and a Ranking agent runs a tournament — pairwise comparisons scored Elo-style — to surface the strongest candidates. An Evolution agent then refines the winners. This design borrows from DeepMind's game-playing heritage; the tournament structure is conceptually descended from AlphaGo.
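For readers unfamiliar with Elo-style ranking, the tournament idea can be illustrated with a minimal sketch. This is not Co-Scientist's implementation: the starting rating of 1200, the K-factor of 32, the round-robin schedule, and the toy judge function are all assumptions chosen for illustration. In the real system the pairwise verdict would come from a model comparing two hypotheses, not from a lookup table.

```python
import itertools

def expected_score(rating_a, rating_b):
    """Standard Elo expectation: probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a, rating_b, a_won, k=32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - exp_a)
    # Points gained by one side are lost by the other (zero-sum update).
    return rating_a + delta, rating_b - delta

def run_tournament(hypotheses, judge, start=1200.0, k=32.0):
    """Round-robin tournament; judge(a, b) returns True if a wins.

    Returns (hypothesis, rating) pairs sorted from strongest to weakest.
    """
    ratings = {h: start for h in hypotheses}
    for a, b in itertools.combinations(hypotheses, 2):
        a_won = judge(a, b)
        ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], a_won, k)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical stand-in for a model-based reviewer: a fixed quality table.
toy_quality = {"hypothesis_A": 0.9, "hypothesis_B": 0.5, "hypothesis_C": 0.2}

def toy_judge(a, b):
    return toy_quality[a] > toy_quality[b]

ranking = run_tournament(list(toy_quality), toy_judge)
for name, rating in ranking:
    print(f"{name}: {rating:.1f}")
```

The point of the design is that no agent has to assign an absolute score to a hypothesis. Only relative judgments are needed, and the ratings accumulate them into an ordering.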
Co-Scientist has a published validation in Gary Peltz's lab at Stanford — one candidate blocked 91 percent of a scarring-linked response in a liver fibrosis model. Gary Peltz is a co-author on the Co-Scientist paper, so that result carries the caveat that comes with author-attributed findings. Co-Scientist has also been applied to acute myeloid leukemia repurposing; according to a summary of the paper, three of five lab-tested candidates showed some positive results. The system is available via Gemini for Science and has enterprise deployments at Daiichi Sankyo, Bayer Crop Science, and U.S. National Laboratories, per Labcritics.
Both papers note that human expert evaluation remains required at key decision points. Neither claims general scientific reasoning capability, and no one should infer it from the results. Drug repurposing — finding new uses for existing compounds — is a constrained problem. The harder tasks, novel molecule design and clinical development, are not addressed.
The candidates from both systems are real, in vitro validated, and worth watching. They are also early. Ripasudil for dAMD, if the mechanism holds, still needs in vivo work, IND-enabling studies, and Phase I-III trials — a path that runs years and kills most candidates along the way. The ABCA1 finding is a proposed novel target, Robin's interpretation of its own RNA-seq experiment, and requires independent validation before it means anything clinically.
What these two papers actually demonstrate is narrower and more specific than the headlines suggest. They show that multi-agent AI systems can navigate the literature of drug repurposing, generate mechanistically non-obvious candidates, and produce wet-lab-confirmed hits. For builders, the architectural lesson is immediate: the specialized tool is not optional, the architecture shapes what the system can find, and the verification budget matters as much as the generation budget. For investors, the honest reading is that validated in vitro hits mark the beginning of a development process that still runs years and kills most candidates. The science is real. The candidates are not yet drugs.