For most patients with a rare disease, the first genetic test is also the last one. The sequencing data is still on a server somewhere. The catalog of gene–disease links and pathogenic DNA changes, however, keeps growing, and almost no lab goes back to
re-run each stored genome against the new evidence. The result: more than half of rare-disease patients stay undiagnosed after their first test, even though the science to read their data may now exist.
A new open-source tool called Talos, released by Microsoft Research, takes direct aim at that gap. Talos automates the reanalysis step that labs have historically run by hand, if
at all. It watches the public evidence base for new gene–disease links, re-prioritizes the candidate variants in each stored genome (the specific DNA changes a sequencing run flags as potentially disease-causing), and surfaces only a small shortlist for a clinical geneticist to inspect. In a prospective deployment on roughly 5,000 previously undiagnosed patients, the system delivered 241 new diagnoses, an
additional 5.1% diagnostic yield, at a mean of 32 days between supporting evidence going public and a flagged diagnosis, according to the Microsoft Research announcement and the peer-reviewed Nature Medicine paper (article s41591-026-04477-5) describing the work.
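The article does not describe Talos's internals, so the following is only a minimal Python sketch of the general idea of incremental reanalysis: compare this cycle's set of disease-linked genes with the previous one, and shortlist only the stored variants that fall in newly linked genes. The names `Variant`, `newly_linked_genes`, and `shortlist`, and all of the data, are hypothetical and are not taken from the Talos codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one stored candidate variant."""
    patient_id: str
    gene: str
    change: str  # free-text label for the DNA change


def newly_linked_genes(previous, current):
    """Genes that gained a disease link since the last cycle."""
    return set(current) - set(previous)


def shortlist(stored_variants, previous, current):
    """Stored variants that fall in newly linked genes, for human review."""
    new_genes = newly_linked_genes(previous, current)
    return [v for v in stored_variants if v.gene in new_genes]


# Placeholder data: gene names and changes are invented for illustration.
stored = [
    Variant("P001", "GENE_A", "change-1"),
    Variant("P002", "GENE_B", "change-2"),
    Variant("P003", "GENE_C", "change-3"),
]
last_month = {"GENE_A"}
this_month = {"GENE_A", "GENE_C"}

for v in shortlist(stored, last_month, this_month):
    print(v.patient_id, v.gene, v.change)
```

In this toy run only the variant in `GENE_C` is surfaced, because that is the only gene added between cycles. A real pipeline would also weigh factors such as inheritance pattern and variant pathogenicity, which this sketch deliberately leaves out.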
The design is built for a constraint any clinical lab recognizes: human review time. On a validation set of about 1,100 patients, Talos recovered roughly 90% of in-scope diagnoses while flagging only about 1.3 candidate variants per patient for expert inspection, the developers report. On
the iterative monthly run against the prospective cohort, the human cost shrank further, to roughly one new candidate variant per 200 patients each month, a workload a small team can absorb as a routine recurring cycle rather than a special project. An earlier medRxiv preprint
and its PMC version trace the same design choices through peer review.
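To make the review workload concrete, the short Python sketch below runs the arithmetic on the figures reported above. The function names are hypothetical, and the cohort sizes are the article's rounded values ("about 1,100" and "roughly 5,000"), so the results are approximations rather than counts taken from the paper.

```python
def validation_candidates(patients, candidates_per_patient=1.3):
    """Approximate total candidates flagged for expert review in a one-off run."""
    return round(patients * candidates_per_patient)


def monthly_new_candidates(cohort_size, patients_per_candidate=200):
    """Approximate new candidates per monthly cycle across a whole cohort."""
    return cohort_size / patients_per_candidate


# Rounded figures from the article, not exact counts from the paper.
print(validation_candidates(1100))   # about 1,430 variants to review
print(monthly_new_candidates(5000))  # about 25 new variants per month
```

Under these assumptions, a cohort of roughly 5,000 patients produces on the order of 25 new candidates a month, which is what makes a standing review cycle plausible for a small team.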
That math reframes the bottleneck. The reason most stored genomes are never re-examined is not that the gene–disease evidence is missing; that evidence exists, in the form of newly published gene–disease links and curated variant databases, and
is exactly the stream Talos watches. The constraint is operational: reanalysis is laborious enough that, in most clinical workflows, it runs only when a patient returns to clinic, not on a recurring schedule. By making the cycle automatic and the per-cycle review load small, Talos turns the work into something a lab can run continuously, treating stored sequencing data as a diagnostic asset whose value can grow as
the literature grows, rather than a frozen file that loses interpretive value the day after it is generated. The full implementation is public on GitHub, with project documentation that clinical bioinformatics teams can adopt directly.
The honest caveats matter. The 5.1% additional yield comes
from the developers' own prospective cohort of roughly 5,000 patients, and the 90% recovery rate comes from their ~1,100-patient validation set; both are described in the Nature Medicine paper, but neither has been independently replicated outside the developing group. Talos is an open-source research tool, not a regulated diagnostic device, and the available sources offer no evidence of FDA or EMA clearance or of routine deployment in non-academic clinical labs. Whether the review cost of roughly one variant per 200 patients generalizes to other settings, and to clinical workflows with different patient mixes, is an open question.
What to watch next is whether any clinical laboratory outside Microsoft Research's collaborators runs
Talos against its own unsolved cohort and reports back on throughput, false-positive rate, and reimbursement posture. That second datapoint is what turns a single-group demonstration into a workflow the field can adopt. The release makes that follow-up cheap to attempt, which is itself the constructive news.