A New AI Model Predicts Which Genes Tell Cells What to Become
For most of the past decade, computational biologists had to choose between two different maps of a cell. One showed where a cell was heading — its trajectory through development, the state it was becoming. The other showed which genes were controlling the journey — the switches and circuits telling the genome what to do. Scientists could not easily use both maps at the same time.
The reason was not a technical accident. Cellular dynamics and gene regulatory networks had evolved as separate fields, with separate tools, separate conferences, and separate practitioners. Developmental biologists trying to understand how a stem cell becomes a neuron or a skin cell used one map or the other. Neither gave the full picture.
A new model called RegVelo, published May 11 in Cell, is an attempt to draw both maps simultaneously. The approach: take a snapshot of which genes are active inside a cell, then use deep learning to infer both the trajectory the cell is traveling and the regulatory interactions most likely responsible for that trajectory. The two modeling traditions, trajectory and mechanism, that have largely operated in parallel for a decade are now running on the same system.
The paper comes from a collaboration across the Stowers Institute for Medical Research, Helmholtz Munich, the Technical University of Munich, the University of Oxford, and NYU Grossman School of Medicine. The senior authors include Fabian J. Theis of Helmholtz Munich and Tatjana Sauka-Spengler at Stowers.
In practical terms: RegVelo generates hypotheses about which transcription factors are causally responsible for a cell's trajectory, and it does so in silico — on a computer, not a bench. That frees experimental effort from brute-force screens toward targeted validation. A lab that would have spent six months on a focused genetic screen might get there in weeks.
The proof is in the living tissue. The team used RegVelo to generate predictions about gene regulatory circuits in the zebrafish neural crest — a population of cells that migrate through the embryo and give rise to pigmentation, peripheral neurons, and cartilage. They then tested those predictions with CRISPR/Cas9 knockouts and single-cell Perturb-seq experiments in actual zebrafish embryos.
The model identified tfec as an early driver of pigment cell formation and elf1 as a previously unknown regulator of pigment fate in neural crest cells. Both predictions were confirmed by the CRISPR experiments — the computationally predicted regulatory relationship held when the genes were disrupted in a living organism. This is the step that separates RegVelo from the many computational models that generate phase portraits and nothing else.
"You can imagine, if you had a very early set of cells, being equipped with a particular set of instructions could allow you to reproduce, in vitro, some of these cell types in a very natural way," Sauka-Spengler told press. "These cells could then be used in cell therapies in regenerative medicine."
The more commercially interesting implication is in drug discovery and regenerative medicine. Identifying which transcription factors are causally responsible for a developmental trajectory, as opposed to merely correlating with it, is precisely the kind of input that pharmaceutical researchers use to prioritize genetic targets for small-molecule or cell therapy programs. RegVelo does not prove any of those targets are druggable, but it narrows the search space in a way that brute-force experimental approaches cannot. For a field that has been separately cataloging cell states and regulatory components for two decades, that integration is the point.
RNA velocity inference, the backbone of RegVelo's trajectory modeling, is an indirect method. It infers future cell states from the ratio of unspliced (precursor) to spliced (mature) RNA transcripts, which serves as a proxy for transcriptional dynamics. That proxy works well in systems with clear transcriptional dynamics and densely sampled cells. It works less well in sparse data or in systems where RNA splicing kinetics are unusual. Adding a regulatory network layer on top of that compounds the assumptions.
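To make the proxy concrete, here is a minimal sketch of the classic steady-state form of RNA velocity for a single gene, in which spliced RNA changes as ds/dt = u - gamma * s. It is not RegVelo's method, which the paper describes as a deep learning model. The function name and the toy counts are illustrative assumptions, and real tools add normalization, neighbor smoothing, and more careful fitting.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced):
    """Toy steady-state RNA velocity for one gene measured across cells.

    Assumes ds/dt = u - gamma * s (splicing rate normalized to 1).
    gamma is fit by least squares through the origin, a simplification
    of what production tools do.
    """
    u = np.asarray(unspliced, dtype=float)
    s = np.asarray(spliced, dtype=float)
    denom = np.dot(s, s)
    # Slope of the line u = gamma * s; fall back to 0 if the gene has no spliced reads.
    gamma = np.dot(u, s) / denom if denom > 0 else 0.0
    # Positive velocity: more unspliced RNA than steady state predicts,
    # so the gene is likely being switched on in that cell.
    velocity = u - gamma * s
    return gamma, velocity

# Hypothetical counts for four cells (made up for illustration).
spliced = np.array([1.0, 2.0, 3.0, 4.0])
unspliced = np.array([0.5, 1.0, 1.5, 3.0])

gamma, velocity = steady_state_velocity(unspliced, spliced)
print("gamma:", gamma)
print("velocity per cell:", velocity)
```

In this toy data the last cell carries more unspliced RNA than the fitted ratio predicts, so its velocity is positive, while the other three are slightly negative. The sketch also shows why sparse data hurts: with few cells or few reads, the fitted ratio is poorly constrained.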
The zebrafish neural crest is also a relatively tractable validation system. It is well-studied, accessible to live imaging, and genetically tractable in ways that human tissue is not. Whether the causal regulatory relationships RegVelo identifies in zebrafish generalize to mammalian development or human cell types is unproven. The authors did not claim otherwise in the paper.
The code and model are available as specified in the paper, and the paper describes in silico counterfactual inference capabilities that allow researchers to ask what happens if a specific regulatory interaction is perturbed. Whether the broader community adopts and validates RegVelo on different biological systems will determine whether this is a useful but narrow tool or a general advance in gene regulatory network inference.
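The counterfactual question can be illustrated with a toy model in which each gene's production rate depends on its regulators, so that removing one interaction changes the predicted velocity. This is a sketch under stated assumptions, not RegVelo's API or its actual equations: the sigmoid production term, the gene names, the weights, and the function names are all hypothetical.

```python
import numpy as np

def velocity(state, weights, decay):
    """Toy regulated dynamics: ds/dt = sigmoid(W @ s) - decay * s.

    weights[i, j] is the assumed effect of regulator j on target i.
    """
    production = 1.0 / (1.0 + np.exp(-(weights @ state)))
    return production - decay * state

def remove_interaction(weights, target, regulator):
    """Return a copy of the network with one regulator -> target edge deleted."""
    perturbed = weights.copy()
    perturbed[target, regulator] = 0.0
    return perturbed

# Hypothetical three-gene network: tf_a activates target, tf_b represses it.
genes = ["tf_a", "tf_b", "target"]
weights = np.array([
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [2.0, -1.0, 0.0],
])
decay = np.array([0.5, 0.5, 0.5])
state = np.array([1.0, 1.0, 0.5])  # made-up expression levels

baseline = velocity(state, weights, decay)
perturbed = velocity(state, remove_interaction(weights, target=2, regulator=0), decay)

# The difference is the predicted effect of deleting the tf_a -> target edge.
effect = perturbed - baseline
for name, delta in zip(genes, effect):
    print(name, delta)
```

Deleting the activating edge lowers the predicted velocity of the target gene and leaves the two regulators unchanged. A prediction of this kind is a hypothesis to be tested with a knockout at the bench, which is the role the CRISPR experiments play in the study.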
This is not a paradigm shift. It is a genuine step forward on a specific technical problem, the joint inference of cellular dynamics and gene regulatory networks, that has real consequences for how hypothesis-driven developmental biology gets done. The CRISPR validation in zebrafish is the part that makes it more than a curve-fitting exercise. Whether the same approach earns that distinction in human cell types is the question RegVelo leaves open.