The Genome Is Not a Blueprint. AI Is Learning That the Hard Way.
Genomic foundation models trained on DNA sequence have mapped vast stretches of the genome. Leading biologists say they are still missing what matters most.
We have sequenced the entire human genome. We still cannot explain how a single human cell works — and now AI trained on that sequence is bumping into the same hard problem.
Since its molecular structure was deduced in the 1950s, DNA has been billed by many biologists as the secret of life. They have read and studied the information stored in genomes and claimed that this genetic database must be some kind of blueprint, code script, or computer. But if DNA really does harbor some greater secret about how life works, biologists have yet to find it. In fact, the human genome is less a script than a puzzle that gets harder the closer they look.
Between 1990 and 2003, the international Human Genome Project sequenced roughly 3 billion chemical building blocks of human DNA and showed that barely 2% of the human genome consists of actual protein-coding genes (Philip Ball, "Why the Human Genome's Tangled Physicality May Confound AI," Quanta Magazine explainer, 2026-06-18). Understanding regulation, meaning how those genes are switched on and off across hundreds of cell types, turned out to be the open problem. And that problem is now confronting a new generation of AI.
Genomic "foundation models" such as Evo 2, Genos, and Google DeepMind's AlphaGenome are trained on vast quantities of genomic data (Philip Ball, Quanta Magazine explainer, 2026-06-18). Biologists use them to predict how differences in DNA sequence affect biological processes and, ultimately, the traits and disease risks of a whole organism. These algorithms sidestep the complicated machinery of regulation, including the three-dimensional physical architecture of the genome, and work from sequence alone. On sequence-based benchmarks, they perform remarkably well.
But gene regulation, biologists increasingly emphasize, is not purely a sequence problem.
"The genes are probably not the most interesting part of the genome," said Karen Adelman, a molecular biologist at Harvard Medical School (Quanta Magazine, 2026-06-18). The genome, she added, is not static but living. That living quality, the physical, contextual, cell-specific regulation that takes place in three-dimensional space, is precisely what sequence-trained AI struggles to capture.
Humans have roughly 20,000 protein-coding genes and up to millions of regulatory elements called enhancers (Quanta Magazine, 2026-06-18). Distant enhancers can sit millions of nucleotides away from the genes they control and are brought to those genes via loops extruded by a protein complex called cohesin, a physical process that folds the chromatin into precise three-dimensional shapes. These shapes, known as topologically associating domains (TADs) and loops, vary from cell type to cell type and even from moment to moment.
"The big gap is in the complexity of the human body, in all the cell types and how they change over time in development, and all that data is missing," said Wendy Bickmore, a geneticist at the University of Edinburgh (Quanta Magazine, 2026-06-18). "I am sure AlphaGenome is going to be useful, but with limitations. I do not know" how algorithms like it will ever capture contextual regulation.
The combinatorial logic of gene regulation adds another layer. In bacteria, transcription factors tend to operate with relatively simple OR-style logic: a factor can often switch a gene on by itself. In complex eukaryotes like humans, transcription factors act combinatorially, requiring multiple factors to be present simultaneously in an AND-style arrangement (Karen Adelman, Harvard Medical School, as paraphrased in Quanta Magazine, 2026-06-18). No single factor is sufficient; context is everything.
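The OR-versus-AND contrast can be made concrete with a toy sketch. This is an illustration only, not a model of real regulation, which is quantitative and far messier; the function names, the factor names (TF_A and so on), and the two cell contexts are all invented for the example.

```python
# Toy illustration of OR-style vs AND-style regulatory logic.
# All names below are hypothetical; real gene regulation is not Boolean.

def gene_on_or(factors_present: set, activators: set) -> bool:
    """OR-style logic: any single activator present is enough."""
    return bool(factors_present & activators)

def gene_on_and(factors_present: set, required: set) -> bool:
    """AND-style logic: every required factor must be present together."""
    return required <= factors_present

# The same regulatory "sequence" (the same set of binding factors it needs)
# evaluated in two invented cell contexts.
needed = {"TF_A", "TF_B", "TF_C"}
liver_like_cell = {"TF_A", "TF_B", "TF_C", "TF_D"}
neuron_like_cell = {"TF_A", "TF_D"}

print(gene_on_or(neuron_like_cell, needed))   # True: one factor suffices
print(gene_on_and(neuron_like_cell, needed))  # False: TF_B and TF_C are absent
print(gene_on_and(liver_like_cell, needed))   # True: all three are present
```

Under AND-style logic the outcome depends on which factors a given cell happens to contain, so identical DNA can yield different results in different cells. That dependence on cellular context is the kind of information a sequence-only input does not carry.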
Chromatin itself is not a fixed scaffold. It forms fluid transcription hubs, or condensates, that vary from cell to cell rather than behaving as deterministic machines (Wendy Bickmore, University of Edinburgh, in Quanta Magazine, 2026-06-18). Epigenetic marks, which are chemical modifications to DNA and to the histone proteins around which DNA is wound, dynamically control which sequences are accessible to the transcription machinery. This is physical, spatial, and temporal regulation that is not contained in the sequence alone.
Biologists have reached for computational metaphors to describe the genome for decades: blueprint, code, computer program. The metaphor powered a generation of research and justified the massive investment in genome sequencing. But the limits of that framing are now becoming apparent.
"The genome is not static; it is living," Adelman said (Quanta Magazine, 2026-06-18). The phrase captures a growing consensus among regulatory biologists: sequence is necessary but not sufficient.
Some researchers are trying to articulate what else is needed. Biologist Adrian Woolfson of Genyro has proposed the term "informiome" to describe the broader information cloud surrounding the genome: the structural, contextual, and spatial factors that determine what genes do (Adrian Woolfson, Genyro, as paraphrased in Quanta Magazine, 2026-06-18). In his book On the Future of Species (April 2026), Woolfson argued that genome sequence alone cannot predict the consequences of mutations. If that view is correct, it implies fundamental limits for any purely sequence-trained model.
The warning is not new. Barbara McClintock, who won the Nobel Prize in 1983 for her discovery of transposable genetic elements, called the genome a "highly sensitive organ of the cell" in her Nobel lecture (Barbara McClintock, Nobel Lecture, 1983, as quoted in Quanta Magazine, 2026-06-18). Historian of science Evelyn Fox Keller, writing in 2020, described it as an "exquisitely sensitive reactive system" (Evelyn Fox Keller, 2020, as framed in Quanta Magazine, 2026-06-18). Both characterizations suggest a genome that is dynamic, contextual, and irreducibly physical, not a code waiting to be cracked by pattern recognition.
The new genomic AI models are remarkable tools. They have found patterns in sequence that human analysts could not. But if the core argument of regulatory biology is correct — that gene regulation depends on 3D chromatin organization, cell-type-specific context, and dynamic chromatin states that are not encoded in sequence alone — then these models face a structural ceiling that more data and larger architectures may not break through.
What comes next is an open question. Some researchers are pushing toward models that incorporate chromatin accessibility data, 3D genome structure, and single-cell epigenetic profiles. Others argue that the informiome concept, the broader information context surrounding DNA, needs a new kind of computational framework entirely. The biology suggests that whatever comes next will need to model a genome that is, in Adelman's word, living.