The promise of large language models has always been more than fluent text. A position paper accepted to ICML 2026 argues the field is now hitting a wall that more compute cannot break through, and the way forward looks less like a bigger model and more like a borrowed organ.
The paper, "Position: Hippocampal Explicit Memory Is the Cornerstone for AGI" by Sangjun Park, draws a direct line from human memory neuroscience to the architecture of today's language models. Its claim is structural: the way an LLM learns from data is functionally analogous to the way humans acquire implicit memory, the unconscious kind that lets a child recognize a face or ride a bike. That is enough to produce convincing language. It is not enough, the paper argues, to produce the kind of cognition that most people actually mean when they say "AGI."
Park defines that target as "Human-Level AI": the general ability to learn, reason, and apply knowledge across cognitive tasks and domains. The interesting move is not the goal itself. It is the diagnosis. Park contends that higher-order functions, including long-term strategic planning, metacognition, symbolic reasoning, dynamic learning, and reflection, cannot emerge from implicit statistical pattern learning alone. They require explicit memory: the conscious, deliberately retrievable kind that the human hippocampus supports.
This framing matters because it relocates the AGI debate. The standard scaling story says current systems need more data and more parameters. Park's argument is that even a perfectly scaled implicit learner will plateau at the same ceiling, because the architecture is missing a mechanism the brain evolved for a reason. That is a testable claim, and the paper does more than assert it.
Section 6 of the preprint lays out formal computational requirements for an artificial explicit memory system. The list is the part of the paper that has practical value for researchers building memory-augmented models, and it is the part the field can argue with. Among the functions Park specifies are long-horizon planning, dynamic information integration without full retraining, reflective reasoning about the model's own state, and symbolic operations of a kind that statistical next-token prediction cannot achieve.
The paper also offers a vivid anchor for the gap. Figure 1, reproduced in the arXiv HTML version, sets up an analogy that drives the whole argument: an LLM can be made to produce the trace of a calculation like 17 × 6 = 102, step by step, the way a human reasoning aloud would. But a human solving the same problem on an abacus is not reasoning about the multiplication. The abacus is doing the work, and the abacus is external. The LLM, Park argues, is in the same position. It is producing a plausible "reasoning" trace by indexing patterns in its training data, not by retrieving the result from a structured memory of the facts being manipulated. The trace looks like thinking. The substrate is the same kind of implicit retrieval that powers the abacus operator's hand.
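To make the idea of a "trace" concrete, here is a minimal Python sketch of the kind of step-by-step working a model might emit for 17 × 6. The function name `multiplication_trace` and the tens-and-ones decomposition are illustrative assumptions, not taken from the paper or its Figure 1, and the sketch assumes non-negative integers.

```python
def multiplication_trace(a: int, b: int) -> list[str]:
    """Return a human-readable, step-by-step trace of a x b.

    Hypothetical helper for illustration only: it splits `a` into
    its tens and ones, multiplies each part by `b`, and adds the
    partial products. Assumes a >= 0 and b >= 0.
    """
    tens, ones = divmod(a, 10)      # e.g. 17 -> (1, 7)
    tens_part = tens * 10 * b       # product of the tens component
    ones_part = ones * b            # product of the ones component
    return [
        f"{a} x {b} = ({tens * 10} + {ones}) x {b}",
        f"{tens * 10} x {b} = {tens_part}",
        f"{ones} x {b} = {ones_part}",
        f"{tens_part} + {ones_part} = {tens_part + ones_part}",
    ]


if __name__ == "__main__":
    for step in multiplication_trace(17, 6):
        print(step)
```

The printed steps end in the correct answer, but the text of a trace alone does not reveal what kind of mechanism produced it, which is the distinction the analogy turns on.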
This is where the paper's position becomes most contestable, and where the field's existing work matters most. Retrieval-augmented generation, memory-augmented transformers, and episodic memory modules each approximate parts of what Park is describing. They add a store of facts the model can pull from at inference time, or a mechanism to update that store on the fly. None of them, the paper argues, meets the full checklist in Section 6, especially on metacognition and long-horizon planning. The argument is not that no one is trying. It is that what people are trying does not yet reach the bar Park is setting.
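For readers unfamiliar with these approaches, the sketch below shows the basic shape of an external fact store that can be read at inference time and updated without retraining. It is a toy under stated assumptions: the class `ExplicitMemoryStore`, its methods, and `build_prompt` are hypothetical names, and simple word overlap stands in for the learned embedding similarity a real retrieval system would typically use. It is not the architecture Park proposes, and it makes no attempt at the metacognition or planning requirements.

```python
from dataclasses import dataclass, field


@dataclass
class ExplicitMemoryStore:
    """Toy key-to-fact store (hypothetical; not from the paper)."""

    facts: dict[str, str] = field(default_factory=dict)

    def write(self, key: str, fact: str) -> None:
        # Updating the store is a plain dictionary write:
        # no model weights change and nothing is retrained.
        self.facts[key] = fact

    def retrieve(self, query: str, top_k: int = 2) -> list[str]:
        # Assumption: word overlap is a crude stand-in for the
        # embedding similarity a real system would compute.
        query_words = set(query.lower().split())
        scored = []
        for key, fact in self.facts.items():
            entry_words = set(f"{key} {fact}".lower().split())
            overlap = len(query_words & entry_words)
            if overlap > 0:
                scored.append((overlap, key, fact))
        # Highest overlap first; ties broken by key for determinism.
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [fact for _, _, fact in scored[:top_k]]


def build_prompt(store: ExplicitMemoryStore, question: str) -> str:
    """Prepend retrieved facts to a question before it reaches a model."""
    context = "\n".join(f"- {fact}" for fact in store.retrieve(question))
    return f"Known facts:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ExplicitMemoryStore()
    store.write("capital france", "Paris is the capital of France")
    store.write("abacus", "An abacus is a manual counting tool")
    print(build_prompt(store, "capital of France"))
```

Even in this toy form the two properties the paragraph describes are visible: `retrieve` pulls facts at inference time, and `write` changes what the system "knows" on the fly. What it lacks is any way for the system to reason about its own memory or plan over it, which is where the paper says current approaches fall short.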
Critics have several legitimate lines of attack. The hippocampus analogy is loose: explicit memory in neuroscience is a system with many components and edge cases, and reducing it to a checklist of computational functions risks losing the biology. "Explicit memory" itself admits many possible implementations, and the paper does not settle the question of which one would actually deliver the capabilities Park claims are missing. AGI timelines and definitions are both contested, and tying the architecture to a contested definition invites the kind of debate that does not resolve on evidence. The field has not converged on hippocampal-style memory as the right direction, and a single-author position paper is a starting point for that conversation, not a conclusion.
What the paper does provide is a stress test. If a memory-augmented LLM can demonstrate long-horizon planning, dynamic information use, and reflective reasoning that current systems cannot, the position gains weight. If those properties keep emerging from scale and clever training alone, the diagnosis has to be revised. Park has, in effect, offered a research program with falsifiable milestones. The next year of work on memory-augmented models will be measured against it.
Park is the paper's sole author. The paper is cross-listed in q-bio.NC (Neurons and Cognition), which suggests that the analogy is being made by someone engaging seriously with the neuroscience side. The empirical examples of explicit-memory absence in current LLMs, gathered in Appendix C of the preprint, are the section most likely to be useful to readers who want concrete failure modes rather than abstract argument.
The bet, then, is not that LLMs should be made to imitate a brain. It is that the things people want LLMs to do, and cannot yet do, are exactly the things the brain uses explicit memory to do. The paper is a researcher's way of asking the field to take that bet seriously enough to test.