Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures
Kumiho: A Formal Approach to Cognitive Memory for AI Agents
A new paper posted to arXiv on March 18th introduces a memory architecture for AI agents built on formal belief revision theory — the same kind of principled framework philosophers and logicians have used for decades to study how rational systems should change their beliefs when new information conflicts with old. The system is called Kumiho, and the results are significant enough to warrant attention from anyone building agent infrastructure.
The paper, by Young Bin Park, Minsik Cho, Anton Bak, Prannaya Gupta, Jure Leskovec, William Marshall, Hae Won Park, and Siddharth Ben, makes a simple observation: the structural primitives you need for cognitive memory — immutable revisions, mutable tag pointers, typed dependency edges, URI-based addressing — are identical to the primitives you need for managing versioned software artifacts. Git has commits, branches, diffs. Kumiho has immutable memory revisions, mutable tag pointers, typed causal edges, and URI-based addressing. The paper proves a formal correspondence between the AGM belief revision framework and the operational semantics of its property graph memory system.
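The git analogy can be made concrete. The sketch below models the four primitives the paper names — immutable revisions, mutable tag pointers, typed causal edges, and URI-based addressing — as a minimal in-memory graph. All class names, URIs, and edge kinds here are illustrative choices, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Revision:
    """An immutable memory revision, analogous to a git commit."""
    uri: str                 # URI-based addressing, e.g. "memory://prefs/rev/1"
    content: str
    parents: tuple = ()      # prior revisions this one supersedes

@dataclass
class TypedEdge:
    """A typed causal/dependency edge between two revisions."""
    src: str
    dst: str
    kind: str                # e.g. "derived-from", "contradicts"

class MemoryGraph:
    """Toy property-graph store: revisions are append-only and
    immutable; tags are mutable pointers, like git branches."""
    def __init__(self) -> None:
        self.revisions: Dict[str, Revision] = {}
        self.tags: Dict[str, str] = {}       # tag name -> revision URI
        self.edges: List[TypedEdge] = []

    def commit(self, rev: Revision, tag: str) -> None:
        self.revisions[rev.uri] = rev        # old revisions are never deleted
        self.tags[tag] = rev.uri             # only the tag pointer moves

    def head(self, tag: str) -> Revision:
        return self.revisions[self.tags[tag]]
```

Moving a tag to a new revision leaves the superseded revision addressable by URI, which is exactly the property that makes auditing a belief's history possible.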
What belief revision theory brings to agents
The AGM framework, named after philosophers Carlos Alchourrón, Peter Gärdenfors, and David Makinson, defines how a rational agent should change its beliefs when confronted with new information. Kumiho proves its graph memory system satisfies the basic AGM postulates (K2–K6) and Hansson's belief base postulates (Relevance, Core-Retainment). This means the system is not just storing facts — it is managing a belief state that updates consistently when new information arrives.
This matters for agents because the memory problem is not primarily a storage problem. It is a coherence problem. An agent that remembers that the user prefers morning meetings and also remembers that the user is an extreme night owl has a coherence problem that simple retrieval cannot solve. Kumiho provides the formal tools to reason about which belief should give way when the two conflict.
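To see what a Core-Retainment-style guarantee buys in practice, here is a deliberately simplified belief-base revision operator: when a new belief arrives, it retracts only the beliefs that the new one directly contradicts and leaves everything else untouched. This is a toy illustration of the postulate's intent, not the paper's actual revision operator.

```python
def revise(beliefs: set, new_belief: str, contradicts: dict) -> set:
    """Toy belief-base revision. `contradicts` maps a belief to the
    set of existing beliefs it is inconsistent with. Core-Retainment
    in spirit: nothing is retracted unless it contributed to the
    inconsistency with the incoming belief."""
    retracted = contradicts.get(new_belief, set())
    return {b for b in beliefs if b not in retracted} | {new_belief}

beliefs = {"prefers morning meetings", "lives in UTC+9"}
conflicts = {"is a night owl": {"prefers morning meetings"}}
revised = revise(beliefs, "is a night owl", conflicts)
# The unrelated belief "lives in UTC+9" survives the revision intact.
```

A plain retrieval store has no analogue of this step: both contradictory memories would simply sit side by side, waiting to be retrieved with similar scores.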
The empirical results
On LoCoMo, a token-level recall benchmark, Kumiho achieves 0.565 overall F1 (n=1,986), including 97.5% adversarial refusal accuracy — meaning the system correctly refuses to answer questions based on memories introduced via prompt injection attacks.
On LoCoMo-Plus, a Level-2 benchmark testing implicit constraint recall — the ability to remember things mentioned in passing, as constraints on future actions — Kumiho achieves 93.3% judge accuracy. Gemini 2.5 Pro, the best published baseline, achieves 45.7%. That gap is not close.
The architecture
Kumiho implements a dual-store model: Redis for working memory, Neo4j for long-term graph memory, with hybrid full-text and vector retrieval. Three architectural innovations drive the results: prospective indexing (LLM-generated future-scenario implications indexed at write time), event extraction (structured causal events preserved in summaries), and client-side LLM reranking.
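The hybrid retrieval stage can be sketched without the Redis and Neo4j machinery: blend a lexical score with a vector-similarity score, then hand the top candidates to a reranker. The keyword-overlap scorer below is a stand-in for real full-text search, and the fusion weight `alpha` is an assumed parameter, not one reported in the paper.

```python
from math import sqrt

def cosine(a: list, b: list) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_retrieve(query_terms: set, query_vec: list,
                    memories: list, k: int = 3, alpha: float = 0.5) -> list:
    """Hybrid retrieval sketch: fuse a keyword-overlap score (full-text
    stand-in) with vector similarity, returning the top-k candidates
    that a downstream LLM reranker would then reorder."""
    scored = []
    for m in memories:
        tokens = set(m["text"].lower().split())
        text_score = len(query_terms & tokens) / max(len(query_terms), 1)
        vec_score = cosine(query_vec, m["vec"])
        scored.append((alpha * text_score + (1 - alpha) * vec_score, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]
```

In a production pipeline the lexical score would come from the full-text index and the vector score from the vector index, but the fusion-then-rerank shape is the same.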
The architecture is model-decoupled. Switching the answer model from GPT-4o-mini to GPT-4o improves end-to-end accuracy from roughly 88% to 93.3% without any changes to the retrieval pipeline. The evaluation cost for 401 LoCoMo-Plus entries was approximately $14.
Why this matters for agent infrastructure
The memory problem is the unsolved infrastructure problem for persistent agents. Most agent frameworks treat memory as a retrieval problem — store embeddings, retrieve semantically similar content. Kumiho treats it as a belief management problem, which is a more principled frame. The formal grounding means the system satisfies the AGM postulates — a reliability guarantee that a high embedding-similarity score cannot provide.
This is the kind of work that tends to get absorbed into major agent frameworks within a year. The question is whether it gets absorbed as a specific implementation or as a design philosophy.
Sources: Kumiho paper on arXiv | LoCoMo benchmark (Maharana et al., 2024)