The default story about always-on AI agents focuses on what they can store. A new survey of 435 papers argues that is the wrong place to look. The harder engineering problem is what those agents should be allowed to forget, recover, and hand back.
"Always-On Agents: A Survey of Persistent Memory, State, and Governance in LLM Agents", an arXiv preprint by a research team, treats always-on agents, meaning AI assistants whose behavior depends on durable state accumulated across earlier interactions, as something broader than chatbot memory. The taxonomy covers nine categories of persistent state: retrievable memories, task ledgers, permissions, credentials, commitments, provenance and audit records, shared state, trigger conditions, and externally committed effects.
Each of those items can be read along six diagnostic axes: authority, scope, mutability, provenance, recoverability, and actionability. The paper then walks the state through a lifecycle of being written, validated, organized, retrieved, acted on, updated, forgotten, audited, and sometimes rolled back. The taxonomy exists to expose how thin the literature still is on the later stages of that lifecycle: updating, forgetting, auditing, and rolling back.
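One way to picture the taxonomy is as a record type: every piece of persistent state carries a category, a value for each of the six axes, and a current lifecycle stage. The sketch below is illustrative only. The class names, field types, and example values are assumptions made for this article and are not taken from the paper, which may formalize the taxonomy differently.

```python
from dataclasses import dataclass
from enum import Enum


class StateCategory(Enum):
    """The nine categories of persistent state listed in the survey."""
    RETRIEVABLE_MEMORY = "retrievable memory"
    TASK_LEDGER = "task ledger"
    PERMISSION = "permission"
    CREDENTIAL = "credential"
    COMMITMENT = "commitment"
    PROVENANCE_AUDIT_RECORD = "provenance and audit record"
    SHARED_STATE = "shared state"
    TRIGGER_CONDITION = "trigger condition"
    EXTERNALLY_COMMITTED_EFFECT = "externally committed effect"


class LifecycleStage(Enum):
    """Lifecycle stages in the order the article lists them."""
    WRITTEN = 1
    VALIDATED = 2
    ORGANIZED = 3
    RETRIEVED = 4
    ACTED_ON = 5
    UPDATED = 6
    FORGOTTEN = 7
    AUDITED = 8
    ROLLED_BACK = 9


# The six diagnostic axes, used as field names below.
AXES = ("authority", "scope", "mutability",
        "provenance", "recoverability", "actionability")


@dataclass
class StateItem:
    """Hypothetical record for one piece of agent state."""
    category: StateCategory
    payload: str
    # Free-form strings here; a real system would likely use stricter types.
    authority: str
    scope: str
    mutability: str
    provenance: str
    recoverability: str
    actionability: str
    stage: LifecycleStage = LifecycleStage.WRITTEN

    def axes(self) -> dict:
        """Return the six axis values keyed by axis name."""
        return {name: getattr(self, name) for name in AXES}


# Example values are invented for illustration.
item = StateItem(
    category=StateCategory.CREDENTIAL,
    payload="calendar API token",
    authority="granted by user",
    scope="single tool",
    mutability="replaceable",
    provenance="OAuth consent flow",
    recoverability="revocable",
    actionability="enables external writes",
)
```

Laying the state out this way makes the gap the survey describes easy to see: most of the enum values in the second half of `LifecycleStage` have no obvious, well-studied operation attached to them.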
The 435-paper corpus is explicitly scoped, not exhaustive, and the authors coded it themselves. Their headline finding, that research concentrates more heavily on accumulating and retrieving state than on governing, recovering, or relinquishing it, is a literature claim rather than a peer-reviewed result. It still lines up with what an outsider sees: product demos lean on recall benchmarks, while tooling for selective forgetting, time-bound credentials, or side-effect rollback stays rare.
The shift from memory to governance is more than a vocabulary change. State that an agent can write but cannot safely revoke becomes a liability. A tool credential that an agent can pick up but that never expires is a foothold a future attacker could inherit. A commitment the agent records but cannot renegotiate on the user's behalf is a contract that drifts out of date without ever being reviewed. The survey's six-axis framing argues these failure modes are first-class engineering problems, not edge cases.
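The credential case is the easiest of these to make concrete. A minimal sketch, assuming a hypothetical `ScopedCredential` type invented for this article (it does not come from the paper or from any particular agent framework), pairs every credential with an expiry and an explicit revocation switch, so that "can this still be used?" always has a checkable answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class ScopedCredential:
    """Hypothetical time-bound, revocable credential held by an agent."""
    token: str
    issued_at: datetime
    ttl: timedelta          # how long the credential stays valid
    revoked: bool = False   # set once; never cleared in this sketch

    def is_usable(self, now: datetime) -> bool:
        """Usable only if not revoked and still inside its lifetime."""
        return not self.revoked and now < self.issued_at + self.ttl

    def revoke(self) -> None:
        """Explicitly relinquish the credential before it expires."""
        self.revoked = True


# Example with a fixed timestamp so the behavior is reproducible.
issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
cred = ScopedCredential("example-token", issued, timedelta(hours=1))

within_lifetime = cred.is_usable(issued + timedelta(minutes=30))
after_expiry = cred.is_usable(issued + timedelta(hours=2))

cred.revoke()
after_revocation = cred.is_usable(issued + timedelta(minutes=30))
```

The design choice worth noting is that expiry is the default and persistence is what needs justification. That inverts the pattern the article describes, where a credential lives until someone remembers to remove it.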
To make that claim testable, the paper sketches an Always-On Evaluation Protocol, or AOEP-v0, that scores agents on how they mutate and govern state. The protocol is presented as a pilot contract; nothing in the source verifies that vendors, regulators, or open-source projects have adopted it. It is, in effect, a shared rubric for asking whether an agent can forget on purpose.
A summary of the paper published on the takara.ai preprint TLDR pulls the same governance-gap framing out of the technical taxonomy.
The signals worth tracking are concrete. Adoption of AOEP-style scoring outside the paper, especially in third-party benchmarks, would tell readers whether the rubric has any traction at all. A major platform publishing explicit revocation or rollback behavior, rather than treating memory as a feature to be expanded, would show the field has moved past the easy half. Independent reproduction of the governance-gap finding on a differently scoped corpus would either confirm or soften the central claim. A steady flow of arXiv work treating forget, recover, and relinquish as primary verbs rather than footnotes would shift the center of gravity.
The open question is not whether agents will keep accumulating state. They already do. It is whether the field that builds them will treat the right to forget as a serious engineering surface, or keep leaving it for later.