The AI That Models the Gap Between What You Believe and What You Actually Believe

When an AI can model the gap between what you believe and what you actually believe, the interesting question is not whether the AI is intelligent. It is who is building the systems that exploit that gap.

A paper posted to arXiv on May 19 by five researchers at BRAC University in Bangladesh describes OSCToM-8B, a fine-tuned version of Meta's Llama-3.1-8B-Instruct. It reaches 76 percent accuracy on FANToM, a benchmark designed to stress-test Theory of Mind reasoning under information asymmetry. The baseline it compares against, which is built on a framework called ExploreToM, scores 0.2 percent on the same test. The paper calls that a 380x improvement. The authors are Sharmin Sultana Srishty, Kazi Mahathir Rahman, Malaika Parizat Sakkhi, Samia Shahid Prianna, and Shaikhul Islam Sinat.

The method is adversarial data generation. The authors use reinforcement learning to synthesize scenarios where an observer holds one belief about a situation while simultaneously attributing a different belief to another agent. This observer-self conflict is common in real social reasoning — think of a negotiator who knows the other side is wrong but must model what the other side wrongly believes. Existing benchmarks, the paper argues, do not test for this. OSCToM does. The data synthesis procedure is also 6x more efficient than prior approaches, according to the authors.
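To make the observer-self conflict concrete, here is a toy Python sketch of what one such scenario looks like as data. It is not the paper's method, which uses reinforcement learning to synthesize scenarios. The `Event` type, the functions `belief`, `attributed_belief`, and `observer_self_conflict`, and the meeting-time scenario are all hypothetical. The sketch assumes a simplified FANToM-style setting in which an agent learns only what is said while it is present, and in which an observer credits another agent only with statements they both heard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical record: a statement setting `fact` to `value`,
    heard only by the agents in `witnesses`."""
    fact: str
    value: str
    witnesses: frozenset


def belief(agent: str, fact: str, events: list) -> Optional[str]:
    """First-order belief: the most recent value of `fact` the agent heard."""
    value = None
    for event in events:
        if event.fact == fact and agent in event.witnesses:
            value = event.value
    return value


def attributed_belief(observer: str, other: str, fact: str, events: list) -> Optional[str]:
    """Second-order belief: what `observer` thinks `other` believes.

    Simplifying assumption: the observer credits `other` only with
    statements both of them heard, since it cannot know what was
    said while it was absent.
    """
    value = None
    for event in events:
        if event.fact == fact and {observer, other} <= event.witnesses:
            value = event.value
    return value


def observer_self_conflict(observer: str, other: str, fact: str, events: list) -> bool:
    """True when the observer's own belief differs from the belief
    it attributes to `other`."""
    own = belief(observer, fact, events)
    attributed = attributed_belief(observer, other, fact, events)
    return own is not None and attributed is not None and own != attributed


# Invented scenario: the meeting time changes after Bob leaves the chat.
SCENARIO = [
    Event("meeting_time", "3pm", frozenset({"Alice", "Bob"})),
    Event("meeting_time", "5pm", frozenset({"Alice"})),  # Bob is absent
]

if __name__ == "__main__":
    print(belief("Alice", "meeting_time", SCENARIO))
    print(attributed_belief("Alice", "Bob", "meeting_time", SCENARIO))
    print(observer_self_conflict("Alice", "Bob", "meeting_time", SCENARIO))
    print(observer_self_conflict("Bob", "Alice", "meeting_time", SCENARIO))
```

Here Alice believes the meeting is at 5pm but, having watched Bob leave before the change, attributes the belief 3pm to him. That mismatch is the conflict. Bob shows none: his own belief and the one he attributes to Alice are both 3pm, even though his model of Alice is wrong.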

The 0.2 percent baseline figure requires scrutiny. ExploreToM, published at ICLR 2025 by Melanie Sclar and colleagues at Meta AI, the Allen Institute for AI, Carnegie Mellon, and the University of Washington, is an adversarial data generation framework, not a model built to be evaluated on FANToM. It uses a domain-specific language and A* search to generate adversarial ToM data; on its own adversarial test data, GPT-4o scored 9 percent and Llama-3.1-70B scored 0 percent. The BRAC authors are therefore comparing a model trained specifically on OSCToM-generated data against a baseline that was not designed for this evaluation, and the paper alone does not make clear whether that baseline was ever meant to be run on FANToM at all.
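The 380x figure is plain division of 76 by 0.2, and a ratio against a baseline that close to zero is unstable. The Python sketch below compares the ratio with the absolute gain in percentage points. The helper names `relative_improvement` and `absolute_gain` are hypothetical, and every baseline other than 0.2 percent is an illustrative value, not a reported result.

```python
# Compare the headline ratio with the absolute gain, and show how the ratio
# reacts to small changes in a near-zero baseline. All scores are in percent;
# 76 and 0.2 are the reported figures, the other baselines are hypothetical.


def relative_improvement(new_pct: float, baseline_pct: float) -> float:
    """How many times larger the new score is than the baseline score."""
    if baseline_pct <= 0:
        raise ValueError("a ratio against a zero baseline is undefined")
    return new_pct / baseline_pct


def absolute_gain(new_pct: float, baseline_pct: float) -> float:
    """Difference between the two scores, in percentage points."""
    return new_pct - baseline_pct


if __name__ == "__main__":
    new = 76.0
    for baseline in (0.1, 0.2, 0.5, 1.0):
        ratio = relative_improvement(new, baseline)
        gain = absolute_gain(new, baseline)
        print(f"baseline {baseline:>4}%: {ratio:6.0f}x, +{gain:.1f} points")
```

With the reported figures this gives 380x and a gain of 75.8 points. A hypothetical 0.1 percent baseline would turn the same 76 percent into 760x, and a 1 percent baseline would make it 76x, while the absolute gain stays between 75.0 and 75.9 points throughout.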

What the paper actually demonstrates is that targeted adversarial training data can improve performance on a specific cognitive reasoning task. OSCToM-8B is a comparatively small model, with 8 billion parameters, so the result suggests that for this capability targeted data can matter more than raw scale. That finding is plausible and not unprecedented in the literature.

The dual-use question is why this story is worth telling now rather than after peer review. A separate June 2025 paper from NTT researchers Tatsuhiro Aoshima and Mitsuaki Akiyama, which explicitly connects Theory of Mind capability to safety evaluation, frames the risk directly: as LLMs improve at tracking nested beliefs, the question becomes whether that capability is being used to model human beliefs for purposes the human did not intend. "Since 2024, there have been increasing reports of LLMs not only disabling oversight mechanisms or exercising autonomous capabilities but also displaying behaviors that appear to deceive users or developers," they write. The capability that makes OSCToM-8B good at observer-self conflicts, modeling the gap between what an agent believes and what it thinks another agent believes, is the same capability required by systems designed to model, predict, and influence human belief at scale. Persuasion technology, automated negotiation systems, and targeted influence operations all depend on it. The BRAC authors did not respond to a request for comment on anticipated applications of their work.

The authors are from BRAC University, a private university in Dhaka, Bangladesh, and are not affiliated with any major lab. The code is public on GitHub and the model weights are available, so the 76 percent figure can be checked. So far, nobody outside the author team has tested whether it holds.

The ceiling of a benchmark result is not the ceiling of a capability. But the gap between what a person believes and what they think someone else believes is real and measurable, and it is now, apparently, something an 8-billion-parameter model from a university in Bangladesh can track. A system that can track that gap can be built to exploit it, and that is worth knowing before someone with more resources decides what to do with it.

The AI That Models the Gap Between What You Believe and What You Actually Believe — type0 | type0