
Meta Bets AI Can Read Minds. Scientists Watch

reported by Sky · 3 min read · published March 26, 2026


Meta AI unveiled TRIBE v2 on March 26, a multimodal model that predicts human brain responses to video, audio, and text. The company calls it a "foundation model for brain activity." Whether that label holds up is the real question.

TRIBE v2 — the name stands for Trimodal Brain Encoder — combines Meta's own LLaMA 3.2 language model, V-JEPA2 video model, and Wav2Vec-BERT audio model into a single transformer architecture that maps onto the cortical surface. It outputs predictions across roughly 20,000 vertices on the fsaverage5 brain mesh. The training set is a step change from its predecessor: more than 700 individuals and over 1,115 hours of fMRI recordings, compared to the original TRIBE, which was trained on fMRI data from just four subjects.

That four-to-700 scale jump is the part worth taking seriously. The original TRIBE model, a one-billion-parameter system, won the Algonauts 2025 brain encoding competition outright: 263 teams entered, and TRIBE beat them all by a meaningful margin, according to the competition results paper. Entrants had to predict fMRI responses across 1,000 whole-brain parcels while participants watched Friends episodes and feature films, including a silent black-and-white Charlie Chaplin film held out as an out-of-distribution test. TRIBE's key tricks were modality dropout during training, which hides one or more input streams so the model learns to cope without them, and a parcel-specific ensembling scheme that weighted each sub-model by how well it performed on individual brain regions.
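For readers who want to see the two tricks in concrete terms, here is a minimal sketch in plain Python. It is not Meta's code. The function names, the drop probability, and the softmax weighting with a temperature are all assumptions chosen for illustration; the competition paper may weight sub-models differently.

```python
import math
import random


def modality_dropout(features, p_drop, rng):
    """Zero out whole modalities at random (hypothetical helper).

    features maps a modality name to its feature vector. Each modality is
    dropped with probability p_drop, but at least one is always kept.
    """
    names = list(features)
    kept = [name for name in names if rng.random() >= p_drop]
    if not kept:
        kept = [rng.choice(names)]
    return {
        name: features[name] if name in kept else [0.0] * len(features[name])
        for name in names
    }


def parcel_weights(val_scores, temperature=1.0):
    """Turn per-parcel validation scores into per-parcel ensemble weights.

    val_scores[m][p] is the validation score of sub-model m on parcel p.
    Assumption: weights are a softmax over sub-models, computed separately
    for every parcel, so they sum to 1 within each parcel.
    """
    n_models = len(val_scores)
    n_parcels = len(val_scores[0])
    weights = [[0.0] * n_parcels for _ in range(n_models)]
    for p in range(n_parcels):
        exps = [math.exp(val_scores[m][p] / temperature) for m in range(n_models)]
        total = sum(exps)
        for m in range(n_models):
            weights[m][p] = exps[m] / total
    return weights


def ensemble(preds, weights):
    """Combine sub-model predictions parcel by parcel.

    preds[m][p] is sub-model m's prediction for parcel p.
    """
    n_models = len(preds)
    n_parcels = len(preds[0])
    return [
        sum(weights[m][p] * preds[m][p] for m in range(n_models))
        for p in range(n_parcels)
    ]
```

The point of weighting per parcel rather than globally is that a sub-model that is strong in auditory cortex but weak in visual cortex still contributes where it helps.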

TRIBE v2 generalizes to new individuals without retraining — it achieves a two-to-three-times improvement over previous methods on movies and audiobooks, according to Meta's announcement. The company claims a 70-fold increase in resolution over comparable systems. Those numbers are in the announcement. The paper is not yet on arXiv.

That matters. The 70-fold figure describes spatial resolution on an averaged cortical surface: specifically, BOLD signal mapped onto the fsaverage5 mesh. BOLD is a hemodynamic proxy for neural activity, not a direct measure of neural firing. It tracks blood oxygenation, which starts to rise a second or two after neurons fire and typically peaks several seconds later. Calling it a map of what the brain is "doing" is technically defensible, but it is also a convenient shorthand.
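The lag is easy to see in a toy simulation. The sketch below assumes the hemodynamic response can be approximated by a single gamma curve with shape 6 and scale 1 second, a common simplification. The parameters, the 0.1-second time step, and every name in the code are illustrative assumptions, not anything taken from Meta's model. A brief burst of neural activity is convolved with that curve: the simulated signal starts rising shortly after the burst and peaks several seconds later.

```python
import math


def gamma_hrf(t, shape=6.0, scale=1.0):
    """Gamma-shaped response curve (assumed form); zero at and before t = 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (
        scale ** shape * math.gamma(shape)
    )


def convolve(signal, kernel):
    """Plain discrete convolution of two lists."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


dt = 0.1          # seconds per sample (assumption)
n_kernel = 300    # 30 seconds of response curve at dt = 0.1
kernel = [gamma_hrf(i * dt) * dt for i in range(n_kernel)]

impulse_index = 20            # a single neural burst at t = 2.0 s
neural = [0.0] * 100
neural[impulse_index] = 1.0

bold = convolve(neural, kernel)
peak_index = max(range(len(bold)), key=bold.__getitem__)
lag_seconds = (peak_index - impulse_index) * dt

print(f"simulated BOLD peak trails the neural burst by about {lag_seconds:.1f} s")
```

Any model trained on fMRI, TRIBE v2 included, is fitting this slow, smoothed signal rather than the underlying firing.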

There's another caveat built into the model itself: TRIBE v2 predicts responses for the average subject, not for any individual. The averaged-brain output is standard in neuroscience encoding models — it smooths out individual variation to reveal population-level patterns. But it means the model is predicting what a statistical composite brain does, not what your brain does. Whether that distinction matters depends on the application, and Meta is clearly hoping it doesn't become a dealbreaker.

The "foundation model" framing is the most aggressive claim in the announcement. The term implies something analogous to GPT-4 or CLIP: a single system that can be adapted to many tasks without task-specific training. TRIBE v2 can generalize to unseen individuals without retraining, which is genuinely novel for brain encoding. But whether it has the breadth or the blank-slate generality that "foundation model" implies for language or vision is an open question that the paper, once published, should begin to answer.

The paper, titled "A foundation model of vision, audition, and language for in-silico neuroscience," lists Stéphane d'Ascoli, Jérémy Rapin, Yohann Benchetrit, Teon Brookes, Katelyn Begany, Joséphine Raugel, Hubert Banville, and Jean-Rémi King as authors. King, a Meta FAIR researcher, has published extensively on brain encoding models; this line of work traces back to his earlier language-to-cortex mapping research.

TRIBE v2 is out under a CC BY-NC 4.0 license, with model weights, code, and a demo available on GitHub. The non-commercial restriction will limit commercial adoption but leaves the door open for any researcher willing to work within that scope.

The honest summary: Meta's FAIR team built a brain encoding model at a scale that makes prior work look like a pilot study. The generalization result — predicting new individuals without retraining — is the most interesting technical claim, and it deserves scrutiny. "Foundation model for brain activity" is branding. The paper will tell us how much substance is underneath it.

