For decades, preference researchers have asked people to pick their favorite of two options. A new MIT proof shows that this simple question is structurally incapable of revealing how one person's tastes relate to another's. The fix is what MIT News calls "the power of three": ask people to rank three options instead of two.
The result was presented in April at the International Conference on Learning Representations in Rio de Janeiro by a team at MIT's Laboratory for Information and Decision Systems. In their OpenReview paper, Gabriele Farina, Constantinos Daskalakis, Yeshwanth Cherapanamjeri, and Sobhan Mohammadpour prove that pairwise comparisons cannot recover the correlation structure of preferences across a population, while ordered three-way rankings can. Cherapanamjeri is now at Nanyang Technological University in Singapore. Farina is a principal investigator at MIT LIDS and core faculty in the Operations Research Center. Daskalakis holds the Avanessians Professorship and is a member of CSAIL.
The proof builds on L. L. Thurstone's 1927 "A law of comparative judgment," the foundation of random utility models. Standard random-utility estimation assumes that the utilities people assign to options are independent. In real populations that assumption collapses immediately. A voter who favors gun control is far more likely to also favor childcare subsidies than the independence model would predict, and the correlation carries information about both policy positions at once. Pairwise comparisons, the team shows, are provably blind to that correlation. The information is not just hard to extract from pairs. It is not there to extract.
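A small simulation can make the blindness concrete. The sketch below is an illustration only, not the construction used in the paper: it assumes a Thurstone-style model with zero-mean Gaussian utilities for three options, and the function names (sample_utilities, simulate) and the correlation value 0.9 are hypothetical choices made for the example. It compares a population whose utilities are independent with one in which options 0 and 1 are strongly correlated.

```python
import math
import random
from itertools import combinations

def sample_utilities(rho, rng):
    """Draw zero-mean, unit-variance utilities for three options.

    Options 0 and 1 have correlation `rho`; option 2 is independent of both.
    (Assumed toy model, not taken from the paper.)
    """
    z0, z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    return (z0, rho * z0 + math.sqrt(1 - rho * rho) * z1, z2)

def simulate(rho, n=20000, seed=0):
    """Return pairwise win frequencies and how often option 2 is ranked second."""
    rng = random.Random(seed)
    pair_wins = {pair: 0 for pair in combinations(range(3), 2)}
    middle_is_2 = 0
    for _ in range(n):
        u = sample_utilities(rho, rng)
        # Pairwise question: does option i beat option j?
        for i, j in pair_wins:
            pair_wins[(i, j)] += u[i] > u[j]
        # Triple question: full ranking, most preferred first.
        ranking = sorted(range(3), key=lambda k: -u[k])
        middle_is_2 += ranking[1] == 2
    pairwise = {pair: wins / n for pair, wins in pair_wins.items()}
    return pairwise, middle_is_2 / n

for rho in (0.0, 0.9):
    pairwise, middle = simulate(rho)
    print(f"rho={rho}: pairwise={pairwise}, P(option 2 ranked second)={middle:.3f}")
```

In this toy model every pairwise frequency sits near one half in both populations, so pairwise data cannot tell them apart. The triple rankings can: option 2 lands in the middle about a third of the time when utilities are independent, and only about a tenth of the time when options 0 and 1 are correlated at 0.9, because the two correlated utilities tend to sit close together and leave little room between them.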
The team's algorithm recovers the structure from triples without requiring a number of experiments that grows exponentially with catalog size, which is the practical reason the result is more than a curiosity. Anyone who runs a stated-preference survey, an A/B test inside a recommender system, or a policy consultation that asks residents to weigh trade-offs has a concrete design lever available. The question changes from "which of these two do you prefer?" to "rank these three from most to least preferred."
The reach extends further than survey design. Daskalakis told Steve Nadis at MIT News that the framework has direct implications for the human-ranking pipelines used to align large language models. Reinforcement learning from human feedback, the technique that turns ranked human preferences into a reward signal for model fine-tuning, is at its core an exercise in fitting a random utility model to ranked outputs. Better data collection at the top of the pipeline means better alignment at the bottom. Emma Frejinger, a computer scientist at the Université de Montréal who was not involved in the paper, called the result "a crucial breakthrough" in the same article, and said it provides "a highly practical roadmap for collecting better data."
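To show what fitting a random utility model to ranked outputs can look like, here is a minimal sketch of the Plackett-Luce log-likelihood, a standard random utility model for rankings that reduces to the Bradley-Terry model when only two items are compared. It is an assumption that any particular RLHF pipeline uses this exact form, and it is not the algorithm from the MIT paper. Plackett-Luce corresponds to independent Gumbel noise on the utilities, so it embodies the independence assumption the paper questions. The function name and the example scores are hypothetical.

```python
import math

def plackett_luce_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (most preferred first) under Plackett-Luce.

    `scores` maps each item to a real-valued reward; `ranking` lists items
    from most to least preferred.
    """
    remaining = list(ranking)
    log_lik = 0.0
    # Pick the top item among those remaining, then repeat on the rest.
    for item in ranking[:-1]:
        log_norm = math.log(sum(math.exp(scores[j]) for j in remaining))
        log_lik += scores[item] - log_norm
        remaining.remove(item)
    return log_lik

# Hypothetical reward scores for three model responses.
scores = {"a": 1.2, "b": 0.3, "c": -0.5}
print(plackett_luce_log_likelihood(scores, ["a", "b", "c"]))
```

A reward model trained on ranked triples could maximize this quantity over its score parameters. With a ranking of length two, the expression becomes the log of a logistic function of the score difference, which is the familiar pairwise objective.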
The cost is real, and it belongs in the same sentence as the guarantee. Triple comparisons take longer per respondent, are cognitively harder, and are easier to design badly. A poorly chosen triple can anchor respondents, introduce ordering effects, or simply exhaust them, and the proof does not insulate against any of those failure modes. The mathematics guarantees that the information is recoverable in principle. It does not promise that a given triple-comparison survey will recover it well, and survey designers will still need to choose their stimuli, their respondent pools, and their estimation pipelines carefully.
What to watch next is whether the algorithm moves out of the ICLR paper and into off-the-shelf tooling for the preference-modeling and RLHF communities, and whether large-scale preference data collectors begin to fold triple-comparison questions into their pipelines as a routine complement to pairwise ones. The MIT team has handed the field a target. The rest of the work is up to the field.