A pricing team is ready to ship a new tier. Before they commit, they want signal: how would customers actually respond? A vendor offers to skip the fieldwork and run the question past an AI-simulated panel of synthetic respondents. The panel is "80% accurate." Should the team trust it?
That is the question Burke, Inc., a Cincinnati-based decision intelligence consultancy, tried to answer with a study and a new framework released on June 15, 2026. The headline finding cuts against the usual sales pitch for synthetic research. LLM-based synthetic panels cleared the commonly cited 80% accuracy bar, yet in Burke's testing they produced false conclusions in roughly 60% of business decision scenarios. Methods Burke describes as generative data models grounded in validated human respondent data did substantially better in the same comparison.
Two terms do most of the work in this debate. "Synthetic panels" means AI-simulated survey respondents used in place of human ones. "Decision-grade" means good enough to actually drive a business call, not just to look plausible on a slide. Burke's new framework, called FAR, scores synthetic data on three dimensions: Fidelity, Authenticity, and Resolution. It is one vendor's proposal for how to judge whether a panel is worth betting a decision on, not an industry standard.
The point of the study, in Burke's own framing, is that accuracy is the wrong purchasing criterion. As Eli Moore, Burke's SVP of Strategic Growth, put it in the release, the question is "whether it leads to the same conclusions you would reach by talking to your customer." A panel can hit a headline accuracy number and still steer a team toward the wrong call. Burke's data, as reported, makes that gap concrete: an 80% accuracy win paired with a roughly 60% false-conclusion rate in the same scenarios.
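To see how a high accuracy score and a high false-conclusion rate can coexist, consider a minimal sketch. Everything in it is an assumption made for illustration: the scenario names, the shares, the accuracy metric (one minus the mean absolute gap between synthetic and human results), and the 50% go/no-go threshold. None of it is taken from Burke's study, whose definitions have not been published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    human_share: float      # share of human respondents preferring the new option
    synthetic_share: float  # the same share as estimated by a synthetic panel


def mean_accuracy(scenarios):
    """Assumed metric: 1 minus the mean absolute gap between the two shares."""
    gaps = [abs(s.synthetic_share - s.human_share) for s in scenarios]
    return 1.0 - sum(gaps) / len(gaps)


def false_conclusion_rate(scenarios, threshold=0.5):
    """Fraction of scenarios where the two sources fall on opposite sides
    of the go/no-go threshold, i.e. would lead to opposite decisions."""
    flips = [
        (s.human_share >= threshold) != (s.synthetic_share >= threshold)
        for s in scenarios
    ]
    return sum(flips) / len(flips)


# Hypothetical numbers for illustration only; not from Burke's release.
SCENARIOS = [
    Scenario("tier_a", 0.52, 0.45),
    Scenario("tier_b", 0.48, 0.55),
    Scenario("tier_c", 0.70, 0.60),
    Scenario("tier_d", 0.30, 0.42),
]

if __name__ == "__main__":
    print(f"accuracy: {mean_accuracy(SCENARIOS):.2f}")
    print(f"false-conclusion rate: {false_conclusion_rate(SCENARIOS):.2f}")
```

With these made-up inputs the synthetic panel is never more than 12 points off, so the assumed accuracy metric comes out near 0.91, yet half of the scenarios flip across the threshold. Small errors matter most when the real answer sits close to the decision line, which is where many pricing questions live.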
The legitimate critique sits on the surface and should not be buried. This is a vendor study, and the framework is branded by the same firm whose approach it ranks above the alternatives. Burke's release positions the FAR Framework as a method for evaluating synthetic data, not a guarantee. The comparison also favors Burke's preferred category: generative data models grounded in validated human respondent data. The full release, as distributed on PR Newswire, does not yet disclose sample size, scenario count, or how "false conclusion" was defined, and the benchmark itself is not positioned as peer-reviewed. Until those details are public and the study is independently replicated, the 80%-and-still-60%-wrong finding is a Burke result, not an industry consensus.
For a team about to commission synthetic research, the practical move is to change the question they ask vendors. Instead of "how accurate is your panel?", ask whether the panel reproduces the conclusions a real customer study would reach on the same decision. Ask for the scenarios tested, the definition of a false conclusion, and whether the benchmark has been run by anyone who does not sell a competing service. Even before Burke's result is replicated, it gives buyers a reason to doubt that the 80% number, on its own, is the right thing to optimize.