The AI diagnostic revolution has a data problem. The FDA is not helping.
When a machine learning model learns to diagnose disease, it learns from patients it has seen before. The uncomfortable question emerging in medical AI is: which patients were those?
Today, STAT News published the most systematic accounting of that gap yet. A review of FDA 510(k) and De Novo filings for AI diagnostic devices found the vast majority disclose nothing about the demographic composition of their training data, and the FDA has not required them to. The same day, OpenAI published a policy blueprint for expanding AI in healthcare settings that does not mention training data transparency. Together, the two developments frame a problem that patient safety researchers have been documenting for years: the AI diagnostic tools deployed across American medicine were trained on data that skews in documented ways, and nobody is required to say whose patients taught these systems to diagnose.
The timing of the STAT review is not incidental. AI diagnostic tools have moved from radiology suites into routine clinical use, and the gap between what these systems were trained on and who they are deployed to serve has become a documented patient safety concern. ECRI, a nonprofit that tracks medical device and diagnostic safety, placed AI diagnostic risks at the top of its 2026 patient safety concerns list, citing published evidence that some machine learning models failed to recognize 66% of critical or deteriorating health conditions in simulated cases. The organization has noted that AI tools are only as reliable as the data used to train them — and that data is often poorly characterized.
The performance disparities are documented in peer-reviewed literature. A systematic review in Frontiers in Medicine found that underrepresentation of rural populations in training datasets was linked to a 23% higher false-negative rate for pneumonia detection — meaning the algorithm missed more real cases in the populations it had learned least about. Melanoma detection errors are more prevalent among dark-skinned patients, the same review found, because training datasets contained too few examples from those populations for the model to learn the relevant patterns. The model was not wrong about melanoma. It was wrong about a subset of melanoma patients — the ones it had seen least.
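For readers unfamiliar with the metric, a false-negative rate is simply the share of true cases a model misses, and the disparity the review describes is the gap in that rate between patient groups. The sketch below is a minimal, invented illustration of the calculation; the records, group labels, and numbers are made up for clarity and do not come from the Frontiers in Medicine review.

```python
# Hypothetical illustration of a subgroup false-negative-rate comparison.
# All records and values below are invented for explanatory purposes.

def false_negative_rate(records):
    """FNR = true cases the model missed / all true cases."""
    positives = [r for r in records if r["has_pneumonia"]]
    missed = [r for r in positives if not r["model_flagged"]]
    return len(missed) / len(positives) if positives else float("nan")

cases = [
    # group, ground truth, model output
    {"group": "urban", "has_pneumonia": True, "model_flagged": True},
    {"group": "urban", "has_pneumonia": True, "model_flagged": True},
    {"group": "urban", "has_pneumonia": True, "model_flagged": False},
    {"group": "rural", "has_pneumonia": True, "model_flagged": True},
    {"group": "rural", "has_pneumonia": True, "model_flagged": False},
    {"group": "rural", "has_pneumonia": True, "model_flagged": False},
]

for group in ("urban", "rural"):
    subset = [r for r in cases if r["group"] == group]
    print(group, round(false_negative_rate(subset), 2))

# A higher rate for one group means the model misses more real cases there --
# the pattern the review linked to underrepresentation in training data.
```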
"This is not a hypothetical risk," said Divvy Upadhyay, a diagnostic safety researcher at ECRI, during a March webinar on the report. "The models perform differently across populations, and right now we do not have a systematic way to know which populations a given device was validated on."
The FDA gap
The FDA clears AI diagnostic devices through the 510(k) and De Novo pathways, requiring manufacturers to demonstrate "substantial equivalence" to existing devices or, in the De Novo route, to show reasonable assurance of safety and effectiveness for a novel device type. What neither pathway currently requires is disclosure of the demographic composition of the training data underlying the device. The STAT News review found that the vast majority of filings say nothing about whether the data used to train the algorithm reflects the patient populations likely to encounter the device in clinical use.
Companies selling AI diagnostic devices have not been required to provide demographic breakdowns, and most have not volunteered them, even as many have made public commitments to equity and representativeness in healthcare AI. The disclosure gap may reflect the FDA not asking rather than companies refusing to answer: the agency has not yet issued clear guidance on demographic disclosure for AI device training data.
"The companies are not necessarily hiding anything," said one health policy researcher who studies FDA device review, speaking on background because their institution engages with FDA on related policy matters. "They may be operating exactly within the requirements that exist. The problem is that the requirements have not kept pace with the technology."
OpenAI's blind spot
OpenAI has launched ChatGPT Health for consumers, ChatGPT for Healthcare for hospitals, and a clinician-focused product, alongside a policy blueprint the company calls "Keeping Patients First." The document, which the company described as a blueprint for unlocking AI's potential in healthcare, drew a mixed reception from health policy experts. David Blumenthal, a former national coordinator for health IT and a professor at Harvard, told STAT that the company was "trying to have their cake and eat it too" — sounding like a responsible party while lobbying for markets to stay open for its products.
The policy blueprint recommends expanded AI use in clinical settings but does not specify the training data transparency requirements that would let a hospital evaluate whether a given tool was validated on populations resembling its own patients.
What the data problem looks like in practice
The training data problem is not uniform. Some FDA-cleared AI radiology tools were trained on millions of images from large integrated health systems and have been validated on more diverse populations than most academic medical centers routinely see. Others were trained on small retrospective datasets from a single institution and deployed broadly before independent validation in other settings.
The stakes are concrete: a model trained on X-rays from patients at a single academic medical center may have learned patterns specific to that institution's scanners, patient demographics, and disease prevalence. Those patterns may not transfer to a rural hospital using different scanners on a different patient population. The FDA clearance process reviews whether a device is safe and effective for its intended use — it does not currently require a training data demographic report card.
A hospital buying an AI diagnostic tool often cannot answer a basic question: does this tool work equally well on my patients?
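There is no standard tooling for answering that question, but the audit a hospital would want is not conceptually complicated: run the tool retrospectively against the hospital's own labeled cases and compare its sensitivity across the patient groups the hospital actually serves. The following is a hypothetical sketch, assuming case-level results have been exported to a spreadsheet; the file name and column names are placeholders invented for illustration, not any vendor's real interface.

```python
import pandas as pd

# Hypothetical local-validation audit: compare the tool's sensitivity
# (1 minus the false-negative rate) across a hospital's own patient subgroups.
# The file name and column names below are placeholders, not a real interface.
df = pd.read_csv("local_cases.csv")  # one row per case the tool has reviewed

def sensitivity(cases: pd.DataFrame) -> float:
    """Share of confirmed cases the AI tool actually flagged."""
    confirmed = cases[cases["confirmed_diagnosis"] == 1]
    if confirmed.empty:
        return float("nan")  # too few confirmed cases to measure at all
    return float((confirmed["ai_flagged"] == 1).mean())

for subgroup, cases in df.groupby("patient_subgroup"):
    print(f"{subgroup}: sensitivity={sensitivity(cases):.2f} (n={len(cases)})")

# Large gaps between subgroups, or subgroups with too few confirmed cases to
# measure, are exactly what current clearance paperwork does not reveal.
```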
Upadhyay's recommendation for hospitals is straightforward: treat AI tools as one input among many, not a final word. "Clinicians need to be empowered to question these tools, to understand their limitations, and to override them when clinical judgment says otherwise," he said. That means institutions need their own AI governance policies — something many have not yet built.
ECRI's broader recommendation is that hospitals establish explicit AI usage guidelines, disclose AI involvement to patients, and create institutional cultures where staff can flag problems without fear. The organization notes that AI systems are only as good as the algorithms they use and the data on which they are trained — and the potential for errors remains a significant concern.
Whether the FDA moves to require demographic training data disclosure, or whether companies begin volunteering it, is an open policy question. For now, the most widely deployed AI diagnostic tools in American medicine were trained on data that skews in ways nobody has been required to quantify.