When Mickael Tordjman showed radiologists a set of chest X-rays, some of the images were real and some were not. The radiologists could not reliably tell them apart.
Tordjman, a postdoctoral fellow at the Icahn School of Medicine at Mount Sinai in New York, is the lead author of a study published March 24, 2026, in Radiology, the journal of the Radiological Society of North America (RSNA). The number that made headlines was 75 percent: the accuracy trained radiologists achieved after being told the dataset contained fakes. But the finding that matters for patient safety is the one that came first. When radiologists had no idea synthetic images were in the mix, only 7 of 17 flagged them spontaneously, a 41 percent detection rate.
That gap between 41 percent and 75 percent is not a story about education. It is a clinical safety signal. In practice, a fabricated X-ray will not arrive in a chart with a note that says "by the way, this one is fake."
The study included 17 practicing radiologists from six countries, with professional experience ranging from zero to 40 years. They reviewed 264 chest X-rays split evenly between authentic scans and AI-generated images. Two separate image sets were used. The first paired real radiographs with synthetic ones made by GPT-4o. The second used RoentGen, an open-source chest X-ray diffusion model developed at Stanford Medicine by researchers led by Curtis Langlotz and Akshay Chaudhari. After being told the dataset contained fakes, radiologists scored 75 percent accuracy on the GPT-4o images and between 62 and 78 percent on the RoentGen set. The difference was not statistically significant.
More striking: experience did not help. A radiologist with two decades of practice performed no better on average than one fresh out of residency. When Tordjman's team looked at the data, there was no correlation between years in the field and accuracy at detecting synthetic images. "Deepfake medical images often look too perfect," Tordjman said in an RSNA statement. "Bones are overly smooth, spines unnaturally straight, lungs overly symmetrical, blood vessel patterns excessively uniform, and fractures appear unusually clean and consistent." The problem is that "too perfect" is not a red flag in a clinical setting. It is what good technique looks like.
The implications extend well beyond the radiology reading room. Tordjman laid out two concrete threat scenarios. The first is fraudulent litigation: a fabricated fracture X-ray inserted into a medical record to support a personal injury claim. The second is cybersecurity. If an attacker gains access to a hospital network, synthetic images could be injected into patient records to manipulate diagnoses or undermine the reliability of the digital medical record at scale.
This matters for AI development too. Medical AI models are trained on large image datasets. If synthetic images pollute those training sets, a model learns from data that does not reflect real pathology. That is data poisoning, and it cuts both ways: low-quality synthetic data can degrade a model's performance, while high-quality synthetic data can make a model overconfident in patterns that do not exist in authentic scans.
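To make that failure mode concrete, here is a toy sketch in Python. Everything in it is invented for illustration, not drawn from the study: two hand-picked features stand in for image data, a hypothetical `poisoned_train_set` helper mixes synthetic samples into the training set at a chosen rate, and the model is then scored against authentic data only. In this setup, accuracy on authentic scans typically drifts downward as the fake fraction grows.

```python
# A minimal sketch of label-preserving data poisoning, with invented
# numbers: a classifier is trained on a dataset in which a growing
# fraction of "scans" are synthetic (unnaturally uniform features,
# shifted disease signature), then evaluated on authentic data only.
# Real pipelines train on image tensors, not two toy features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def authentic(n, disease):
    # Authentic scans: natural variability around each class mean.
    mean = (1.0, 2.0) if disease else (0.0, 0.0)
    return rng.normal(mean, 1.0, size=(n, 2))

def synthetic(n, disease):
    # "Too perfect" fakes: same labels, unnaturally low variance,
    # and a disease signature pointing in a slightly wrong direction.
    mean = (2.0, 0.2) if disease else (0.0, 0.0)
    return rng.normal(mean, 0.2, size=(n, 2))

def poisoned_train_set(n_per_class, frac_fake):
    n_fake = int(n_per_class * frac_fake)
    n_real = n_per_class - n_fake
    X = np.vstack([
        authentic(n_real, 0), synthetic(n_fake, 0),  # label 0: normal
        authentic(n_real, 1), synthetic(n_fake, 1),  # label 1: disease
    ])
    y = np.repeat([0, 1], n_per_class)
    return X, y

# The test set is authentic only: the population the model will meet.
X_test = np.vstack([authentic(5000, 0), authentic(5000, 1)])
y_test = np.repeat([0, 1], 5000)

for frac in (0.0, 0.3, 0.6):
    X, y = poisoned_train_set(500, frac)
    clf = LogisticRegression().fit(X, y)
    print(f"fake fraction {frac:.0%}: accuracy on authentic scans "
          f"= {clf.score(X_test, y_test):.3f}")
```

The point of the sketch is the evaluation split: the contamination does its damage quietly, and the bias only shows up when the model is confronted with authentic data.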
The technology already has specific failure modes worth noting. GPT-4o failed to generate appropriate radiographs for pleural effusions and vertebral fractures, conditions whose visual signatures are complex and context-dependent. That failure is reassuring in the near term: an attacker using today's best tools would leave detectable traces in those specific image types. But the margin is narrow.
Four multimodal large language models took the same test. GPT-4o, the model used to generate the synthetic images, correctly identified its own fakes 85 percent of the time. GPT-5 managed 83 percent. Gemini 2.5 Pro from Google reached 56 percent and Llama 4 Maverick from Meta came in at 59 percent. On the RoentGen chest X-rays, AI model accuracy ranged from 52 percent to 89 percent. None caught everything. The model that made the fakes was also the best at spotting them, and it still missed 15 percent.
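For readers curious what "taking the test" could look like in practice, the sketch below poses the question to a multimodal model through the OpenAI Python SDK. The prompt wording, the REAL/FAKE answer format, and the file name are assumptions made for illustration; the study's actual protocol is not reproduced here.

```python
# A minimal sketch, not the study's protocol: asking a multimodal
# model whether a chest X-ray is authentic or AI-generated.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_image(path: str) -> str:
    # Encode the image as a base64 data URL, the format the chat
    # completions API accepts for inline images.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Is this chest X-ray an authentic radiograph "
                         "or an AI-generated image? Answer REAL or FAKE."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content.strip()

print(classify_image("cxr_001.png"))  # hypothetical file name
```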
The burden of dealing with this has already landed on institutions. Hospitals and imaging centers need to operate as though synthetic medical images are circulating, because the tools to make them are free and the barriers to entry are essentially zero. "AI has lowered the cost of fabricating medical truth to nearly nothing," wrote Rajesh Bhayana and Satheesh Krishna, both radiologists at the University of Toronto, in an accompanying editorial in Radiology. "During the past decade, we have focused on the potential of AI to help us see better. We must now grapple with its potential to make us see things that simply do not exist."
The fixes exist as concepts and early implementations. Researchers recommend invisible digital watermarks embedded directly into images at the point of capture, and cryptographic signatures tied to the technologist who acquired the scan. What the study shows is that the problem is too urgent to keep waiting. The question is not whether this technology will mature. It already has. The question is whether the infrastructure to verify authentic clinical images will arrive before the threats move from theoretical to operational.
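As one illustration of the signature half of that proposal, here is a minimal sketch using Python's `cryptography` package. An Ed25519 key pair stands in for a per-technologist credential, the raw pixel bytes are signed at acquisition, and any later change to the pixels breaks verification. Key management, DICOM integration, and the watermarking half of the proposal are out of scope, and the function names are invented for this example.

```python
# A minimal sketch, assuming the `cryptography` package: sign image
# bytes at acquisition with a per-technologist Ed25519 key, verify
# before the image is read.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)

# At enrollment: each technologist (or modality) gets a key pair.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

def sign_capture(image_bytes: bytes, acquired_by: str) -> bytes:
    # Bind the signature to the pixels and the acquisition identity.
    return private_key.sign(image_bytes + acquired_by.encode())

def verify_capture(image_bytes: bytes, acquired_by: str,
                   signature: bytes) -> bool:
    try:
        public_key.verify(signature, image_bytes + acquired_by.encode())
        return True
    except InvalidSignature:
        return False

pixels = b"\x00" * 1024  # stand-in for raw detector output
sig = sign_capture(pixels, "tech-0042")
print(verify_capture(pixels, "tech-0042", sig))            # True
print(verify_capture(pixels + b"\x01", "tech-0042", sig))  # False: tampered
```

The design choice that matters is binding the signature to the pixel data itself, so a swapped or edited image fails verification even when the surrounding metadata looks intact.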