Why a doctor's decision not to order a test is itself medical data
AI systems trained on medical records should learn from the gaps in a patient's chart as much as from the lab values themselves.
When a doctor decides not to order a cardiac enzyme test on a low-risk chest-pain patient, that decision is not an empty data point. It is information the clinician already encoded, drawn from experience, triage judgment, and the patient's history. A new preprint argues that clinical AI should treat the gaps in a patient's record with the same weight as the lab values themselves, and that doing so could change how the next generation of medical AI learns from routine care.
The paper, "Informative Missingness to Generate Irregular Clinical Time Series," submitted to arXiv on 14 June 2026 by a team spanning the University of Pavia, Polytechnic of Milan, and Aarhus University, makes a specific claim: in electronic health record (EHR) data, the absence of a test order can be as informative as the measurement itself. That is because missingness in clinical data is rarely random. A skipped renal panel on a stable admission, a held-off blood culture on an afebrile patient, an omitted troponin on someone whose EKG already looks reassuring: each gap reflects a decision the clinician made, not a hole in the data pipeline.
The statistical term for this is "informative missingness," and it has been a quiet irritant for clinical AI researchers for years. Standard practice treats missing values as a preprocessing problem to be solved, often with imputation: guess what the missing lab value would have been, fill it in, and move on. The authors argue this discards signal. The decision to skip a test is itself a learned behavior, and a model that ignores it is leaving information on the table.
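To make the contrast concrete, here is a minimal Python sketch of the two views of the same record. It is an illustration only, not code from the paper: the troponin numbers are made up, the function names are hypothetical, and mean imputation stands in for whatever imputation method a real pipeline might use.

```python
from statistics import mean

def impute_with_mean(values):
    """Fill missing entries (None) with the mean of the observed ones."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def observation_mask(values):
    """Return 1 where a result exists and 0 where the test was not ordered."""
    return [0 if v is None else 1 for v in values]

# Hypothetical troponin series over six time steps; None means "not ordered".
troponin = [0.25, None, None, 0.75, None, 0.5]

imputed = impute_with_mean(troponin)  # the gaps disappear
mask = observation_mask(troponin)     # the gaps are kept as data

print(imputed)
print(mask)
```

After imputation the series looks fully observed, and nothing in it records that three of the six tests were never ordered. The mask keeps that ordering pattern as a feature a model can learn from.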
To capture that signal, the team built a diffusion-based generative model. Diffusion models are a class of generative models that learn to produce new samples by iteratively refining random noise into structured output. The team's version jointly models laboratory values and their observation patterns. Instead of imputing values after the fact, it learns the joint distribution of "what was measured" and "what was not," generating synthetic patient timelines that respect realistic clinical decision-making.
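For readers who want a feel for the mechanics, the sketch below shows the noising half of a textbook Gaussian diffusion process applied to lab values and their observation mask stacked into one vector. Everything here is an assumption for illustration: the article does not describe the paper's architecture, noise schedule, or how it encodes the mask, and treating a binary mask as a continuous channel is just one simple option. A trained model would learn to reverse this corruption step by step.

```python
import math
import random

def forward_noise(x0, alpha_bar, rng):
    """Standard Gaussian diffusion forward step:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise.
    alpha_bar near 1 keeps the signal; alpha_bar near 0 is almost pure noise."""
    signal = math.sqrt(alpha_bar)
    noise = math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]

# Hypothetical record: four time steps, missing values zero-filled,
# with the observation mask carried alongside as a second channel.
values = [0.25, 0.0, 0.0, 0.75]
mask = [1.0, 0.0, 0.0, 1.0]
joint = values + mask  # values and mask are noised (and later denoised) together

rng = random.Random(0)  # fixed seed so the sketch is repeatable
lightly_noised = forward_noise(joint, 0.99, rng)
heavily_noised = forward_noise(joint, 0.01, rng)
```

The point of the joint vector is that the generative model has to reproduce when tests were ordered as well as what they showed, rather than receiving the ordering pattern as a given.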
The team evaluated their approach on a single benchmark: the Data Analytics Challenge on Missing Data Imputation (DACMI), derived from MIMIC-III, a widely used intensive care unit (ICU) dataset from Beth Israel Deaconess Medical Center. They aligned chart times into 4-hour intervals and segmented admissions into 7-day windows, producing trajectory-level outputs meant to mirror the irregular sampling clinicians actually produce. The authors describe their results as preliminary, and the contribution is best read as a building block rather than a deployed system.
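The binning described above can be sketched in a few lines of Python. This is a guess at the general shape of such preprocessing, not the authors' pipeline: the function name, the choice to keep the last value when two results fall in one bin, and the example timestamps are all assumptions.

```python
from datetime import datetime, timedelta

BIN_HOURS = 4
WINDOW_DAYS = 7
BINS_PER_WINDOW = WINDOW_DAYS * 24 // BIN_HOURS  # 42 four-hour slots per week

def to_windows(admit_time, events):
    """Place (timestamp, value) pairs on a grid of 4-hour bins,
    split into 7-day windows. Empty slots stay None (test not ordered)."""
    windows = []
    for timestamp, value in events:
        hours = (timestamp - admit_time).total_seconds() / 3600
        if hours < 0:
            continue  # ignore results charted before admission
        window_index, slot = divmod(int(hours // BIN_HOURS), BINS_PER_WINDOW)
        while len(windows) <= window_index:
            windows.append([None] * BINS_PER_WINDOW)
        windows[window_index][slot] = value  # assumption: last value in a bin wins
    return windows

# Hypothetical admission with three creatinine results.
admit = datetime(2020, 1, 1, 0, 0)
events = [
    (admit + timedelta(hours=1), 1.0),
    (admit + timedelta(hours=5), 1.2),
    (admit + timedelta(days=8), 0.9),
]
windows = to_windows(admit, events)
```

In this example the first two results land in the first two slots of week one, and the third lands in week two. Most of the 42 slots in each window stay empty, which is the irregular pattern the model is meant to learn.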
That framing matters. MIMIC-III is an older ICU-skewed cohort, and synthetic clinical data trained on a single population can carry that population's blind spots, including its demographic mix, its practice patterns, and the kinds of patients it tends to miss. The paper does not claim external validation, and the joint embedding of values and missingness has not yet been tested across hospitals, EHR systems, or outpatient settings. Treating this as a finished clinical foundation model would overstate what the authors have shown.
What the paper does offer is a constructive reframe. Most coverage of clinical AI focuses on what a model produces: a diagnosis, a risk score, an imputed lab value. The preprint inverts that frame and asks what the model can learn from what clinicians chose not to do. If that direction holds up under broader evaluation, it would push the field toward AI systems that learn from clinical decision-making itself: the choices, hesitations, and shortcuts that already fill every patient chart.
The work also raises a quieter question for the people who build these systems: how much of the signal in medical records has been thrown away by pipelines that treat absence as noise? The authors' answer, for now, is a working prototype and a benchmark result. The next step is independent replication on cohorts that look less like the ICU and more like the rest of medicine.