The AI Audit Tool Every Company Uses Is Broken
Every time a bank explains a loan denial via SHAP scores, or a hospital surfaces which features drove a diagnostic prediction, or a regulator asks for proof an algorithm isn't discriminatory, the machinery underneath is likely broken.
A paper published to arXiv in April 2026 proves that SHAP, the dominant method for explaining machine learning model predictions, produces mathematically unreliable results whenever the input features are correlated. The paper calls this the Attribution Impossibility: no feature ranking can simultaneously be faithful to the model, stable across retraining, and complete. Under collinearity, which the authors document in 68% of 77 real-world public datasets they surveyed, the ranking flips up to 50% of the time — a coin flip.
The result has consequences that extend well beyond the benchmark. SHAP-based explanations are not just a transparency feature. They are the evidentiary backbone of a compliance infrastructure that now touches credit, hiring, healthcare, and insurance. If those rankings are flipping with each retraining, the audit is noise.
The math, briefly: when two features are correlated, the model has no principled basis for choosing between them at split time. A different random seed produces a different split, and therefore a different attribution ranking. The paper quantifies the divergence precisely: for gradient boosting, the attribution ratio grows as 1/(1−ρ²) as correlation ρ approaches 1, meaning the instability becomes infinite. For Lasso it is already infinite. Random forests are more stable but still not immune.
What makes this result different from prior critiques of SHAP is the formal verification. The authors prove their theorems in Lean 4, a mechanical proof assistant, across 305 theorems from 16 axioms with zero sorry — a level of rigor that rules out the hand-wavy proof gaps that plague most interpretability literature. A separate impossibility theorem published in PNAS in 2024, by Bilodeau, Jaques, Koh, and Been Kim, reaches similar conclusions via a different proof path, giving the finding independent corroboration.
The paper does not leave practitioners empty-handed. It proposes DASH — Diversified Aggregation of SHAP — an ensemble method that trains multiple models and averages their attribution scores, reporting ties for symmetric features rather than forcing a ranking that the data cannot support. DASH is provably Pareto-optimal among unbiased aggregation methods and achieves the Cramer-Rao variance bound, meaning it is information-theoretically optimal. The authors provide a diagnostic workflow: a Z-test to detect unstable pairs and a single-model screening tool with 94% precision on their test dataset. The ensemble size formula tells you exactly how many models you need.
None of this is in production. The DASH implementation exists in the authors' GitHub repository but has not been merged into the mainstream SHAP library. The computational cost is not trivial: the authors recommend at least 25 model retrainings to suppress the flip rate below 1%. For organizations running large-scale gradient-boosted models, that is a meaningful operational change.
The fairness audit angle is where this stops being a theoretical problem. The paper states it plainly: SHAP-based proxy discrimination audits are provably unreliable under collinearity — the audit conclusion is a coin flip across training seeds. Under the EU AI Act, high-risk AI systems must disclose known and foreseeable circumstances that affect reliability. The authors argue this theorem qualifies, and they provide ready-to-use instability disclosure language for model documentation. The August 2026 compliance deadline for high-risk systems under the Act is seven months away. Any organization that has filed SHAP-based audit evidence as part of a conformity assessment may be sitting on a liability.
The limitation worth knowing: this is a paper from independent researchers, not a peer-reviewed lab. The 68% instability figure is a conservative lower bound with 32% survey power. There is no independent replication yet. The SHAP library maintainers have not publicly responded to the paper's claims. Those are real gaps, and they are why the evidence upgrade plan for this story includes reaching Scott Lundberg, the original author of the SHAP library, and an ML fairness practitioner who has encountered this problem in production.
But the burden of proof here runs in the other direction. The authors have produced a mechanically verified impossibility theorem. The burden is on the field to explain why that doesn't matter.