AIIM Journal 2026 Journal Article
Calibration-informed metrics for instance-level predictive reliability in medical AI
- Federico Cabitza
Conventional performance metrics in clinical decision support systems, such as accuracy or sensitivity, fail to reflect the reliability of individual predictions, an essential concern for clinicians operating in high-stakes environments. We introduce a calibration-informed framework featuring two novel metrics: the Local Predictive Value (LPV) and the Credible Predictive Value (CPV). LPV estimates the empirical reliability of a prediction as the observed frequency of correct predictions in the neighborhood of its confidence score. CPV refines this estimate using a Bayesian approach, integrating global predictive values as priors to produce a posterior distribution over correctness probabilities. LPV offers a descriptive, data-driven view of local reliability, while CPV provides a belief-adjusted estimate that mitigates overfitting to sparse local data. Applied to benchmark medical imaging datasets, these metrics yielded locally adaptive, interpretable reliability estimates. Divergences between LPV and CPV identified cases where local evidence was insufficient or misleading, showing how Bayesian smoothing stabilizes the estimate in exactly those cases. By combining local calibration with Bayesian inference, LPV and CPV advance the development of medical AI systems that are not only accurate but also interpretable and trustworthy at the individual case level.
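The relationship between the two metrics can be illustrated with a minimal sketch. It is not the paper's implementation, and several choices are assumptions made only for illustration: the neighborhood is a fixed-width window (`bandwidth`) around the queried confidence score, the global predictive value is simplified to the overall fraction of correct predictions on a calibration set, and the Bayesian update is a Beta-Binomial model whose prior weight is a hypothetical `prior_strength` parameter. The abstract does not specify any of these details.

```python
from dataclasses import dataclass


@dataclass
class Reliability:
    lpv: float       # observed fraction of correct predictions near the query score
    cpv_mean: float  # posterior mean of the correctness probability
    n_local: int     # number of calibration cases inside the neighborhood


def local_reliability(confidences, correct, query_conf,
                      bandwidth=0.05, prior_strength=10.0):
    """Illustrative LPV/CPV-style estimates for one prediction.

    Assumptions (not taken from the paper): fixed-width neighborhood,
    overall accuracy as the global predictive value, Beta-Binomial update.
    """
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("confidences and correct must be non-empty and equal in length")

    # Global predictive value, simplified here to overall accuracy.
    global_pv = sum(correct) / len(correct)

    # Correctness labels of calibration cases whose score is close to the query.
    local = [c for s, c in zip(confidences, correct)
             if abs(s - query_conf) <= bandwidth]
    n, k = len(local), sum(local)

    # LPV-style estimate: purely local frequency, undefined with no neighbors.
    lpv = k / n if n else float("nan")

    # CPV-style estimate: Beta prior centered on the global value,
    # updated with the local successes and failures.
    alpha = prior_strength * global_pv + k
    beta = prior_strength * (1.0 - global_pv) + (n - k)
    cpv_mean = alpha / (alpha + beta)

    return Reliability(lpv=lpv, cpv_mean=cpv_mean, n_local=n)


# Toy calibration data: confidence scores and whether each prediction was correct.
scores = [0.90, 0.91, 0.50, 0.52, 0.55, 0.60, 0.30, 0.93]
labels = [1, 1, 0, 1, 0, 1, 0, 1]
result = local_reliability(scores, labels, query_conf=0.90)
```

With only three neighbors, all correct, the local frequency is 1.0, while the posterior mean is pulled toward the global value of 0.625. A gap of this kind is the sort of divergence the abstract describes as a signal of sparse local evidence.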