Estimating Model Performance Under Covariate Shift Without Labels

Jakub Białek; Juhani Kivimäki; Wojciech Kuberski; Nikolaos Perrakis

NeurIPS 2025

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

After deployment, machine learning models often degrade in performance because the distribution of their input data shifts. Assessing post-deployment performance accurately is difficult when labels are missing or delayed. Existing proxy methods, such as data drift detection, do not adequately measure the effect of these shifts on performance. To address this, we introduce Probabilistic Adaptive Performance Estimation (PAPE), a method for evaluating binary classification models on unlabeled tabular data that accurately estimates model performance under covariate shift. PAPE can be applied to any performance metric defined in terms of the elements of the confusion matrix. Crucially, it operates independently of the original model, relying only on the model's predictions and probability estimates, and it requires no assumptions about the nature of the covariate shift, which it learns directly from data instead. We tested PAPE on over 900 dataset-model combinations built from US census data and compared it against several benchmark methods using a range of evaluation metrics. Our findings show that PAPE outperforms the other methods, making it a superior choice for estimating the performance of binary classification models.
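
To make the idea of a metric "defined with elements of the confusion matrix" concrete, the sketch below estimates those elements without labels from probability estimates alone. It is not the PAPE algorithm: it assumes the probabilities are already well calibrated on the data being scored, whereas the abstract states that PAPE learns the effect of covariate shift from data. The function names `expected_confusion_matrix` and `estimated_metrics` and the 0.5 threshold are hypothetical choices made for illustration.

```python
from math import nan


def expected_confusion_matrix(probabilities, threshold=0.5):
    """Expected confusion-matrix counts from positive-class probabilities.

    Assumption (not from the paper): each probability p is calibrated, so an
    instance predicted positive contributes p expected true positives and
    (1 - p) expected false positives, and likewise for predicted negatives.
    """
    tp = fp = fn = tn = 0.0
    for p in probabilities:
        if p >= threshold:  # model predicts the positive class
            tp += p
            fp += 1.0 - p
        else:               # model predicts the negative class
            fn += p
            tn += 1.0 - p
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def estimated_metrics(probabilities, threshold=0.5):
    """Label-free estimates of metrics built from the expected counts."""
    cm = expected_confusion_matrix(probabilities, threshold)
    tp, fp, fn, tn = cm["tp"], cm["fp"], cm["fn"], cm["tn"]
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else nan,
        "precision": tp / (tp + fp) if (tp + fp) else nan,
        "recall": tp / (tp + fn) if (tp + fn) else nan,
        "f1": 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else nan,
    }


if __name__ == "__main__":
    # Four unlabeled instances with the model's positive-class probabilities.
    print(estimated_metrics([0.9, 0.8, 0.3, 0.1]))
```

Under covariate shift, the calibration assumption in this sketch is exactly what tends to fail, which is why an estimator of this kind needs probabilities that have been adapted to the shifted data before the expected counts are summed.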
