NeurIPS 2025

Estimating Model Performance Under Covariate Shift Without Labels

Conference Paper · Main Conference Track · Artificial Intelligence · Machine Learning

Abstract

After deployment, machine learning models often experience performance degradation due to shifts in data distribution. It is challenging to assess post-deployment performance accurately when labels are missing or delayed. Existing proxy methods, such as data drift detection, fail to measure the effects of these shifts adequately. To address this, we introduce Probabilistic Adaptive Performance Estimation (PAPE), a new method for evaluating binary classification models on unlabeled tabular data that accurately estimates model performance under covariate shift. It can be applied to any performance metric defined with elements of the confusion matrix. Crucially, PAPE operates independently of the original model, relying only on its predictions and probability estimates, and requires no assumptions about the nature of the covariate shift, learning directly from data instead. We tested PAPE on over 900 dataset-model combinations from US census data, assessing its performance against several benchmarks through various metrics. Our findings show that PAPE outperforms other methodologies, making it a superior choice for estimating the performance of binary classification models.
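
The abstract does not spell out the estimation procedure, so the following is only a minimal sketch of the general idea that confusion-matrix metrics can be estimated without labels from a model's predictions and probability estimates. It assumes the probabilities are already well calibrated on the data being evaluated; that assumption is ours, and the step by which PAPE adapts to covariate shift is not shown. The function names and the 0.5 threshold are hypothetical.

```python
import numpy as np

def expected_confusion_matrix(proba, threshold=0.5):
    """Expected TP, FP, TN, FN on unlabeled data.

    Assumption (not from the abstract): `proba` holds calibrated
    estimates of P(y = 1 | x) for each row.
    """
    proba = np.asarray(proba, dtype=float)
    pred_pos = proba >= threshold          # the model's hard predictions
    tp = proba[pred_pos].sum()             # predicted 1, expected to be truly 1
    fp = (1.0 - proba[pred_pos]).sum()     # predicted 1, expected to be truly 0
    fn = proba[~pred_pos].sum()            # predicted 0, expected to be truly 1
    tn = (1.0 - proba[~pred_pos]).sum()    # predicted 0, expected to be truly 0
    return tp, fp, tn, fn

def estimate_metrics(proba, threshold=0.5):
    """Estimate metrics built from confusion-matrix elements, without labels."""
    tp, fp, tn, fn = expected_confusion_matrix(proba, threshold)
    n = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / n,
        "precision": tp / (tp + fp) if tp + fp > 0 else float("nan"),
        "recall": tp / (tp + fn) if tp + fn > 0 else float("nan"),
    }

# Usage: four unlabeled rows scored by some binary classifier.
estimates = estimate_metrics([0.9, 0.8, 0.3, 0.1])
```

If the probabilities are miscalibrated on shifted data, these estimates inherit that error, which is the gap a shift-aware method has to close.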

Context

Venue: Annual Conference on Neural Information Processing Systems
Archive span: 1987-2025
Indexed papers: 30776
Paper id: 675365591113063797