On Harmonizing Implicit Subpopulations

Feng Hong; Jiangchao Yao; Yueming Lyu; Zhihan Zhou; Ivor W. Tsang; Ya Zhang; Yanfeng Wang

ICLR 2024

Conference Paper · Accept (poster) · Artificial Intelligence · Machine Learning

Abstract

Machine learning algorithms trained on data with skewed distributions usually generalize poorly, especially when minority classes matter as much as, or even more than, majority ones. The problem is harder on class-balanced data that contains hidden imbalanced subpopulations, since prevalent techniques mainly perform class-level calibration and cannot make subpopulation-level adjustments without subpopulation annotations. For implicit subpopulation imbalance, we show that the key to alleviating its detrimental effect lies in effective subpopulation discovery combined with proper rebalancing. We then propose a novel subpopulation-imbalanced learning method called Scatter and HarmonizE (SHE). The method is built on the guiding principle of optimal data partition, which assigns data to subpopulations in a manner that maximizes the predictive information from inputs to labels. Supported by theoretical guarantees and empirical evidence, SHE identifies the hidden subpopulations and encourages subpopulation-balanced predictions. Extensive experiments on various benchmark datasets show the effectiveness of SHE.
