On Coresets for End-to-end Learning from Crowds

Hang Yang; Zhiwu Li; Witold Pedrycz

doi:10.1609/aaai.v40i21.38850


AAAI 2026


Conference Paper AAAI Technical Track on Humans and AI Artificial Intelligence


Abstract

Crowdsourcing is a common approach for obtaining the high-quality labeled data that data-hungry models need, using human labor to produce the annotations. For crowdsourced data, an end-to-end learning paradigm is gaining ground: the classifier is concatenated with annotator-specific confusion layers, and the two parts are co-trained with coupled parameters. However, training on a very large set of annotations is challenging when computation or energy is limited. In this paper, we analyze and refine coresets for end-to-end learning from crowds under the sensitivity sampling framework. A coreset is a small subset of the annotations on which the Coupled Cross-Entropy Minimization (CCEM) problem can be optimized efficiently with a guaranteed approximation. We first prove a lower bound showing that, with confusion layers, no coreset can be smaller than the complete data. Then, using the workers' transition matrices, we show that adding a regularization term circumvents this lower bound. Our main result is that, under mild assumptions, a smaller coreset exists for the regularized CCEM problem. We derive an upper bound on the sensitivity and use it to design a sampling algorithm called CrowdCore. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of our analysis.
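The two ingredients named above, the confusion-layer loss and sensitivity sampling, can be illustrated with a minimal NumPy sketch under stated assumptions. It computes the per-annotation coupled cross-entropy terms -log [A_m f(x_n)]_y, where f(x_n) is the classifier's output and A_m is annotator m's confusion matrix, and then draws a weighted coreset by generic sensitivity sampling: each annotation is sampled with probability proportional to a sensitivity upper bound and reweighted by the inverse of that probability. The column-stochastic convention for A_m and all function names are assumptions for illustration. In particular, `hypothetical_sensitivity_proxy` is a placeholder, not the paper's sensitivity bound, and the sketch omits both the regularization term and the CrowdCore algorithm itself.

```python
import numpy as np


def annotation_losses(probs, confusions, items, annotators, labels, eps=1e-12):
    """Per-annotation coupled cross-entropy terms -log [A_m f(x_n)]_y.

    probs:      (N, K) classifier outputs f(x_n); each row sums to 1.
    confusions: (M, K, K) confusion matrices (assumed convention):
                confusions[m, k_obs, k_true] = P(annotator m reports k_obs | k_true),
                so each column sums to 1.
    items, annotators, labels: (T,) integer arrays; annotation t says that
                annotator annotators[t] gave label labels[t] to item items[t].
    """
    rows = confusions[annotators, labels, :]            # (T, K): row y of A_m
    p_obs = np.einsum("tk,tk->t", rows, probs[items])   # [A_m f(x_n)]_y
    return -np.log(np.clip(p_obs, eps, None))


def sensitivity_sample(sensitivity, size, rng):
    """Generic sensitivity sampling: draw `size` indices i.i.d. with
    p_i = s_i / sum(s) and weight each draw by 1 / (size * p_i), so the
    weighted coreset loss is an unbiased estimate of the full loss for
    any fixed model parameters."""
    s = np.asarray(sensitivity, dtype=float)
    p = s / s.sum()
    idx = rng.choice(len(s), size=size, replace=True, p=p)
    return idx, 1.0 / (size * p[idx])


def hypothetical_sensitivity_proxy(annotators, num_annotators):
    """HYPOTHETICAL stand-in, NOT the CrowdCore bound: give every annotator
    the same total sampling mass so sparse annotators still get sampled."""
    counts = np.bincount(annotators, minlength=num_annotators)
    return 1.0 / counts[annotators]


# Toy data: random classifier outputs and mildly noisy annotators.
rng = np.random.default_rng(0)
N, K, M, T = 200, 3, 5, 1000
probs = rng.dirichlet(np.ones(K), size=N)
confusions = np.stack([0.8 * np.eye(K) + 0.2 / K for _ in range(M)])
items = rng.integers(0, N, size=T)
annotators = rng.integers(0, M, size=T)
labels = rng.integers(0, K, size=T)

losses = annotation_losses(probs, confusions, items, annotators, labels)
full_loss = losses.sum()

sens = hypothetical_sensitivity_proxy(annotators, M)
idx, w = sensitivity_sample(sens, size=100, rng=rng)
coreset_loss = np.dot(w, losses[idx])
print(f"full: {full_loss:.1f}  coreset estimate (100 of {T}): {coreset_loss:.1f}")
```

In training, one would minimize the weighted sum `np.dot(w, losses[idx])` over the classifier and confusion parameters instead of the full sum. The sketch only guarantees unbiasedness for fixed parameters; a uniform approximation guarantee over all parameters depends on the sensitivity bound and the coreset size, which this example does not establish.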
