AAAI 2025 Conference Paper
Bagging-Expert Network for Multi-Task Learning: A Depolarization Solution in Multi-Gate Mixture-of-Experts
- Gong-Duo Zhang
- Ruiqing Chen
- Qian Zhao
- Zhengwei Wu
- Fengyu Han
- Huan-Yi Su
- Ziqi Liu
- Lihong Gu
Multi-task learning (MTL) is widely utilized across a variety of real-world applications, including recommendation systems. For instance, in the field of e-commerce, MTL is commonly employed to simultaneously model click, conversion, and user dwell time. Among the various MTL models, the Multi-gate Mixture-of-Experts (MMoE) has gained significant popularity. However, MMoE suffers from a polarization issue during training, where the gate weights of certain experts tend to converge towards 0. To address this issue, we propose a novel method called the Bagging-Expert network (BEnet) for multi-task learning. BEnet effectively mitigates the polarization problem and achieves excellent performance in multi-task learning. It incorporates a bagging layer and an attention mechanism to encourage each expert to focus on a distinct knowledge domain. Polarization is thereby avoided, as each expert carries out its own role and specializes in a distinct domain. Experimental results on real-world datasets demonstrate that BEnet is robust and outperforms other state-of-the-art (SOTA) MTL methods.
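To make the polarization issue concrete, the following is a minimal numpy sketch of the standard MMoE gating mechanism the abstract refers to (not of BEnet itself; all dimensions and weight initializations are illustrative assumptions). Each task has its own softmax gate over a set of shared experts; if a gate assigns near-zero weight to an expert, that expert receives almost no gradient signal from the task, which is the failure mode described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative sizes: 4 shared experts, 2 tasks (e.g. click and conversion).
n_experts, d_in, d_hid, n_tasks = 4, 8, 16, 2
W_experts = rng.normal(size=(n_experts, d_in, d_hid))  # one linear expert each
W_gates = rng.normal(size=(n_tasks, d_in, n_experts))  # one gate per task

x = rng.normal(size=(3, d_in))                          # a batch of 3 inputs
expert_out = np.einsum('bi,eih->beh', x, W_experts)     # (batch, expert, hidden)
gate_w = softmax(np.einsum('bi,tie->bte', x, W_gates))  # (batch, task, expert)
task_out = np.einsum('bte,beh->bth', gate_w, expert_out)  # per-task mixtures

# Polarization: when some gate_w[..., e] collapses toward 0 during training,
# expert e stops contributing to (and learning from) that task.
print(gate_w.sum(axis=-1))  # each task's gate weights sum to 1
```

The sketch only shows the forward pass; in training, the softmax gates and experts are learned jointly, which is where the collapse of some gate weights toward 0 can occur.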