Arrow Research search

Author name cluster

Yi Jing

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

2 papers
2 author rows

Possible papers

2

AAAI Conference 2026 Conference Paper

MRACL: Multi-Reward Space Guided Adaptive Curriculum Reinforcement Learning for LLMs

  • Wenxuan Liu
  • Liangyu Huo
  • Yi Jing
  • Xiyuan Zhang
  • Jian Xie

Reinforcement learning (RL) has recently become a powerful yet resource-intensive approach for post-training large language models (LLMs). Incorporating curriculum learning (CL) into RL has been shown to significantly improve training efficiency, particularly in reasoning tasks. However, existing CL methods face substantial challenges in multi-objective RL (MORL) settings, including: (1) difficulty in evaluating model capabilities online, (2) challenges in assessing sample importance under diverse objectives, and (3) inherent trade-offs between online training and offline inference when dynamically designing the curriculum. To address these issues, we propose a Multi-Reward space guided Adaptive Curriculum Learning framework (MRACL), which is the first to incorporate curriculum learning into multi-objective RL. MRACL first constructs a multi-dimensional reward space via offline inference to establish initial reward profiles for each training sample. During training, based on the reward space, it estimates the evolving model capabilities by computing the centroid of the space and computes each sample's priority score from its capability distance, optimization direction, and historical evolution, which enables adaptive selection of the most informative training samples at each step, independent of the specific RL algorithm. After each RL training iteration, the reward space is dynamically updated to reflect the model's evolving capabilities and the shifting distribution of sample priorities. Experiments on multi-objective alignment tasks demonstrate that MRACL achieves 1.62× faster convergence compared to state-of-the-art curriculum methods and 2.55× faster than non-curriculum methods. Furthermore, it consistently outperforms all baselines in both win rate and rule-based evaluation. We further provide an in-depth analysis of the key factors contributing to MRACL's effectiveness, along with its advantages, applicable scenarios, and generalization across diverse settings.
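The centroid-and-priority idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the specific scoring formula, and the inputs (`reward_space` as a per-sample, per-objective reward matrix; `weights` as a preferred optimization direction) are assumptions for illustration, and the paper's actual score also incorporates historical evolution, which is omitted here.

```python
import numpy as np

def priority_scores(reward_space, weights):
    """Hypothetical centroid-based sample prioritization.

    reward_space: (n_samples, n_objectives) rewards from offline inference,
                  one column per reward objective.
    weights:      (n_objectives,) preferred optimization direction.
    """
    centroid = reward_space.mean(axis=0)       # proxy for current model capability
    deltas = reward_space - centroid           # each sample's offset from capability
    distance = np.linalg.norm(deltas, axis=1)  # capability distance
    direction = deltas @ weights               # alignment with optimization direction
    # Favor samples near the current capability that push along the
    # preferred direction -- a common curriculum heuristic.
    return direction / (1.0 + distance)

def select_batch(reward_space, weights, k):
    """Indices of the top-k priority samples for the next training step."""
    scores = priority_scores(reward_space, weights)
    return np.argsort(scores)[::-1][:k]
```

In the framework the abstract describes, the reward space would be refreshed after each RL iteration, so the centroid (and hence the priority ranking) tracks the model's evolving capabilities.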

ICML Conference 2022 Conference Paper

Learning Multiscale Transformer Models for Sequence Generation

  • Bei Li
  • Tong Zheng
  • Yi Jing
  • Chengbo Jiao
  • Tong Xiao 0001
  • JingBo Zhu

Multiscale feature hierarchies have seen great success in computer vision. This motivates researchers to design multiscale Transformers for natural language processing, mostly based on the self-attention mechanism, for example by restricting the receptive field across heads or extracting local fine-grained features via convolutions. However, most existing works directly model local features but ignore word-boundary information, resulting in redundant and ambiguous attention distributions that lack interpretability. In this work, we define scales in terms of linguistic units, including sub-words, words, and phrases. We build a multiscale Transformer model by establishing relationships among scales based on word-boundary information and phrase-level prior knowledge. The proposed Universal MultiScale Transformer, UMST, was evaluated on two sequence generation tasks. Notably, it yielded consistent performance gains over the strong baseline on several test sets without sacrificing efficiency.
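The word-boundary idea the abstract mentions can be sketched as a simple attention mask. This is an illustrative assumption, not the UMST architecture: `word_ids` (a mapping from each sub-word position to its word index, as many tokenizers expose) and the function name are hypothetical, and the real model builds richer relationships across sub-word, word, and phrase scales.

```python
import numpy as np

def word_boundary_mask(word_ids):
    """Hypothetical mask restricting sub-word tokens to attend within
    their own word, given each position's word index.

    Returns a (seq, seq) boolean matrix, True where attention is allowed.
    """
    ids = np.asarray(word_ids)
    return ids[:, None] == ids[None, :]  # True where two positions share a word
```

A mask like this could gate a local attention scale, while coarser scales (word- or phrase-level) attend more broadly; the abstract's claim is that making the boundaries explicit avoids redundant, ambiguous attention distributions.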