Arrow Research search

Author name cluster

Chengyi Wang

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

3 papers
1 author row

Possible papers (3)

NeurIPS 2025 Conference Paper

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

  • Qiying Yu
  • Zheng Zhang
  • Ruofei Zhu
  • Yufeng Yuan
  • Xiaochen Zuo
  • Yu Yue
  • Weinan Dai
  • Tiantian Fan

Inference scaling empowers LLMs with unprecedented reasoning ability, with reinforcement learning as the core technique to elicit complex reasoning. However, key technical details of state-of-the-art reasoning LLMs are concealed (such as in the OpenAI o1 blog and the DeepSeek R1 technical report), so the community still struggles to reproduce their RL training results. We propose the Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) algorithm, and fully open-source a state-of-the-art large-scale RL system that achieves 50 points on AIME 2024 using the Qwen2.5-32B base model. Unlike previous works that withhold training details, we introduce four key techniques of our algorithm that make large-scale LLM RL a success. In addition, we open-source our training code, which is built on the verl framework, along with a carefully curated and processed dataset. These components of our open-source system enhance reproducibility and support future research in large-scale LLM RL.
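Below is a minimal sketch of a PPO-style surrogate with decoupled (asymmetric) clip ranges and a dynamic-sampling filter, in the spirit of this abstract. The parameter names (eps_low, eps_high), their values, and the token-level averaging are illustrative assumptions, not the authors' released verl-based implementation.

```python
# Hedged sketch: clipped policy-gradient surrogate with separate lower/upper
# clip ranges, plus a simple dynamic-sampling filter. Values are assumptions.
import torch

def decoupled_clip_loss(logp_new, logp_old, advantages, eps_low=0.2, eps_high=0.28):
    """Token-level clipped surrogate loss with decoupled clip ranges.

    logp_new, logp_old, advantages: tensors of shape (num_tokens,).
    """
    ratio = torch.exp(logp_new - logp_old)                      # per-token importance ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high) * advantages
    # Maximize the surrogate -> minimize its negation, averaged over all tokens.
    return -torch.mean(torch.min(unclipped, clipped))

def keep_prompt(rewards):
    """Dynamic-sampling filter (assumption): drop prompt groups whose rollouts
    are all correct or all incorrect, since group-normalized advantages would
    be zero and contribute no gradient."""
    mean_reward = sum(rewards) / len(rewards)
    return 0.0 < mean_reward < 1.0
```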

AAAI 2025 Conference Paper

Empowering Self-Learning of LLMs: Inner Knowledge Explicitation as a Catalyst

  • Shijue Huang
  • Wanjun Zhong
  • Deng Cai
  • Fanqi Wan
  • Chengyi Wang
  • Mingxuan Wang
  • Mu Qiao
  • Ruifeng Xu

Self-learning of Large Language Models (LLMs) facilitates their advancement towards super-intelligence by training with self-synthesized experiences. However, a critical challenge is the amplification of hallucinations in generated data during iterative self-learning, underscoring the need for reliable data selection. To address this, we investigate the mechanism of Inner Knowledge Explicitation, which involves explicitly extracting inner knowledge from the memory of LLMs to concurrently improve reasoning and enable reliable selection of self-learning data. This paper introduces a Self Knowledge Explicitation Learning (SKE-Learn) framework, which equips LLMs with meta-skills to explicitly extract, verify, and utilize inner knowledge for reasoning. By leveraging these meta-skills, SKE-Learn establishes a self-learning approach that ensures reliable selection of self-synthesized data. This approach enhances performance through iterative self-learning while mitigating the problem of hallucinations. Empirical results from six benchmarks demonstrate that Inner Knowledge Explicitation improves reasoning by serving as a more effective prompting method. Additionally, SKE-Learn, based on the verifiability of explicit knowledge, shows consistent performance improvements over multiple self-training iterations, with an average performance increase from 52.79% to 56.54% across all benchmarks. Furthermore, Inner Knowledge Explicitation provides explanation and intervention space during the LLM's generation process.
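A hedged sketch of verification-gated data selection as the abstract describes it follows. The helper names (explicate_knowledge, verify_knowledge, solve_with_knowledge) are hypothetical stand-ins for the paper's meta-skills, not its actual interface.

```python
# Hedged sketch: one self-learning round that keeps only samples whose
# explicated inner knowledge passes verification. Helper callables are
# hypothetical and passed in, so the loop itself stays self-contained.
def self_learning_round(model, prompts, explicate_knowledge, verify_knowledge,
                        solve_with_knowledge):
    selected = []
    for prompt in prompts:
        knowledge = explicate_knowledge(model, prompt)   # make inner knowledge explicit
        if not verify_knowledge(model, knowledge):       # keep only verifiable knowledge
            continue                                     # drop likely-hallucinated samples
        answer = solve_with_knowledge(model, prompt, knowledge)
        selected.append({"prompt": prompt, "knowledge": knowledge, "answer": answer})
    return selected                                      # training data for the next iteration
```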

AAAI 2020 Conference Paper

Bridging the Gap between Pre-Training and Fine-Tuning for End-to-End Speech Translation

  • Chengyi Wang
  • Yu Wu
  • Shujie Liu
  • Zhenglu Yang
  • Ming Zhou

End-to-end speech translation, a hot topic in recent years, aims to translate a segment of audio into a specific language with an end-to-end model. Conventional approaches employ multi-task learning and pre-training methods for this task, but they suffer from the huge gap between pre-training and fine-tuning. To address this issue, we propose a Tandem Connectionist Encoding Network (TCEN) which bridges the gap by reusing all subnets in fine-tuning, keeping the roles of subnets consistent, and pre-training the attention module. Furthermore, we propose two simple but effective methods to guarantee that the speech encoder outputs and the MT encoder inputs are consistent in terms of semantic representation and sequence length. Experimental results show that our model leads to significant improvements in En-De and En-Fr translation irrespective of the backbones.
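One plausible way to reconcile the speech-encoder output length with the MT-encoder input length, as the abstract describes, is to collapse consecutive frames that a CTC head assigns to the same label and drop blanks. The sketch below illustrates that assumption; it is not necessarily the paper's exact mechanism.

```python
# Hedged sketch: shrink speech-encoder frames to roughly one vector per
# predicted CTC label, so the sequence length matches what an MT encoder
# expects. Greedy CTC decoding and mean pooling are illustrative choices.
import torch

def shrink_by_ctc(frames, ctc_logits, blank_id=0):
    """frames: (T, d) speech encoder outputs; ctc_logits: (T, vocab) CTC scores.
    Returns a shorter (T', d) sequence aligned with the predicted label sequence."""
    labels = ctc_logits.argmax(dim=-1)                 # greedy CTC path, one label per frame
    segments, prev = [], None
    for t, lab in enumerate(labels.tolist()):
        if lab == blank_id:
            prev = None                                # blanks separate segments
            continue
        if lab == prev:
            segments[-1].append(t)                     # extend the current segment
        else:
            segments.append([t])                       # start a new segment
        prev = lab
    if not segments:                                   # all blanks: fall back to raw frames
        return frames
    # Average the frames inside each segment so the output length equals the
    # number of predicted labels.
    return torch.stack([frames[idx].mean(dim=0) for idx in segments])
```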