Arrow Research search

Author name cluster

Qingfeng Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers

4

ICLR Conference 2025 Conference Paper

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

  • Haipeng Luo
  • Qingfeng Sun
  • Can Xu
  • Pu Zhao 0004
  • Jian-Guang Lou
  • Chongyang Tao
  • Xiubo Geng
  • Qingwei Lin

Large language models (LLMs), such as GPT-4, have shown remarkable performance in natural language processing (NLP) tasks, including challenging mathematical reasoning. However, most existing open-source models are only pre-trained on large-scale internet data and without math-related optimization. In this paper, we present WizardMath, which enhances the mathematical reasoning abilities of LLMs, by applying our proposed Reinforcement Learning from Evol-Instruct Feedback (RLEIF) method to the domain of math. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our model. Remarkably, WizardMath-Mistral 7B surpasses all other open-source LLMs by a substantial margin. Furthermore, WizardMath 70B even outperforms ChatGPT-3.5, Claude Instant, Gemini Pro and Mistral Medium. Additionally, our preliminary exploration highlights the pivotal role of instruction evolution and process supervision in achieving exceptional math performance.

NeurIPS Conference 2024 Conference Paper

WizardArena: Post-training Large Language Models via Simulated Offline Chatbot Arena

  • Haipeng Luo
  • Qingfeng Sun
  • Can Xu
  • Pu Zhao
  • Qingwei Lin
  • Jianguang Lou
  • Shifeng Chen
  • Yansong Tang

Recent work demonstrates that, post-training large language models with open-domain instruction following data have achieved colossal success. Simultaneously, human Chatbot Arena has emerged as one of the most reasonable benchmarks for model evaluation and developmental guidance. However, the processes of manually curating high-quality training data and utilizing online human evaluation platforms are both expensive and limited. To mitigate the manual and temporal costs associated with post-training, this paper introduces a Simulated Chatbot Arena named WizardArena, which is fully based on and powered by open-source LLMs. For evaluation scenario, WizardArena can efficiently predict accurate performance rankings among different models based on offline test set. For training scenario, we simulate arena battles among various state-of-the-art models on a large scale of instruction data, subsequently leveraging the battle results to constantly enhance target model in both the supervised fine-tuning and reinforcement learning. Experimental results demonstrate that our WizardArena aligns closely with the online human arena rankings, and our models trained on offline extensive battle data exhibit significant performance improvements during SFT, DPO, and PPO stages.

ICLR Conference 2024 Conference Paper

WizardCoder: Empowering Code Large Language Models with Evol-Instruct

  • Ziyang Luo
  • Can Xu
  • Pu Zhao 0004
  • Qingfeng Sun
  • Xiubo Geng
  • Wenxiang Hu
  • Chongyang Tao
  • Jing Ma 0004

Code Large Language Models (Code LLMs), such as StarCoder, have demonstrated remarkable performance in various code-related tasks. However, different from their counterparts in the general language modeling field, the technique of instruction fine-tuning remains relatively under-researched in this domain. In this paper, we present Code Evol-Instruct, a novel approach that adapts the Evol-Instruct method to the realm of code, enhancing Code LLMs to create novel models, WizardCoder. Through comprehensive experiments on five prominent code generation benchmarks, namely HumanEval, HumanEval+, MBPP, DS-1000, and MultiPL-E, our models showcase outstanding performance. They consistently outperform all other open-source Code LLMs by a significant margin. Remarkably, WizardCoder 15B even surpasses the well-known closed-source LLMs, including Anthropic's Claude and Google's Bard, on the HumanEval and HumanEval+ benchmarks. Additionally, WizardCoder 34B not only achieves a HumanEval score comparable to GPT3.5 (ChatGPT) but also surpasses it on the HumanEval+ benchmark. Furthermore, our preliminary exploration highlights the pivotal role of instruction complexity in achieving exceptional coding performance.

ICLR Conference 2024 Conference Paper

WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions

  • Can Xu
  • Qingfeng Sun
  • Kai Zheng 0021
  • Xiubo Geng
  • Pu Zhao 0004
  • Jiazhan Feng
  • Chongyang Tao
  • Qingwei Lin

Training large language models (LLMs) with open-domain instruction following data brings colossal success. However, manually creating such instruction data is very time-consuming and labor-intensive. Moreover, humans may struggle to produce high-complexity instructions. In this paper, we show an avenue for creating large amounts of instruction data with varying levels of complexity using LLM instead of humans. Starting with an initial set of instructions, we use our proposed Evol-Instruct to rewrite them step by step into more complex instructions. Then, we mix all generated instruction data to fine-tune LLaMA. We call the resulting model WizardLM. Both automatic and human evaluations consistently indicate that WizardLM outperforms baselines such as Alpaca (trained from Self-Instruct) and Vicuna (trained from human-created instructions). The experimental results demonstrate that the quality of instruction-following dataset crafted by Evol-Instruct can significantly improve the performance of LLMs.