Author name cluster

Chen Xing

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

ICLR Conference 2025 Conference Paper

ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement

Xiangyu Peng
Congying Xia
Xinyi Yang 0002
Caiming Xiong
Chien-Sheng Wu
Chen Xing

Post-training Large Language Models (LLMs) with explicit reasoning trajectories can enhance their reasoning abilities. However, acquiring such high-quality trajectory data typically demands meticulous supervision from humans or superior models, which can be either expensive or license-constrained. In this paper, we explore how far an LLM can improve its reasoning by self-synthesizing reasoning paths as training data without any additional supervision. Existing self-synthesizing methods, such as STaR, suffer from poor generalization to out-of-domain (OOD) reasoning tasks. We hypothesize it is due to that their self-synthesized reasoning paths are too task-specific, lacking general task-agnostic reasoning guidance. To address this, we propose **Reasoning Generalist via Self-Improvement (ReGenesis)**, a method to *self-synthesize reasoning paths as post-training data by progressing from abstract to concrete*. More specifically, ReGenesis self-synthesizes reasoning paths by converting general reasoning guidelines into task-specific ones, generating reasoning structures, and subsequently transforming these structures into reasoning paths, without the need for human-designed task-specific examples used in existing methods. We show that ReGenesis achieves superior performance on all in-domain and OOD settings tested compared to existing methods. For six OOD tasks specifically, while previous methods exhibited an average performance decrease of approximately 4.6% after post training, ReGenesis delivers around 6.1% performance improvement. We also conduct an in-depth analysis of our framework and show ReGenesis is effective across various language models and design choices.

Details

ICLR Conference 2024 Conference Paper

Lemur: Harmonizing Natural Language and Code for Language Agents

Yiheng Xu
Hongjin Su
Chen Xing
Boyu Mi
Qian Liu 0033
Weijia Shi
Binyuan Hui
Fan Zhou

We introduce Lemur and Lemur-Chat, openly accessible language models optimized for both natural language and coding capabilities to serve as the backbone of versatile language agents. The evolution from language chat models to functional language agents demands that models not only master human interaction, reasoning, and planning but also ensure grounding in the relevant environments. This calls for a harmonious blend of language and coding capabilities in the models. Lemur and Lemur-Chat are proposed to address this necessity, demonstrating balanced proficiencies in both domains, unlike existing open-source models that tend to specialize in either. Through meticulous pretraining using a code-intensive corpus and instruction fine-tuning on text and code data, our models achieve state-of-the-art averaged performance across diverse text and coding benchmarks. Comprehensive experiments demonstrate Lemur’s superiority over existing open-source models and its proficiency across various agent tasks involving human communication, tool usage, and interaction under fully- and partially- observable environments. The harmonization between natural and programming languages enables Lemur-Chat to significantly narrow the gap with proprietary models on agent abilities, providing key insights into developing advanced open-source agents adept at reasoning, planning, and operating seamlessly across environments. Our model and code have been open-sourced at https://github.com/OpenLemur/Lemur.

Details

YNIMG Journal 2023 Journal Article

Increased interbrain synchronization and neural efficiency of the frontal cortex to enhance human coordinative behavior: A combined hyper-tES and fNIRS study

Hongliang Lu
Xinlu Wang
Yajuan Zhang
Peng Huang
Chen Xing
Mingming Zhang
Xia Zhu

Coordination is crucial for individuals to achieve common goals; however, the causal relationship between coordination behavior and neural activity has not yet been explored. Interbrain synchronization (IBS) and neural efficiency in cortical areas associated with the mirror neuron system (MNS) are considered two potential brain mechanisms. In the present study, we attempted to clarify how the two mechanisms facilitate coordination using hypertranscranial electrical stimulation (hyper-tES). A total of 124 healthy young adults were randomly divided into three groups (the hyper-tACS, hyper-tDCS and sham groups) and underwent modulation of the right inferior frontal gyrus (IFG) during functional near-infrared spectroscopy (fNIRS). Increased IBS of the PFC or neural efficiency of the right IFG (related to the MNS) was accompanied by greater coordination behavior; IBS had longer-lasting effects on behavior. Our findings highlight the importance of IBS and neural efficiency of the frontal cortex for coordination and suggest potential interventions to improve coordination in different temporal windows.

Details DOI

ICLR Conference 2023 Conference Paper

Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning

Xiangyu Peng
Chen Xing
Prafulla Kumar Choubey
Chien-Sheng Wu
Caiming Xiong

Prompt tuning approaches, which learn task-specific soft prompts for a downstream task conditioning on frozen pre-trained models, have attracted growing interest due to its parameter efficiency. With large language models and sufficient training data, prompt tuning performs comparably to full-model tuning. However, with limited training samples in few-shot settings, prompt tuning fails to match the performance of full-model fine-tuning. In this work, we focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks with abundant training samples. Recognizing the good generalization capabilities of ensemble methods in low-data regime, we first experiment and show that a simple ensemble of model predictions based on different source prompts, outperforms existing multi-prompt knowledge transfer approaches such as source prompt fusion in the few-shot setting. Motivated by this observation, we further investigate model ensembles and propose Sample-specific Ensemble of Source Models (SESoM). SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs. Through this way, SESoM inherits the superior generalization of ensemble methods and simultaneously captures the sample-specific competence of each source prompt. We conduct experiments across a diverse set of eight NLP tasks using models of different scales (T5-\{base, large, XL\}) and find that SESoM consistently outperforms the existing models of the same as well as larger parametric scale by a large margin.

Details

ICLR Conference 2021 Conference Paper

Taking Notes on the Fly Helps Language Pre-Training

Qiyu Wu 0001
Chen Xing
Yatao Li
Guolin Ke
Di He 0001
Tie-Yan Liu

How to make unsupervised language pre-training more efficient and less resource-intensive is an important research direction in NLP. In this paper, we focus on improving the efficiency of language pre-training methods through providing better data utilization. It is well-known that in language data corpus, words follow a heavy-tail distribution. A large proportion of words appear only very few times and the embeddings of rare words are usually poorly optimized. We argue that such embeddings carry inadequate semantic signals, which could make the data utilization inefficient and slow down the pre-training of the entire model. To mitigate this problem, we propose Taking Notes on the Fly (TNF), which takes notes for rare words on the fly during pre-training to help the model understand them when they occur next time. Specifically, TNF maintains a note dictionary and saves a rare word's contextual information in it as notes when the rare word occurs in a sentence. When the same rare word occurs again during training, the note information saved beforehand can be employed to enhance the semantics of the current sentence. By doing so, TNF provides a better data utilization since cross-sentence information is employed to cover the inadequate semantics caused by rare words in the sentences. We implement TNF on both BERT and ELECTRA to check its efficiency and effectiveness. Experimental results show that TNF's training time is 60% less than its backbone pre-training models when reaching the same performance. When trained with same number of iterations, TNF outperforms its backbone methods on most of downstream tasks and the average GLUE score. Code is attached in the supplementary material.

Details

ICLR Conference 2020 Conference Paper

Distance-Based Learning from Errors for Confidence Calibration

Chen Xing
Sercan Ömer Arik
Zizhao Zhang
Tomas Pfister

Deep neural networks (DNNs) are poorly calibrated when trained in conventional ways. To improve confidence calibration of DNNs, we propose a novel training method, distance-based learning from errors (DBLE). DBLE bases its confidence estimation on distances in the representation space. In DBLE, we first adapt prototypical learning to train classification models. It yields a representation space where the distance between a test sample and its ground truth class center can calibrate the model's classification performance. At inference, however, these distances are not available due to the lack of ground truth labels. To circumvent this by inferring the distance for every test sample, we propose to train a confidence model jointly with the classification model. We integrate this into training by merely learning from mis-classified training samples, which we show to be highly beneficial for effective learning. On multiple datasets and DNN architectures, we demonstrate that DBLE outperforms alternative single-model confidence calibration approaches. DBLE also achieves comparable performance with computationally-expensive ensemble approaches with lower computational cost and lower number of parameters.

Details

ICML Conference 2020 Conference Paper

On Layer Normalization in the Transformer Architecture

Ruibin Xiong
Yunchang Yang
Di He 0001
Kai Zheng 0007
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan

The Transformer is widely used in natural language processing tasks. To train a Transformer however, one usually needs a carefully designed learning rate warm-up stage, which is shown to be crucial to the final performance but will slow down the optimization and bring more hyper-parameter tunings. In this paper, we first study theoretically why the learning rate warm-up stage is essential and show that the location of layer normalization matters. Specifically, we prove with mean field theory that at initialization, for the original-designed Post-LN Transformer, which places the layer normalization between the residual blocks, the expected gradients of the parameters near the output layer are large. Therefore, using a large learning rate on those gradients makes the training unstable. The warm-up stage is practically helpful for avoiding this problem. On the other hand, our theory also shows that if the layer normalization is put inside the residual blocks (recently proposed as Pre-LN Transformer), the gradients are well-behaved at initialization. This motivates us to remove the warm-up stage for the training of Pre-LN Transformers. We show in our experiments that Pre-LN Transformers without the warm-up stage can reach comparable results with baselines while requiring significantly less training time and hyper-parameter tuning on a wide range of applications.

Details

NeurIPS Conference 2019 Conference Paper

Adaptive Cross-Modal Few-shot Learning

Chen Xing
Negar Rostamzadeh
Boris Oreshkin
Pedro O. Pinheiro

Metric-based meta-learning techniques have successfully been applied to few-shot classification problems. In this paper, we propose to leverage cross-modal information to enhance metric-based few-shot learning methods. Visual and semantic feature spaces have different structures by definition. For certain concepts, visual features might be richer and more discriminative than text ones. While for others, the inverse might be true. Moreover, when the support from visual information is limited in image classification, semantic representations (learned from unsupervised text corpora) can provide strong prior knowledge and context to help learning. Based on these two intuitions, we propose a mechanism that can adaptively combine information from both modalities according to new image categories to be learned. Through a series of experiments, we show that by this adaptive combination of the two modalities, our model outperforms current uni-modality few-shot learning methods and modality-alignment methods by a large margin on all benchmarks and few-shot scenarios tested. Experiments also show that our model can effectively adjust its focus on the two modalities. The improvement in performance is particularly large when the number of shots is very small.

PDF Details

AAAI Conference 2018 Conference Paper

Hierarchical Recurrent Attention Network for Response Generation

Chen Xing
Yu Wu
Wei Wu
Yalou Huang
Ming Zhou

We study multi-turn response generation in chatbots where a response is generated according to a conversation context. Existing work has modeled the hierarchy of the context, but does not pay enough attention to the fact that words and utterances in the context are differentially important. As a result, they may lose important information in context and generate irrelevant responses. We propose a hierarchical recurrent attention network (HRAN) to model both the hierarchy and the importance variance in a uniﬁed framework. In HRAN, a hierarchical attention mechanism attends to important parts within and among utterances with word level attention and utterance level attention respectively. Empirical studies on both automatic evaluation and human judgment show that HRAN can signiﬁcantly outperform state-of-the-art models for context based response generation.

PDF Details

AAAI Conference 2017 Conference Paper

Topic Aware Neural Response Generation

Chen Xing
Wei Wu
Yu Wu
Jie Liu
Yalou Huang
Ming Zhou
Wei-Ying Ma

We consider incorporating topic information into a sequenceto-sequence framework to generate informative and interesting responses for chatbots. To this end, we propose a topic aware sequence-to-sequence (TA-Seq2Seq) model. The model utilizes topics to simulate prior human knowledge that guides them to form informative and interesting responses in conversation, and leverages topic information in generation by a joint attention mechanism and a biased generation probability. The joint attention mechanism summarizes the hidden vectors of an input message as context vectors by message attention and synthesizes topic vectors by topic attention from the topic words of the message obtained from a pre-trained LDA model, with these vectors jointly affecting the generation of words in decoding. To increase the possibility of topic words appearing in responses, the model modiﬁes the generation probability of topic words by adding an extra probability item to bias the overall distribution. Empirical studies on both automatic evaluation metrics and human annotations show that TA-Seq2Seq can generate more informative and interesting responses, signiﬁcantly outperforming state-of-theart response generation models.

PDF Details

AAAI Conference 2016 Conference Paper

Hashtag-Based Sub-Event Discovery Using Mutually Generative LDA in Twitter

Chen Xing
Yuan Wang
Jie Liu
Yalou Huang
Wei-Ying Ma

Sub-event discovery is an effective method for social event analysis in Twitter. It can discover sub-events from large amount of noisy event-related information in Twitter and semantically represent them. The task is challenging because tweets are short, informal and noisy. To solve this problem, we consider leveraging event-related hashtags that contain many locations, dates and concise sub-event related descriptions to enhance sub-event discovery. To this end, we propose a hashtag-based mutually generative Latent Dirichlet Allocation model(MGe-LDA). In MGe-LDA, hashtags and topics of a tweet are mutually generated by each other. The mutually generative process models the relationship between hashtags and topics of tweets, and highlights the role of hashtags as a semantic representation of the corresponding tweets. Experimental results show that MGe-LDA can signiﬁcantly outperform state-of-the-art methods for sub-event discovery.

PDF Details