Arrow Research

Author name cluster

Zeqi Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

ICLR 2025 Conference Paper

Self-Evolved Reward Learning for LLMs

  • Chenghua Huang
  • Zhizhen Fan
  • Lu Wang 0029
  • Fangkai Yang
  • Pu Zhao 0004
  • Zeqi Lin
  • Qingwei Lin
  • Dongmei Zhang 0001

Reinforcement Learning from Human Feedback (RLHF) is a crucial technique for aligning language models with human preferences and is a key factor in the success of modern conversational models like GPT-4, ChatGPT, and Llama 2. A significant challenge in employing RLHF lies in training a reliable reward model (RM), which relies on high-quality labels. Typically, these labels are provided by human experts or a stronger AI, both of which can be costly and may introduce bias that affects the language model's responses. As models improve, human input may become less effective in enhancing their performance. This paper explores the potential of using the RM itself to generate additional training data for a more robust RM. Our experiments demonstrate that reinforcement learning from self-feedback outperforms baseline approaches. We conducted extensive experiments on multiple datasets, such as HH-RLHF and UltraFeedback, and models including Mistral and Llama 3, comparing our approach against various baselines. Our results indicate that, even with a limited amount of human-labeled data, learning from self-feedback can robustly enhance the performance of the RM, thereby improving the capabilities of large language models.
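
As a rough illustration of the self-feedback idea summarized above, here is a minimal sketch of how a reward model might pseudo-label preference pairs for its own retraining. The `rm`/`policy` interfaces, the confidence margin, and the round structure are all hypothetical stand-ins, not the paper's actual recipe.

```python
# Minimal sketch of a self-feedback loop for reward-model training.
# All interfaces (rm.train, rm.score, policy.sample) are hypothetical.

def self_evolved_rm_training(rm, policy, prompts, seed_pairs,
                             rounds=3, margin=0.9):
    rm.train(seed_pairs)  # warm-start on a small human-labeled seed set
    for _ in range(rounds):
        pseudo_pairs = []
        for prompt in prompts:
            # Sample two candidate responses and score them with the RM.
            a, b = policy.sample(prompt), policy.sample(prompt)
            sa, sb = rm.score(prompt, a), rm.score(prompt, b)
            # Keep only pairs the RM is confident about as pseudo-labels.
            if abs(sa - sb) >= margin:
                chosen, rejected = (a, b) if sa > sb else (b, a)
                pseudo_pairs.append((prompt, chosen, rejected))
        # Retrain on human seed data plus the RM's own confident labels.
        rm.train(seed_pairs + pseudo_pairs)
    return rm
```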

NeurIPS 2024 Conference Paper

Make Your LLM Fully Utilize the Context

  • Shengnan An
  • Zexiong Ma
  • Zeqi Lin
  • Nanning Zheng
  • Jian-Guang Lou
  • Weizhu Chen

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, a problem known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness of a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. By applying this information-intensive training to Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B to utilize long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves performance on real-world long-context tasks (e.g., 23.5 -> 26.9 F1 score on NarrativeQA), while maintaining comparable performance on short-context tasks (e.g., 59.3 -> 59.2 accuracy on MMLU).
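
A minimal sketch of IN2-style data synthesis, under the simplifying assumption that the long context is assembled from fixed-length filler segments; segment sampling, tokenization, and the multi-segment reasoning variant are elided, and all names are illustrative.

```python
import random

# Place a short answer-bearing "needle" segment at a random position in a
# long context built from unrelated filler segments, so supervision covers
# every position in the window. Hypothetical sketch, not the paper's code.

def synthesize_in2_example(needle, question, answer, fillers,
                           context_tokens=32_000, segment_tokens=128):
    n_slots = context_tokens // segment_tokens
    segments = random.choices(fillers, k=n_slots)
    segments[random.randrange(n_slots)] = needle  # any position is possible
    return {"context": " ".join(segments),
            "question": question,
            "answer": answer}
```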

ICLR 2023 Conference Paper

CodeT: Code Generation with Generated Tests

  • Bei Chen 0008
  • Fengji Zhang
  • Anh Nguyen
  • Daoguang Zan
  • Zeqi Lin
  • Jian-Guang Lou
  • Weizhu Chen

The task of generating code solutions for a given programming problem can benefit from the use of pre-trained language models such as Codex, which can produce multiple diverse samples. However, a major challenge for this task is to select the most appropriate solution from the multiple samples generated by the pre-trained language models. A natural way to evaluate the quality and correctness of a code solution is to run it against a set of test cases, but the manual creation of such test cases is often costly and time-consuming. In this paper, we propose a novel method, CodeT, that leverages the same pre-trained language models to automatically generate test cases for the code samples, thus reducing the human effort and increasing the coverage of the test scenarios. CodeT then executes the code samples using the generated test cases, and performs a dual execution agreement, which considers both the consistency of the outputs against the generated test cases and the agreement of the outputs with other code samples. We conduct comprehensive experiments on four benchmarks, HumanEval, MBPP, APPS and CodeContests, using five different pre-trained language models with varying sizes and capabilities. Our results show that CodeT can significantly improve the performance of code solution selection over previous methods, achieving remarkable and consistent gains across different models and benchmarks. For instance, CodeT improves the pass@1 metric on HumanEval to 65.8%, which represents an absolute improvement of 18.8% over the code-davinci-002 model, and an absolute improvement of more than 20% over the previous state-of-the-art results.
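
The dual execution agreement reduces to a compact ranking rule. In the sketch below, `run` is a hypothetical sandboxed executor returning whether a code sample passes a test; grouping samples by the exact set of tests they pass approximates the paper's consensus scoring, whose details may differ.

```python
from collections import defaultdict

def rank_solutions(code_samples, generated_tests, run):
    # Group samples by the exact subset of generated tests they pass;
    # samples in one group behave identically on every test.
    groups = defaultdict(list)
    for code in code_samples:
        passed = frozenset(t for t in generated_tests if run(code, t))
        groups[passed].append(code)
    # Score each group by (agreeing samples) x (tests passed), rewarding
    # both consensus among samples and coverage of the generated tests.
    passed, members = max(groups.items(),
                          key=lambda kv: len(kv[1]) * len(kv[0]))
    return members[0]  # a representative from the best-scoring group
```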

ICLR 2023 Conference Paper

Does Deep Learning Learn to Abstract? A Systematic Probing Framework

  • Shengnan An
  • Zeqi Lin
  • Bei Chen 0008
  • Qiang Fu 0015
  • Nanning Zheng 0001
  • Jian-Guang Lou

Abstraction is a desirable capability for deep learning models: the ability to induce abstract concepts from concrete instances and to apply them flexibly beyond the learning context. However, there is a lack of clear understanding about both the presence and the characteristics of this capability in deep learning models. In this paper, we introduce a systematic probing framework to explore the abstraction capability of deep learning models from a transferability perspective. A set of controlled experiments conducted within this framework provides strong evidence that two probed pre-trained language models (PLMs), T5 and GPT2, have the abstraction capability. In-depth analysis sheds further light: (1) the whole training phase exhibits a "memorize-then-abstract" two-stage process; (2) the learned abstract concepts are gathered in a few middle-layer attention heads, rather than being evenly distributed throughout the model; (3) the probed abstraction capabilities exhibit robustness against concept mutations, and are more robust to low-level/source-side mutations than to high-level/target-side ones; (4) generic pre-training is critical to the emergence of abstraction capability, and PLMs exhibit better abstraction with larger model sizes and data scales.
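
A bare skeleton of a transferability-style probe in the spirit of the framework above: if a model exposed to one instantiation of a concept beats a from-scratch baseline on a mutated instantiation, the gap is taken as evidence of abstraction. All interfaces (`make_task`, `mutate`, the model methods) are hypothetical.

```python
# Hypothetical transferability probe: a positive transfer gap on a mutated
# instantiation of the same abstract concept is evidence that the model
# learned the concept itself, not just its surface form.

def abstraction_probe(pretrained, scratch, make_task, mutate):
    task_a = make_task()      # concrete instances of an abstract concept
    task_b = mutate(task_a)   # same concept, mutated surface forms
    pretrained.finetune(task_a)
    transfer_acc = pretrained.finetune_and_eval(task_b)
    baseline_acc = scratch.finetune_and_eval(task_b)
    return transfer_acc - baseline_acc  # > 0 suggests abstraction
```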

IJCAI 2022 Conference Paper

CERT: Continual Pre-training on Sketches for Library-oriented Code Generation

  • Daoguang Zan
  • Bei Chen
  • Dejian Yang
  • Zeqi Lin
  • Minsu Kim
  • Bei Guan
  • Yongji Wang
  • Weizhu Chen

Code generation is a longstanding challenge that aims to generate a code snippet from a natural language description. Usually, expensive text-code paired data is essential for training a code generation model. Recently, thanks to the success of pre-training techniques, large language models are trained on large unlabelled code corpora and perform well at generating code. In this paper, we investigate how to leverage an unlabelled code corpus to train a model for library-oriented code generation. It is common practice for programmers to reuse third-party libraries, yet for this setting text-code paired data is harder to obtain because of the sheer number of libraries. We observe that library-oriented code snippets are more likely to share similar code sketches. Hence, we present CERT with two steps: a sketcher generates the sketch, then a generator fills in the details. Both the sketcher and the generator are continually pre-trained upon a base model using unlabelled data. We also carefully craft two benchmarks, named PandasEval and NumpyEval, to evaluate library-oriented code generation. Experimental results show the impressive performance of CERT. For example, it surpasses the base model by an absolute 15.67% improvement in terms of pass@1 on PandasEval. Our work is available at https://github.com/microsoft/PyCodeGPT.
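
The two-step pipeline is straightforward to picture. The sketch below assumes hypothetical `sketcher` and `generator` model wrappers; the placeholder-based sketch shown in the comment is only indicative of the idea.

```python
# Hypothetical CERT-style inference: sketch first, then fill in details.

def generate_library_code(description, sketcher, generator):
    # Stage 1: predict an anonymized code sketch, e.g. something like
    #   "df = pd.DataFrame(_); df.groupby(_).agg(_)"
    sketch = sketcher.generate(description)
    # Stage 2: condition on the description *and* the sketch to produce
    # concrete identifiers, arguments, and literals.
    return generator.generate(description + "\n" + sketch)
```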

ICLR 2022 Conference Paper

TAPEX: Table Pre-training via Learning a Neural SQL Executor

  • Qian Liu 0033
  • Bei Chen 0008
  • Jiaqi Guo
  • Morteza Ziyadi
  • Zeqi Lin
  • Weizhu Chen
  • Jian-Guang Lou

Recent progress in language model pre-training has achieved great success via leveraging large-scale unstructured textual data. However, applying pre-training to structured tabular data remains a challenge due to the absence of large-scale high-quality tabular data. In this paper, we propose TAPEX to show that table pre-training can be achieved by learning a neural SQL executor over a synthetic corpus, which is obtained by automatically synthesizing executable SQL queries and their execution outputs. TAPEX addresses the data scarcity challenge by guiding the language model to mimic a SQL executor on the diverse, large-scale and high-quality synthetic corpus. We evaluate TAPEX on four benchmark datasets. Experimental results demonstrate that TAPEX outperforms previous table pre-training approaches by a large margin and achieves new state-of-the-art results on all of them. This includes improvements in the weakly-supervised WikiSQL denotation accuracy to 89.5% (+2.3%), the WikiTableQuestions denotation accuracy to 57.5% (+4.8%), the SQA denotation accuracy to 74.5% (+3.5%), and the TabFact accuracy to 84.2% (+3.2%). To our knowledge, this is the first work to exploit table pre-training via synthetic executable programs and to achieve new state-of-the-art results on various downstream tasks. Our code can be found at https://github.com/microsoft/Table-Pretraining.
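
Since a SQL executor is easy to simulate, the corpus-synthesis idea can be sketched with the standard library alone. The serialization format below is a guess at the flavor of the data (the paper's exact flattening differs), and query/table sampling is elided.

```python
import sqlite3

# Execute a synthesized SQL query over a small table and keep
# (flattened table + query -> execution result) as a pre-training pair.

def make_pretraining_example(rows, query):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (city TEXT, population INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(query).fetchall()
    flat = " | ".join(f"{city} : {pop}" for city, pop in rows)
    # Single-column query result assumed for this toy example.
    return {"input": f"{query} [TABLE] {flat}",
            "target": " , ".join(str(v) for (v,) in result)}

example = make_pretraining_example(
    [("Oslo", 700_000), ("Bergen", 280_000)],
    "SELECT city FROM t WHERE population > 500000",
)  # target: "Oslo"
```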

AAAI 2021 Conference Paper

Iterative Utterance Segmentation for Neural Semantic Parsing

  • Yinuo Guo
  • Zeqi Lin
  • Jian-Guang Lou
  • Dongmei Zhang

Neural semantic parsers usually fail to parse long and complex utterances into correct meaning representations, due to a lack of exploiting the principle of compositionality. To address this issue, we present a novel framework for boosting neural semantic parsers via iterative utterance segmentation. Given an input utterance, our framework iterates between two neural modules: a segmenter that segments a span from the utterance, and a parser that maps the span into a partial meaning representation. These intermediate parsing results are then composed into the final meaning representation. One key advantage is that this framework does not require any handcrafted templates or additional labeled data for utterance segmentation: we achieve this through a novel training method in which the parser provides pseudo supervision for the segmenter. Experiments on GEO, COMPLEXWEBQUESTIONS and FORMULAS show that our framework can consistently improve the performance of neural semantic parsers in different domains. On data splits that require compositional generalization, our framework brings significant accuracy gains: GEO 63.1 → 81.2, FORMULAS 59.7 → 72.7, COMPLEXWEBQUESTIONS 27.1 → 56.3.
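
The iterate-between-two-modules loop is short enough to sketch directly; `segmenter`, `parser`, and `compose` are hypothetical interfaces standing in for the paper's neural modules and composition step.

```python
# Hypothetical segment-then-parse loop: peel off one span at a time,
# parse each span in isolation, then compose the partial results.

def parse_by_segmentation(utterance, segmenter, parser, compose):
    partials, remaining = [], utterance
    while remaining:
        span, remaining = segmenter.split(remaining)
        partials.append(parser.parse(span))
    return compose(partials)
```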

AAAI 2021 Conference Paper

Revisiting Iterative Back-Translation from the Perspective of Compositional Generalization

  • Yinuo Guo
  • Hualei Zhu
  • Zeqi Lin
  • Bei Chen
  • Jian-Guang Lou
  • Dongmei Zhang

Human intelligence exhibits compositional generalization (i.e., the capacity to understand and produce unseen combinations of seen components), but current neural seq2seq models lack such ability. In this paper, we revisit iterative back-translation, a simple yet effective semi-supervised method, to investigate whether and how it can improve compositional generalization. (1) We first empirically show that iterative back-translation substantially improves performance on compositional generalization benchmarks (CFQ and SCAN). (2) To understand why iterative back-translation is useful, we carefully examine the performance gains and find that iterative back-translation can increasingly correct errors in pseudo-parallel data. (3) To further encourage this mechanism, we propose curriculum iterative back-translation, which better improves the quality of pseudo-parallel data, thus further improving performance.
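
Iterative back-translation itself is a well-known loop; a hedged sketch for this setting (natural language x, meaning representation y) follows. Model interfaces are hypothetical, and the curriculum variant the paper proposes is not shown.

```python
# Two models label each other's monolingual data; pseudo-parallel pairs
# are folded back into training, so label quality can improve per round.

def iterative_back_translation(fwd, bwd, parallel, mono_x, mono_y, rounds=5):
    for _ in range(rounds):
        pseudo_from_y = [(bwd.translate(y), y) for y in mono_y]  # (x_hat, y)
        pseudo_from_x = [(x, fwd.translate(x)) for x in mono_x]  # (x, y_hat)
        fwd.train(parallel + pseudo_from_y)
        bwd.train([(y, x) for x, y in parallel + pseudo_from_x])
    return fwd, bwd
```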

NeurIPS 2020 Conference Paper

Compositional Generalization by Learning Analytical Expressions

  • Qian Liu
  • Shengnan An
  • Jian-Guang Lou
  • Bei Chen
  • Zeqi Lin
  • Yan Gao
  • Bin Zhou
  • Nanning Zheng

Compositional generalization is a basic and essential intellectual capability of human beings, allowing us to readily recombine known parts. However, existing neural network based models have been shown to be extremely deficient in such a capability. Inspired by work in cognition arguing that compositionality can be captured by variable slots with symbolic functions, we present a refreshing view that connects a memory-augmented neural model with analytical expressions to achieve compositional generalization. Our model consists of two cooperative neural modules, Composer and Solver, fitting well with the cognitive argument while remaining trainable end-to-end via a hierarchical reinforcement learning algorithm. Experiments on the well-known benchmark SCAN demonstrate that our model achieves strong compositional generalization, solving all challenges addressed by previous works with 100% accuracy.
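
Very roughly, the Composer/Solver cooperation can be pictured as a reduce loop in which memory slots play the role of the "variable slots" from the cognitive argument; the interfaces below are hypothetical, and the hierarchical reinforcement learning used for training is omitted entirely.

```python
# Hypothetical reduce loop: Composer picks a span, Solver rewrites it to a
# symbolic value stored in a memory slot, and the slot replaces the span.

def composer_solver_loop(tokens, composer, solver, max_steps=20):
    memory = {}
    for step in range(max_steps):
        if len(tokens) == 1 and tokens[0] in memory:
            return memory[tokens[0]]  # input fully reduced to one value
        i, j = composer.select_span(tokens, memory)
        var = f"$x{step}"
        memory[var] = solver.apply(tokens[i:j], memory)
        tokens = tokens[:i] + [var] + tokens[j:]
    return None  # did not converge within the step budget
```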

NeurIPS 2020 Conference Paper

Hierarchical Poset Decoding for Compositional Generalization in Language

  • Yinuo Guo
  • Zeqi Lin
  • Jian-Guang Lou
  • Dongmei Zhang

We formalize human language understanding as a structured prediction task where the output is a partially ordered set (poset). Current encoder-decoder architectures do not properly take the poset structure of semantics into account, and thus suffer from poor compositional generalization. In this paper, we propose a novel hierarchical poset decoding paradigm for compositional generalization in language. Intuitively: (1) the proposed paradigm enforces partial permutation invariance in semantics, thus avoiding overfitting to biased ordering information; (2) the hierarchical mechanism allows the decoder to capture high-level structures of posets. We evaluate our proposed decoder on Compositional Freebase Questions (CFQ), a large and realistic natural language question answering dataset specifically designed to measure compositional generalization. Results show that it outperforms current decoders.
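
A toy illustration of the underlying intuition: a poset target admits several semantically equivalent linearizations, and a decoder committed to one token order must learn an arbitrary ordering bias. The clause names are made up; this is not the paper's decoder.

```python
from itertools import permutations

def linearizations(nodes, precedes):
    """All topological orders of a small poset (brute force; toy sizes only)."""
    return [p for p in permutations(nodes)
            if all(p.index(a) < p.index(b) for a, b in precedes)]

# Two independent filter clauses: neither must precede the other.
clauses = ["select", "filter_gender", "filter_role"]
constraints = [("select", "filter_gender"), ("select", "filter_role")]
print(linearizations(clauses, constraints))
# Both orders of the two filters appear: one poset target, but two distinct
# sequences from a fixed-order decoder's point of view.
```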