Arrow Research search

Author name cluster

Yankai Lin

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
1 author row

Possible papers

15

NeurIPS Conference 2025 Conference Paper

Generalizing Experience for Language Agents with Hierarchical MetaFlows

  • Shengda Fan
  • Xin Cong
  • Zhong Zhang
  • Yuepeng Fu
  • Yesai Wu
  • Hao Wang
  • Xinyu Zhang
  • Enrui Hu

Recent efforts to employ large language models (LLMs) as agents have demonstrated promising results in a wide range of multi-step agent tasks. However, existing agents lack an effective approach for reusing experience from historically completed tasks. In this paper, we propose MetaFlowLLM, a novel experience reuse framework that constructs a hierarchical experience tree from historically completed tasks. Each node in this experience tree is represented as a MetaFlow, which contains a static execution workflow and the subtasks that agents must complete dynamically. We then propose a Hierarchical MetaFlow Merging algorithm to construct the hierarchical experience tree. When tackling a new task, MetaFlowLLM first retrieves the most relevant MetaFlow node from the experience tree and then executes it accordingly. To effectively generate valid MetaFlows from historical data, we further propose a reinforcement learning pipeline to train the MetaFlowGen. Extensive experimental results on AppWorld and WorkBench demonstrate that, when integrated with MetaFlowLLM, existing agents (e.g., ReAct, Reflexion) gain substantial performance improvements while reducing execution costs. Notably, MetaFlowLLM achieves average success rate improvements of 32.3% on AppWorld and 6.2% on WorkBench, respectively.

NeurIPS Conference 2025 Conference Paper

Large Language Diffusion Models

  • Shen Nie
  • Fengqi Zhu
  • Zebin You
  • Xiaolu Zhang
  • Jingyang Ou
  • Jun Hu
  • Jun Zhou
  • Yankai Lin

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a principled generative approach for probabilistic inference by optimizing a likelihood lower bound. Across extensive benchmarks on general tasks, math, code, and more, LLaDA demonstrates strong scalability and performs comparably to our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings show the promise of diffusion models for language modeling at scale and challenge the common assumption that the core LLM capabilities discussed above inherently depend on ARMs. Project page and code: \url{https://ml-gsai.github.io/LLaDA-demo/}.
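The forward masking process described above can be sketched in a few lines. This is a simplified, token-level illustration with a uniformly sampled mask ratio, not the paper's actual implementation; `MASK`, `forward_mask`, and the toy token list are all hypothetical:

```python
import random

MASK = "<mask>"

def forward_mask(tokens, t, rng):
    """Mask each token independently with probability t (the diffusion 'time')."""
    return [MASK if rng.random() < t else tok for tok in tokens]

rng = random.Random(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
t = rng.random()  # t ~ Uniform(0, 1), one draw per training sequence
corrupted = forward_mask(tokens, t, rng)

# Training signal: the model is asked to predict the original tokens
# at exactly the masked positions of the corrupted sequence.
targets = [(i, tok) for i, (tok, cor) in enumerate(zip(tokens, corrupted)) if cor == MASK]
```

In this view, optimizing the masked-token prediction loss averaged over `t` gives the likelihood lower bound mentioned in the abstract.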

NeurIPS Conference 2025 Conference Paper

Learning to Focus: Causal Attention Distillation via Gradient‐Guided Token Pruning

  • Yiju Guo
  • Wenkai Yang
  • Zexu Sun
  • Ning Ding
  • Zhiyuan Liu
  • Yankai Lin

Large language models (LLMs) have demonstrated significant improvements in contextual understanding. However, their ability to attend to truly critical information during long-context reasoning and generation still lags behind. Specifically, our preliminary experiments reveal that certain distracting patterns can misdirect the model's attention during inference, and removing these patterns substantially improves reasoning accuracy and generation quality. We attribute this phenomenon to spurious correlations in the training data, which obstruct the model's capacity to infer authentic causal instruction–response relationships. This phenomenon may induce redundant reasoning processes, potentially resulting in significant inference overhead and, more critically, the generation of erroneous or suboptimal responses. To mitigate this, we introduce a two-stage framework called Learning to Focus (LeaF) that leverages intervention-based inference to disentangle confounding factors. In the first stage, LeaF employs gradient-based comparisons with an advanced teacher to automatically identify confounding tokens based on causal relationships in the training corpus. Then, in the second stage, it prunes these tokens during distillation to enact the intervention, aligning the student's attention with the teacher's focus distribution on truly critical context tokens. Experimental results demonstrate that LeaF not only achieves absolute improvements on various mathematical reasoning, code generation and multi-hop question answering benchmarks, but also effectively suppresses attention to confounding tokens during inference, yielding a more interpretable and reliable reasoning model.

NeurIPS Conference 2025 Conference Paper

Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning

  • Wenkai Yang
  • Shuming Ma
  • Yankai Lin
  • Furu Wei

Recent studies have shown that making a model spend more time thinking through longer Chains of Thought (CoTs) enables it to gain significant improvements in complex reasoning tasks. While current research continues to explore the benefits of increasing test-time compute by extending the CoT lengths of Large Language Models (LLMs), we are concerned about a potential issue hidden behind the current pursuit of test-time scaling: Would excessively scaling the CoT length actually bring adverse effects to a model's reasoning performance? Our explorations on mathematical reasoning tasks reveal an unexpected finding: scaling with longer CoTs can indeed impair the reasoning performance of LLMs in certain domains. Moreover, we discover that there exists an optimal scaled length distribution that differs across domains. Based on these insights, we propose a Thinking-Optimal Scaling strategy. Our method first uses a small set of seed data with varying response length distributions to teach the model to adopt different reasoning efforts for deep thinking. Then, the model selects its shortest correct response under different reasoning efforts on additional problems for self-improvement. Our self-improved models built upon Qwen2.5-32B-Instruct outperform other distillation-based 32B o1-like models across various math benchmarks, and achieve performance on par with the teacher model QwQ-32B-Preview that produces the seed data.
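The self-improvement selection step (keep the shortest response that is still correct among candidates sampled under different reasoning efforts) can be sketched as follows; `is_correct`, the toy responses, and the suffix check are hypothetical placeholders:

```python
def select_shortest_correct(responses, is_correct):
    """Among candidate responses sampled under different reasoning efforts,
    keep the shortest one that is still correct (None if none are correct)."""
    correct = [r for r in responses if is_correct(r)]
    return min(correct, key=len) if correct else None

# Toy example: answers to "2 + 3 = ?" with varying amounts of reasoning.
responses = [
    "Let me think step by step... 2 + 3 = 5",  # correct but verbose
    "2 + 3 = 5",                               # correct and short
    "2 + 3 = 6",                               # short but wrong
]
chosen = select_shortest_correct(responses, lambda r: r.endswith("= 5"))
```

The `chosen` response would then be added to the self-improvement training set, biasing the model toward the least compute that still solves the problem.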

NeurIPS Conference 2024 Conference Paper

InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

  • Chaojun Xiao
  • Pengle Zhang
  • Xu Han
  • Guangxuan Xiao
  • Yankai Lin
  • Zhengyan Zhang
  • Zhiyuan Liu
  • Maosong Sun

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which introduces expensive computational overhead and uncontrollable changes in model capabilities. In this paper, we unveil the intrinsic capacity of LLMs to understand extremely long sequences without any fine-tuning. To this end, we introduce a training-free memory-based method, InfLLM. Specifically, InfLLM stores distant contexts in additional memory units and employs an efficient mechanism to look up token-relevant units for attention computation. Thereby, InfLLM allows LLMs to efficiently process long sequences with a limited context window and to capture long-distance dependencies well. Without any training, InfLLM enables LLMs that are pre-trained on sequences of a few thousand tokens to achieve performance comparable to competitive baselines that continually train these LLMs on long sequences. Even when the sequence length is scaled to 1,024K, InfLLM still effectively captures long-distance dependencies. Our code can be found at https://github.com/thunlp/InfLLM.
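The lookup described above — score the stored context units against the current query and attend only over the most relevant ones — can be sketched as below. The unit layout (one representative key vector per unit) and all vectors are illustrative assumptions, not InfLLM's actual data structures:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lookup_units(query, memory_units, k):
    """Return the k memory units whose representative keys best match the query;
    attention is then computed only over these units plus the local window."""
    return sorted(memory_units, key=lambda u: dot(query, u["key"]), reverse=True)[:k]

# Hypothetical memory: each unit summarizes a block of distant context
# with a single key vector.
memory_units = [
    {"id": 0, "key": [1.0, 0.0]},
    {"id": 1, "key": [0.0, 1.0]},
    {"id": 2, "key": [0.5, 0.5]},
]
top = lookup_units([1.0, 0.0], memory_units, k=2)  # picks units 0 and 2
```

Because only `k` units enter the attention computation each step, the cost stays bounded by the context window rather than the full sequence length.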

NeurIPS Conference 2024 Conference Paper

Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents

  • Wenkai Yang
  • Xiaohan Bi
  • Yankai Lin
  • Sishuo Chen
  • Jie Zhou
  • Xu Sun

Driven by the rapid development of Large Language Models (LLMs), LLM-based agents have been developed to handle various real-world applications, including finance, healthcare, and shopping. It is crucial to ensure the reliability and security of LLM-based agents in these applications. However, the safety issues of LLM-based agents are currently under-explored. In this work, we take the first step toward investigating one of the typical safety threats, the backdoor attack, against LLM-based agents. We first formulate a general framework of agent backdoor attacks, and then present a thorough analysis of the different forms of agent backdoor attacks. Specifically, compared with traditional backdoor attacks on LLMs, which can only manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms: (1) From the perspective of the final attacking outcomes, the agent backdoor attacker can not only choose to manipulate the final output distribution, but also introduce malicious behavior in an intermediate reasoning step only, while keeping the final output correct. (2) Furthermore, the former category can be divided into two subcategories based on trigger locations, in which the backdoor trigger can either be hidden in the user query or appear in an intermediate observation returned by the external environment. We implement the above variations of agent backdoor attacks on two typical agent tasks: web shopping and tool utilization. Extensive experiments show that LLM-based agents suffer severely from backdoor attacks and that such backdoor vulnerabilities cannot be easily mitigated by current textual backdoor defense algorithms. This indicates an urgent need for further research on the development of targeted defenses against backdoor attacks on LLM-based agents. Warning: This paper may contain biased content.

TMLR Journal 2023 Journal Article

When to Trust Aggregated Gradients: Addressing Negative Client Sampling in Federated Learning

  • Wenkai Yang
  • Yankai Lin
  • Guangxiang Zhao
  • Peng Li
  • Jie Zhou
  • Xu Sun

Federated Learning has become a widely used framework that allows learning a global model on decentralized local datasets while protecting local data privacy. However, federated learning faces severe optimization difficulty when training samples are not independently and identically distributed (non-i.i.d.). In this paper, we point out that the client sampling practice plays a decisive role in this optimization difficulty. We find that negative client sampling causes the merged data distribution of the currently sampled clients to be heavily inconsistent with that of all available clients, further making the aggregated gradient unreliable. To address this issue, we propose a novel learning rate adaptation mechanism that adaptively adjusts the server learning rate for the aggregated gradient in each round, according to the consistency between the merged data distribution of the currently sampled clients and that of all available clients. Specifically, we make theoretical deductions to find a meaningful and robust indicator that is positively related to the optimal server learning rate, which is supposed to minimize the Euclidean distance between the aggregated gradient given the currently sampled clients and the gradient if all clients could participate in the current round. We show that our proposed indicator effectively reflects the merged data distribution of the sampled clients, so we utilize it for server learning rate adaptation. Extensive experiments on multiple image and text classification tasks validate the effectiveness of our method in various settings. Our code is available at https://github.com/lancopku/FedGLAD.

IJCAI Conference 2022 Conference Paper

Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification

  • Deli Chen
  • Yankai Lin
  • Lei Li
  • Xuancheng Ren
  • Peng Li
  • Jie Zhou
  • Xu Sun

Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, and their underlying working mechanism remains underexplored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework, which considers the distribution of annotated information across the graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL brings larger improvements than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.

AAAI Conference 2021 Conference Paper

Aspect-Level Sentiment-Controllable Review Generation with Mutual Learning Framework

  • Huimin Chen
  • Yankai Lin
  • Fanchao Qi
  • Jinyi Hu
  • Peng Li
  • Jie Zhou
  • Maosong Sun

Review generation, which aims to automatically generate review text from given information, has been proposed to assist with the unappealing task of review writing. However, most existing methods only consider the overall sentiments of reviews and cannot achieve aspect-level sentiment control. Even though some previous studies attempt to generate aspect-level sentiment-controllable reviews, they usually require large-scale human annotations that are unavailable in the real world. To address this issue, we propose a mutual learning framework that takes advantage of unlabeled data to assist aspect-level sentiment-controllable review generation. The framework consists of a generator and a classifier, which utilize a confidence mechanism and a reconstruction reward to enhance each other. Experimental results show that our model can achieve aspect-sentiment control accuracy of up to 88% without losing generation quality.

AAAI Conference 2021 Conference Paper

Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information

  • Qiu Ran
  • Yankai Lin
  • Peng Li
  • Jie Zhou

Non-autoregressive neural machine translation (NAT) generates each target word in parallel and has achieved promising inference acceleration. However, existing NAT models still have a large gap in translation quality compared to autoregressive neural machine translation models due to the multimodality problem: the target words may come from multiple feasible translations. To address this problem, we propose a novel NAT framework, ReorderNAT, which explicitly models reordering information to guide the decoding of NAT. Specifically, ReorderNAT utilizes deterministic and non-deterministic decoding strategies that leverage reordering information as a proxy for the final translation to encourage the decoder to choose words belonging to the same translation. Experimental results on various widely used datasets show that our proposed model achieves better performance than most existing NAT models, and even achieves translation quality comparable to autoregressive translation models with a significant speedup.

NeurIPS Conference 2021 Conference Paper

Topology-Imbalance Learning for Semi-Supervised Node Classification

  • Deli Chen
  • Yankai Lin
  • Guangxiang Zhao
  • Xuancheng Ren
  • Peng Li
  • Jie Zhou
  • Xu Sun

The class imbalance problem, an important issue in learning node representations, has drawn increasing attention from the community. Although the imbalance considered by existing studies stems from the unequal quantity of labeled examples in different classes (quantity imbalance), we argue that graph data expose a unique source of imbalance from the asymmetric topological properties of the labeled nodes, i.e., labeled nodes are not equal in terms of their structural role in the graph (topology imbalance). In this work, we first probe the previously unknown topology-imbalance issue, including its characteristics, causes, and threats to semi-supervised node classification. We then provide a unified view to jointly analyze the quantity- and topology-imbalance issues by considering the node influence shift phenomenon with the Label Propagation algorithm. In light of our analysis, we devise an influence conflict detection–based metric, Totoro, to measure the degree of graph topology imbalance, and propose a model-agnostic method, ReNode, to address the topology-imbalance issue by adaptively re-weighting the influence of labeled nodes based on their relative positions to class boundaries. Systematic experiments demonstrate the effectiveness and generalizability of our method in relieving the topology-imbalance issue and promoting semi-supervised node classification. Further analysis unveils the varied sensitivity of different graph neural networks (GNNs) to topology imbalance, which may serve as a new perspective in evaluating GNN architectures.

AAAI Conference 2020 Conference Paper

Measuring and Relieving the Over-Smoothing Problem for Graph Neural Networks from the Topological View

  • Deli Chen
  • Yankai Lin
  • Wei Li
  • Peng Li
  • Jie Zhou
  • Xu Sun

Graph Neural Networks (GNNs) have achieved promising performance on a wide range of graph-based tasks. Despite their success, one severe limitation of GNNs is the over-smoothing issue (indistinguishable representations of nodes in different classes). In this work, we present a systematic and quantitative study of the over-smoothing issue in GNNs. First, we introduce two quantitative metrics, MAD and MADGap, to measure the smoothness and over-smoothness of graph node representations, respectively. Then, we verify that smoothing is intrinsic to GNNs and that the critical factor leading to over-smoothness is the low information-to-noise ratio of the messages received by the nodes, which is partially determined by the graph topology. Finally, we propose two methods to alleviate the over-smoothing issue from the topological view: (1) MADReg, which adds a MADGap-based regularizer to the training objective; and (2) AdaEdge, which optimizes the graph topology based on the model predictions. Extensive experiments on 7 widely used graph datasets with 10 typical GNN models show that the two proposed methods are effective in relieving the over-smoothing issue, thus improving the performance of various GNN models.
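As a rough illustration of the MAD idea above — mean average cosine distance over selected node pairs — a simplified, unweighted version could look like this. The paper computes MAD from a masked distance matrix, so treat this pairwise form and the toy representations as assumptions:

```python
import math

def cosine_distance(a, b):
    """Cosine distance between two representation vectors."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def mad(reps, pairs):
    """Mean average distance: average cosine distance over the given node pairs."""
    return sum(cosine_distance(reps[i], reps[j]) for i, j in pairs) / len(pairs)

reps = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
smooth = mad(reps, [(0, 1)])    # identical representations -> distance 0
distinct = mad(reps, [(0, 2)])  # orthogonal representations -> distance 1

# MADGap-style contrast: MAD among remote pairs minus MAD among neighbor pairs;
# a small gap signals over-smoothing (remote nodes look like neighbors).
gap = distinct - smooth
```

Under this reading, MADReg's regularizer would push `gap` upward during training.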

AAAI Conference 2018 Conference Paper

Improving Neural Fine-Grained Entity Typing With Knowledge Attention

  • Ji Xin
  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun

Fine-grained entity typing aims to identify the semantic type of an entity in a given plain text. It is an important task that can be helpful for many natural language processing (NLP) applications. Most existing methods typically extract features separately from the entity mention and the context words for type classification. These methods inevitably fail to model complex correlations between entity mentions and context words. They also neglect the rich background information about these entities in knowledge bases (KBs). To address these issues, we take information from KBs into consideration to bridge entity mentions and their context together, and thereby propose Knowledge-Attention Neural Fine-Grained Entity Typing. Experimental results and case studies on real-world datasets demonstrate that our model significantly outperforms other state-of-the-art methods, revealing the effectiveness of incorporating KB information for entity typing. Code and data for this paper can be found at https://github.com/thunlp/KNET.

IJCAI Conference 2016 Conference Paper

Knowledge Representation Learning with Entities, Attributes and Relations

  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun

Distributed knowledge representation (KR) encodes both entities and relations in a low-dimensional semantic space, which has significantly promoted the performance of relation extraction and knowledge reasoning. In many knowledge graphs (KGs), some relations indicate attributes of entities (attributes) while others indicate relations between entities (relations). Existing KR models treat all relations equally, and usually suffer from poor accuracy when modeling one-to-many and many-to-one relations, which are mostly composed of attributes. In this paper, we divide existing KG relations into attributes and relations, and propose a new KR model with entities, attributes and relations (KR-EAR). The experimental results show that, by modeling attributes specially, KR-EAR can significantly outperform state-of-the-art KR models in the prediction of entities, attributes and relations.

AAAI Conference 2015 Conference Paper

Learning Entity and Relation Embeddings for Knowledge Graph Completion

  • Yankai Lin
  • Zhiyuan Liu
  • Maosong Sun
  • Yang Liu
  • Xuan Zhu

Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as a translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects, and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR, which builds entity and relation embeddings in separate entity and relation spaces. We then learn embeddings by first projecting entities from the entity space into the corresponding relation space and then building translations between the projected entities. In experiments, we evaluate our models on three tasks: link prediction, triple classification, and relational fact extraction. Experimental results show significant and consistent improvements over state-of-the-art baselines including TransE and TransH. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.
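TransR's scoring idea — project head and tail into the relation-specific space via a matrix and measure how far the projected head plus the relation vector lands from the projected tail — can be sketched as follows. The matrix, the toy vectors, and the choice of an L1 distance here are illustrative assumptions, not the paper's trained parameters:

```python
def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transr_score(h, r, t, M_r):
    """Lower is better: distance between (M_r h + r) and (M_r t), i.e. how well
    the translation h_r + r ~ t_r holds in the relation-specific space."""
    h_r = mat_vec(M_r, h)
    t_r = mat_vec(M_r, t)
    return sum(abs(a + b - c) for a, b, c in zip(h_r, r, t_r))

identity = [[1.0, 0.0], [0.0, 1.0]]  # trivial projection for the toy example
good = transr_score([0.0, 0.0], [1.0, 2.0], [1.0, 2.0], identity)  # triple holds
bad = transr_score([0.0, 0.0], [1.0, 2.0], [0.0, 0.0], identity)   # triple violated
```

Training would learn a separate `M_r` per relation so that true triples score near zero while corrupted ones score high, which is what separates TransR from TransE's single shared space.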