Arrow Research search

Author name cluster

Linjing Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

AAMAS Conference 2025 Conference Paper

CPE: A New Paradigm for Policy Extraction in Offline Reinforcement Learning

  • Zhaohui Yang
  • Xiaoxuan Wang
  • Linjing Li

Offline reinforcement learning (RL) aims to extract the optimal policy from static offline datasets but invariably encounters the notorious distribution shift problem. To address this problem, many previous offline RL algorithms rely primarily on modifications at the policy evaluation stage. However, the performance gap between different policy extraction methods is significant even under the same value function. We therefore focus on the policy extraction stage and introduce a novel policy extraction method called Contrastive Policy Extraction (CPE), which samples action pairs at each state and leverages their relative values to improve the policy. By reformulating the optimal policy parameterization problem as a root-finding problem, CPE strengthens policy extraction and surpasses prominent extraction methods in offline RL such as AWAC and TD3BC. CPE is implemented within the iterative actor-critic framework and substantially outperforms current state-of-the-art (SOTA) offline RL algorithms on D4RL benchmarks.
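
As a rough illustration of the contrastive idea the abstract describes, the sketch below samples a pair of actions per state, lets a frozen critic rank the pair, and raises the policy's log-probability of the preferred action via a logistic preference loss. The loss form and the `policy`/`q_net` interfaces are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: one plausible reading of "sample action pairs
# and leverage their relative values", not the paper's released code.
import torch
import torch.nn.functional as F

def cpe_loss(policy, q_net, states):
    dist = policy(states)                  # assumed: returns a torch Distribution
    a1, a2 = dist.sample(), dist.sample()  # two candidate actions per state
    with torch.no_grad():                  # critic is frozen during extraction
        q1, q2 = q_net(states, a1), q_net(states, a2)
    prefer_a1 = (q1 >= q2).unsqueeze(-1)
    better = torch.where(prefer_a1, a1, a2)
    worse = torch.where(prefer_a1, a2, a1)
    # Logistic preference loss on the pair's relative value:
    # -log sigmoid(log pi(better) - log pi(worse)).
    margin = dist.log_prob(better) - dist.log_prob(worse)
    return F.softplus(-margin).mean()
```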

ICML Conference 2025 Conference Paper

Learning Dynamics in Continual Pre-Training for Large Language Models

  • Xingjin Wang
  • Howe Tissue
  • Lu Wang
  • Linjing Li
  • Daniel Dajun Zeng

Continual Pre-Training (CPT) has become a popular and effective method to apply strong foundation models to specific downstream tasks. In this work, we explore the learning dynamics throughout the CPT process for large language models (LLMs). We specifically focus on how general and downstream domain performance evolves at each training step, with domain performance measured via validation losses. We observe that the CPT loss curve fundamentally characterizes the transition from one hidden loss curve to another, and can be described by decoupling the effects of distribution shift and learning rate (LR) annealing. We derive a CPT scaling law that combines the two factors, enabling the prediction of loss at any (continual) training step and across learning rate schedules (LRS) in CPT. Our formulation presents a comprehensive understanding of several critical factors in CPT, including the learning rate, the training steps, and the distribution distance between PT and CPT datasets. Moreover, our approach can be adapted to customize training hyper-parameters to different CPT goals, such as balancing general and domain-specific performance. Extensive experiments demonstrate that our scaling law holds across various CPT datasets and training hyper-parameters.
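
To make the shape of such a law concrete, here is a toy predictor that combines a power law in the cumulative learning-rate area with an annealing term and a constant distribution-shift offset. The functional form and every coefficient below are placeholders, not the paper's fitted law.

```python
# Toy loss predictor: placeholder functional form, NOT the paper's fitted law.
import numpy as np

def predicted_cpt_loss(step, lrs, L0=2.0, A=1.5, alpha=0.5, C=0.3, dist_shift=0.1):
    """L(s) ~ L0 + dist_shift + A * S1(s)^(-alpha) - C * S2(s), where S1 is
    the cumulative LR area and S2 a cumulative annealing area -- one common
    parameterization in LR-aware scaling laws, used here only for shape."""
    lrs = np.asarray(lrs, dtype=float)
    s1 = lrs[:step].sum()              # forward (cumulative) LR area
    s2 = (lrs.max() - lrs[:step]).sum()  # area annealed away from the peak LR
    return L0 + dist_shift + A * s1 ** (-alpha) - C * s2

# Example: cosine-annealed LR over 1,000 continual training steps.
lr_schedule = 1e-4 * 0.5 * (1 + np.cos(np.linspace(0, np.pi, 1000)))
print(predicted_cpt_loss(500, lr_schedule))
```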

AAAI Conference 2025 Conference Paper

Learning Strategy Representation for Imitation Learning in Multi-Agent Games

  • Shiqi Lei
  • Kanghoon Lee
  • Linjing Li
  • Jinkyoo Park

The offline datasets for imitation learning (IL) in multi-agent games typically contain player trajectories exhibiting diverse strategies, necessitating measures to prevent learning algorithms from acquiring undesirable behaviors. Learning representations for these trajectories is an effective approach to depicting the strategies employed by each demonstrator. However, existing representation learning methods often require player identification or rely on strong assumptions that are inappropriate for multi-agent games. Therefore, in this paper, we introduce the Strategy Representation for Imitation Learning (STRIL) framework, which (1) effectively learns strategy representations in multi-agent games, (2) estimates proposed indicators based on these representations, and (3) filters out sub-optimal data using the indicators. STRIL is a plug-in method that can be integrated into existing IL algorithms. We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four. Our approach successfully acquires strategy representations and indicators, thereby identifying dominant trajectories and significantly enhancing existing IL performance across these environments.
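
A minimal sketch of the plug-in filtering step, assuming the strategy encoder and indicator have already been learned; `embed_trajectory` and `indicator` are stand-in callables for those models.

```python
# Hedged sketch of STRIL's filtering step as the abstract describes it:
# score each trajectory via its strategy representation, keep the best.
import numpy as np

def filter_dataset(trajectories, embed_trajectory, indicator, keep_frac=0.5):
    """Return the top `keep_frac` of trajectories by indicator score."""
    scores = np.array([indicator(embed_trajectory(t)) for t in trajectories])
    k = max(1, int(keep_frac * len(trajectories)))
    keep_idx = np.argsort(scores)[-k:]   # highest scores = dominant strategies
    return [trajectories[i] for i in keep_idx]

# Downstream, an unchanged IL algorithm (e.g. behavior cloning) simply trains
# on filter_dataset(raw_data, encoder, indicator) instead of raw_data.
```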

AAAI Conference 2025 Conference Paper

Learning Theorem Rationale for Improving the Mathematical Reasoning Capability of Large Language Models

  • Yu Sheng
  • Linjing Li
  • Daniel Dajun Zeng

Large language models (LLMs) have achieved significant progress in mathematical reasoning, especially on elementary math. However, they still struggle with complex questions at the high-school or college level, which demand a more advanced mastery of the relevant mathematical theorems. For humans, selecting theorems appropriate to the given question is a crucial factor in the quality of the ultimate solution, yet this has been neglected by previous research on LLM reasoning. In this paper, we propose a novel approach to enhance an LLM's capability to apply mathematical theorems to specific problems, which we refer to as Theorem Rationale (TR). To this end, we deliberately construct a new dataset of problem-theorem-solution triples for transferring the principles of TR. Furthermore, we develop an evolving strategy that builds hierarchical, theorem-oriented instructions, alleviating the difficulty of acquiring curated data and facilitating the digestion of theorem application from various perspectives. Evaluations on a wide range of public datasets show that a model fine-tuned on our dataset achieves consistent improvements over its backbone across mathematical levels, and ablation studies illustrate the effectiveness of the proposed evolutionary strategies in enhancing the model's math problem-solving capability. Overall, extensive experiments reveal the potential of our method and highlight the significance of aligning problems with concrete theorems to reduce hallucination and improve LLMs' mathematical reasoning capabilities.
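
As a hypothetical illustration of how a problem-theorem-solution triple might be serialized for supervised fine-tuning, the snippet below builds one training example. The field names and prompt template are assumptions, not the paper's released format.

```python
# Hypothetical serialization of one problem-theorem-solution triple.
import json

def make_sft_example(problem: str, theorems: list[str], solution: str) -> dict:
    prompt = (
        "Problem: " + problem + "\n"
        "First state the theorems this problem relies on, then solve it."
    )
    target = (
        "Relevant theorems:\n"
        + "\n".join(f"- {t}" for t in theorems)
        + "\n\nSolution:\n" + solution
    )
    return {"prompt": prompt, "completion": target}

example = make_sft_example(
    "Evaluate the integral of x * e^x from 0 to 1.",
    ["Integration by parts"],
    "By parts with u = x, dv = e^x dx: [x e^x - e^x] from 0 to 1 = 1.",
)
print(json.dumps(example, indent=2))
```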

NeurIPS Conference 2025 Conference Paper

Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL

  • Songjun Tu
  • Jiahao Lin
  • Qichao Zhang
  • Xiangyu Tian
  • Linjing Li
  • Xiangyuan Lan
  • Dongbin Zhao

Large reasoning models (LRMs) are proficient at generating explicit, step-by-step reasoning sequences before producing final answers. However, such detailed reasoning can introduce substantial computational overhead and latency, particularly for simple problems. To address this over-thinking problem, we explore how to equip LRMs with adaptive thinking capabilities, enabling them to dynamically decide whether or not to engage in explicit reasoning based on problem complexity. Building on R1-style distilled models, we observe that inserting a simple ellipsis ("...") into the prompt can stochastically trigger either a thinking or no-thinking mode, revealing a latent controllability in the reasoning behavior. Leveraging this property, we propose AutoThink, a multi-stage reinforcement learning (RL) framework that progressively optimizes reasoning policies via stage-wise reward shaping. AutoThink learns to invoke explicit reasoning only when necessary, while defaulting to succinct responses for simpler tasks. Experiments on five mainstream mathematical benchmarks demonstrate that AutoThink achieves favorable accuracy–efficiency trade-offs compared to recent prompting and RL-based pruning methods. It can be seamlessly integrated into any R1-style model, including both distilled and further fine-tuned variants. Notably, AutoThink improves relative accuracy by 6.4% while reducing token usage by 52% on DeepSeek-R1-Distill-Qwen-1.5B, establishing a scalable and adaptive reasoning paradigm for LRMs. Project Page: https://github.com/ScienceOne-AI/AutoThink.
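
A rough sketch of the two ingredients the abstract names, the ellipsis trigger and stage-wise reward shaping, under assumed constants; the exact prompt placement and shaping schedule in AutoThink may differ.

```python
# Illustrative only: assumed prompt format and shaping constants.
def build_prompt(question: str) -> str:
    # A bare ellipsis where the model would normally open its reasoning
    # lets an R1-style model choose between thinking and no-thinking modes.
    return f"{question}\n<think>\n...\n"

def shaped_reward(correct: bool, used_thinking: bool, stage: int) -> float:
    """Stage-wise shaping: later stages charge more for explicit reasoning,
    so the policy learns to think only when the problem warrants it."""
    base = 1.0 if correct else 0.0
    think_cost = {1: 0.0, 2: 0.1, 3: 0.2}.get(stage, 0.2)
    return base - (think_cost if used_thinking else 0.0)
```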

AAMAS Conference 2025 Conference Paper

Offline Meta Reinforcement Learning with Weighted Policy Constraints and Proximal Context Collection

  • Haorui Li
  • Jiaqi Liang
  • Linjing Li
  • Daniel Zeng

Offline meta-reinforcement learning (OMRL) encounters two key challenges: effectively learning the meta-policy from offline datasets and correctly inferring unseen tasks. Existing methods often address the first challenge by imposing policy constraints, but are limited by the suboptimal actions in offline datasets. For the second challenge, most focus on meta-training without enhancing task inference during meta-testing. To address these issues, we propose a novel method called weighted policy conStraints and proximal contExt coLlECtion sTrategy for OMRL (SELECT). During meta-training, we integrate policy constraints with weighted behavior cloning, allowing for more flexible policy learning while maintaining desirable behaviors. In the meta-testing phase, SELECT introduces a proximal context collection strategy that balances exploration and exploitation. This strategy gathers high-quality context, improving task inference and adaptation to unseen tasks. Experimental results show that SELECT significantly reduces distributional shift, enhances the meta-policy’s generalization, and outperforms state-of-the-art methods across various domains.
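
One plausible reading of the weighted policy constraint is an advantage-weighted behavior-cloning loss. The sketch below uses exponential advantage weighting, which is a common choice in offline RL but not necessarily the paper's; the `policy`, `q_net`, and `v_net` interfaces are assumptions.

```python
# Hedged sketch: behavior cloning whose per-sample weight grows with the
# estimated advantage, pulling the policy toward good dataset actions while
# only loosely constraining it to poor ones.
import torch

def weighted_bc_loss(policy, q_net, v_net, states, actions, beta=1.0):
    dist = policy(states)                              # assumed Distribution
    with torch.no_grad():
        adv = q_net(states, actions) - v_net(states)   # advantage estimate
        w = torch.clamp(torch.exp(adv / beta), max=20.0)  # cap for stability
    return -(w * dist.log_prob(actions)).mean()
```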

AAMAS Conference 2024 Conference Paper

ELA: Exploited Level Augmentation for Offline Learning in Zero-Sum Games

  • Shiqi Lei
  • Kanghoon Lee
  • Linjing Li
  • Jinkyoo Park
  • Jiachen Li

Offline learning derives effective policies from expert demonstrators’ datasets without direct interaction. While recent research considers dataset characteristics such as expertise level or the presence of multiple demonstrators, a distinct approach is necessary in zero-sum games, where outcomes depend significantly on the opponent’s strategy. In this study, we introduce a novel approach that uses unsupervised learning techniques to estimate the exploited level (EL) of each trajectory in an offline dataset of zero-sum games produced by diverse demonstrators. The estimated EL is then integrated into offline learning to maximize the influence of the dominant strategy. Our method enables interpretable EL estimation in multiple zero-sum games, effectively identifying dominant strategies. Moreover, EL-augmented offline learning significantly enhances imitation learning and offline reinforcement learning algorithms in zero-sum games.
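
Assuming an EL has already been estimated per trajectory by the unsupervised model (not shown), the augmentation step could reweight the offline objective roughly as follows; the softmax weighting is an illustrative choice, not the paper's exact scheme.

```python
# Sketch of EL-based reweighting: lower exploited level = harder to exploit
# = stronger strategy = larger training weight.
import numpy as np

def el_weights(exploited_levels, temperature=1.0):
    els = np.asarray(exploited_levels, dtype=float)
    logits = -els / temperature
    logits -= logits.max()                 # numerical stability
    w = np.exp(logits)
    return w / w.sum()

print(el_weights([0.1, 0.5, 2.0]))         # first trajectory weighted most
```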

IS Journal 2024 Journal Article

SHAPAttack: Shapley-Guided Multigranularity Adversarial Attack Against Text Transformers

  • Jiahui Shi
  • Linjing Li
  • Daniel Zeng

Despite the great success of text transformers, recent studies have revealed their vulnerability to textual adversarial attacks. Existing attack methods are limited to a single granularity and often suffer from a low attack success rate and a high query cost. To mitigate these issues, we propose a Shapley-guided multigranularity adversarial attack (SHAPAttack) that generates adversarial examples (AEs). SHAPAttack expands the perturbation space by combining granularities at both the word and phrase levels, which enhances the diversity of the generated AEs. To improve attack efficiency and reduce the query cost, SHAPAttack adopts a query-free constituent importance ranking method guided by the Shapley value to measure the importance of each constituent. We conduct extensive experiments on three benchmark datasets across three text transformers. The experimental results demonstrate that SHAPAttack outperforms strong baselines in terms of both attack success rate and model queries, indicating the effectiveness and efficiency of the proposed method.
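
For intuition about Shapley-guided importance ranking, here is a generic Monte-Carlo Shapley estimator over constituents; in a query-free setting the scoring function would be a local surrogate rather than the victim model, and this sketch is not the paper's ranking method.

```python
# Generic Monte-Carlo Shapley values: average each constituent's marginal
# contribution to a scoring function over random orderings.
import random

def shapley_importance(constituents, score, n_samples=200, seed=0):
    """`score` maps a frozenset of kept constituent indices to a scalar
    (e.g. a surrogate model's confidence in the original label)."""
    rng = random.Random(seed)
    n = len(constituents)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = rng.sample(range(n), n)        # one random permutation
        kept = set()
        prev = score(frozenset(kept))
        for i in order:
            kept.add(i)
            cur = score(frozenset(kept))
            phi[i] += cur - prev               # marginal contribution of i
            prev = cur
    return [p / n_samples for p in phi]
```

An attack would then perturb constituents in descending order of estimated importance, spending its perturbation budget where it matters most.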

IS Journal 2023 Journal Article

CRule: Category-Aware Symbolic Multihop Reasoning on Knowledge Graphs

  • Zikang Wang
  • Linjing Li
  • Jinlin Li
  • Pengfei Zhao
  • Daniel Zeng

Multihop reasoning is essential in knowledge graph (KG) research and applications. Current methods rely on specific KG entities, while human cognition operates at a more abstract level. This article proposes a category-aware rule-based (CRule) approach for symbolic multihop reasoning. Specifically, given a KG, CRule first categorizes entities and constructs a category-aware KG; it then uses rules retrieved from the categorized KG to perform multihop reasoning on the original KG. Experiments on five datasets show that CRule is simple and effective and combines the advantages of symbolic and neural network methods. It overcomes the complexity limitations of symbolic reasoning, can perform reasoning on KGs of more than 300,000 edges, and can be three times more efficient than neural network models.
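
A toy sketch of the rule-application step the abstract describes: a category-level rule, expressed as a relation path, is executed on the original entity-level KG. The data structures and rule format here are illustrative.

```python
# Toy rule application on an entity-level KG; follows a relation path.
from collections import defaultdict

def apply_rule(triples, rule, head):
    """rule: tuple of relations (r1, r2, ...) to follow from `head`."""
    out = defaultdict(set)
    for h, r, t in triples:
        out[(h, r)].add(t)
    frontier = {head}
    for rel in rule:
        frontier = {t for e in frontier for t in out[(e, rel)]}
    return frontier

kg = [("alice", "born_in", "paris"), ("paris", "capital_of", "france")]
# A rule mined at the category level (Person, City, Country):
# born_in -> capital_of  =>  nationality
print(apply_rule(kg, ("born_in", "capital_of"), "alice"))  # {'france'}
```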

IS Journal 2022 Journal Article

Hierarchical Multihop Reasoning on Knowledge Graphs

  • Zikang Wang
  • Linjing Li
  • Daniel Dajun Zeng

Multihop knowledge reasoning aims to find missing entities for incomplete triples by finding paths on knowledge graphs; it is a fundamental and important task. In this article, we devise a hierarchical reinforcement learning algorithm to model the reasoning process more effectively. Unlike existing methods that reason directly on entities and relations, we adopt a high-level reasoning layer that deals with abstract concepts and guides the low-level reasoning conducted over concrete entities and relations. Our approach yields competitive link prediction results on both the NELL-995 and FB15k-237 datasets, and comparison to baselines demonstrates the effectiveness of the hierarchical structure.
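
Schematically, one reasoning step under such a two-level scheme might look as follows, with both policies left as stubs; the decomposition is an assumption about the described architecture, not the paper's code.

```python
# Schematic two-level hop: the high-level policy picks an abstract relation
# cluster, the low-level policy picks a concrete edge within that cluster.
import random

def reason_step(state, edges, clusters, high_policy, low_policy):
    """edges: list of (relation, entity); clusters: relation -> cluster id."""
    cluster = high_policy(state, {clusters[r] for r, _ in edges})
    candidates = [(r, e) for r, e in edges if clusters[r] == cluster]
    return low_policy(state, candidates)   # the concrete (relation, entity) hop

# Stub policies for illustration: uniform random at both levels.
rand_high = lambda s, cs: random.choice(sorted(cs))
rand_low = lambda s, cands: random.choice(cands)
```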