
Author name cluster

Xiaopeng Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

15 papers
1 author row

Possible papers


EAAI Journal 2026 Journal Article

Fuzzy rule-based uncertainty identification control for underactuated space flexible link manipulators: Compensation for payload perturbations during capture

  • Dongyang Shang
  • Haozhe Wang
  • Meng Yin
  • Xiaopeng Li

The underactuated space flexible link manipulator (USFLM), comprising a servo motor, a slender flexible link, and an underactuated hand, can be deployed on space stations to assist astronauts in critical tasks. While the underactuated hand enhances the manipulator's grasping capability, variations in payload mass, coupled with unknown external disturbances and link flexibility, induce rotational angle fluctuations that degrade trajectory tracking accuracy and operational precision. To address these challenges, this paper proposes a fuzzy rule-based uncertainty identification control strategy for the USFLM, employing an improved fuzzy-compensated sliding mode control (FCSMC) strategy. The controller leverages the universal approximation property of fuzzy rules to estimate and compensate for dynamic uncertainties, thereby minimizing tracking errors. The dynamic model of the USFLM is derived using flexible structure vibration theory and Lagrangian mechanics, incorporating multiple nonlinearities. A simplified modeling method is introduced by selectively eliminating nonlinear terms, facilitating real-time implementation of the control law. The control law is rigorously designed based on Lyapunov stability theory to ensure system stability. Both numerical simulations and ground prototype experiments demonstrate that the proposed method effectively identifies and compensates for dynamic uncertainties, significantly improving trajectory tracking performance.

AAAI Conference 2026 Conference Paper

Leveraging Image as Compressed Visual Prompt and Hierarchical Visual Knowledge for Effective Image Utilization in MLLMs

  • Shezheng Song
  • Kangcheng Ding
  • Shan Zhao
  • Shasha Li
  • Xiaopeng Li
  • Chengyu Wang
  • Qian Wan
  • Bin Ji

Multimodal Large Language Models (MLLMs) integrate text and images for complex reasoning tasks, but utilizing images efficiently remains a challenge due to redundancy and noise. Traditional methods feed the entire set of image features into the MLLM as a visual prompt, producing excessive visual tokens that disrupt the expression of textual information. Recent studies therefore treat image features as visual knowledge, storing them in the feed-forward network for retrieval when needed. Because these methods remove images from the input entirely, they may hinder the activation of image-related knowledge. Moreover, current visual knowledge focuses on fine-grained details but overlooks the hierarchical process of visual perception. As described by feature integration theory, global structure is processed before details are integrated. Ignoring this process may lead to fragmented visual understanding, making it difficult to capture high-level semantic relationships. To overcome these issues, we propose a novel image utilization mechanism for MLLMs. We use a compression-based attention mechanism to generate a compressed visual prompt, which both mitigates the interference of excessively long visual prompts and preserves the visual information needed to activate knowledge in the MLLM. Furthermore, we extract hierarchical visual features as visual knowledge using wavelet transforms, allowing the model to capture both global structures and fine-grained details. Experiments show that our method achieves state-of-the-art performance.
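
To make the global-versus-detail split concrete, here is a minimal sketch of a wavelet decomposition. The abstract does not name the wavelet or the decomposition depth, so the choice of an averaging 2D Haar transform and the helper names `haar2d` and `haar_pyramid` are assumptions for illustration, not the paper's feature extractor. Each 2x2 block yields one coarse coefficient (its mean) and three detail coefficients; repeating the transform on the coarse band gives a coarse-to-fine hierarchy.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar2d(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of an averaging 2D Haar transform (illustrative sketch).

    Each non-overlapping 2x2 block [[a, b], [c, d]] contributes one value to
    each of four half-resolution sub-bands:
      approx: block mean (coarse, global structure)
      horiz:  top row minus bottom row (responds to horizontal edges)
      vert:   left column minus right column (responds to vertical edges)
      diag:   diagonal difference
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    approx, horiz, vert, diag = [], [], [], []
    for i in range(0, h, 2):
        rows = ([], [], [], [])
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 4)
            rows[1].append((a + b - c - d) / 4)
            rows[2].append((a - b + c - d) / 4)
            rows[3].append((a - b - c + d) / 4)
        approx.append(rows[0])
        horiz.append(rows[1])
        vert.append(rows[2])
        diag.append(rows[3])
    return approx, horiz, vert, diag


def haar_pyramid(image: Matrix, levels: int) -> Tuple[Matrix, List[Tuple[Matrix, Matrix, Matrix]]]:
    """Apply haar2d repeatedly to the approximation band.

    Returns the coarsest approximation and the detail bands of every level,
    finest level first.
    """
    details = []
    current = image
    for _ in range(levels):
        current, h_band, v_band, d_band = haar2d(current)
        details.append((h_band, v_band, d_band))
    return current, details
```

For a 4x4 image, two levels reduce the approximation to a single value (the image mean), while the detail bands keep the local contrasts at each scale. How the paper turns such sub-bands into stored visual knowledge is not described in the abstract.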

AAAI Conference 2026 Conference Paper

Personalize Before Retrieve: LLM-based Personalized Query Expansion for User-Centric Retrieval

  • Yingyi Zhang
  • Pengyue Jia
  • Derong Xu
  • Yi Wen
  • Xianneng Li
  • Yichao Wang
  • Wenlin Zhang
  • Xiaopeng Li

Retrieval-Augmented Generation (RAG) critically depends on effective query expansion to retrieve relevant information. However, existing expansion methods adopt uniform strategies that overlook user-specific semantics, ignoring individual expression styles, preferences, and historical context. In practice, textually identical queries can express vastly different intentions across users. This representational rigidity limits the ability of current RAG systems to generalize effectively in personalized settings. Specifically, we identify two core challenges for personalization: 1) user expression styles are inherently diverse, making it difficult for standard expansions to preserve personalized intent; and 2) user corpora induce heterogeneous semantic structures, varying in topical focus and lexical organization, which hinder the effective anchoring of expanded queries within each user's corpus space. To address these challenges, we propose Personalize Before Retrieve (PBR), a framework that incorporates user-specific signals into query expansion prior to retrieval. PBR consists of two components: P-PRF, which generates stylistically aligned pseudo feedback from user history to simulate the user's expression style, and P-Anchor, which performs graph-based structure alignment over the user corpus to capture its structure. Together, they produce personalized query representations tailored for retrieval. Experiments on two personalized benchmarks show that PBR consistently outperforms strong baselines, with gains of up to 10% on PersonaBench across retrievers. Our findings demonstrate the value of modeling personalization before retrieval to close the semantic gap in user-adaptive RAG systems.

TMLR Journal 2026 Journal Article

Stepwise Guided Policy Optimization: Coloring Your Incorrect Reasoning in GRPO

  • Peter Chen
  • Xiaopeng Li
  • Ziniu Li
  • Xi Chen
  • Tianyi Lin

Reinforcement learning (RL) has proven effective in strengthening the reasoning capabilities of large language models (LLMs). A widely adopted method, Group Relative Policy Optimization (GRPO) (Shao et al., 2024), has shown strong empirical results in training recent reasoning models (Guo et al., 2025), but it fails to update the policy when all responses within a group are incorrect (i.e., all-negative-sample groups). This limitation highlights a gap between artificial and human intelligence: unlike humans, who can learn from mistakes, GRPO discards these failure signals. We introduce a simple framework that mitigates the all-negative-sample issue by incorporating response diversity within groups using a step-wise judge model, which can be trained directly or adapted from existing LLMs. In a simplified setting, we prove that this diversification accelerates GRPO's learning dynamics. We then empirically validate Stepwise Guided Policy Optimization (SGPO) across model sizes (7B, 14B, 32B) in both offline and online training on nine reasoning benchmarks (including base and distilled variants). Overall, SGPO improves average performance and is effective in early and mid-training, when all-negative groups are prevalent; however, improvements are not uniform across benchmarks and depend on the structure and informativeness of negative samples. Finally, SGPO does not require the judge model to generate correct solutions, distinguishing it from knowledge distillation methods.
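
The all-negative-sample problem follows directly from how GRPO-style advantages are computed: each reward is standardized against the other rewards in its group, so a group in which every response receives the same reward yields all-zero advantages and no policy-gradient signal from that group. The sketch below shows this, then scores the same failing group with a step-wise judge. The scoring rule `prefix_step_credit` (fraction of steps judged correct before the first error) and the use of the population standard deviation are hypothetical choices for illustration; the abstract does not specify how SGPO converts judge output into rewards.

```python
import statistics
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: each reward standardized within its group.

    Uses the population standard deviation (an implementation choice).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def prefix_step_credit(step_verdicts: Sequence[bool]) -> float:
    """Hypothetical step-wise score for an incorrect response.

    Fraction of reasoning steps a judge marks correct before the first
    incorrect step. This rule is an illustration, not SGPO's actual reward.
    """
    correct = 0
    for ok in step_verdicts:
        if not ok:
            break
        correct += 1
    return correct / len(step_verdicts) if step_verdicts else 0.0


# A group in which every response is wrong under a binary outcome reward.
binary_rewards = [0.0, 0.0, 0.0, 0.0]
print(group_relative_advantages(binary_rewards))  # every advantage is 0.0

# The same group scored by a hypothetical step-wise judge.
judge_verdicts = [
    [True, False, False],
    [True, True, False],
    [False, False],
    [True, True, True, False],
]
stepwise_rewards = [prefix_step_credit(v) for v in judge_verdicts]
print(group_relative_advantages(stepwise_rewards))  # non-zero, ordered by partial credit
```

With binary rewards the failing group contributes nothing; with graded scores, responses that went wrong later receive higher advantages than those that went wrong early, so the group still carries a learning signal.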

TMLR Journal 2025 Journal Article

AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving

  • Shuo Xing
  • Hongyuan Hua
  • Xiangbo Gao
  • Shenzhe Zhu
  • Renjie Li
  • Kexin Tian
  • Xiaopeng Li
  • Heng Huang

Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them promising candidates for end-to-end driving systems. However, little work has studied the trustworthiness of DriveVLMs, a critical factor that directly impacts public transportation safety. In this paper, we introduce AutoTrust, a comprehensive trustworthiness benchmark for large vision-language models in autonomous driving (DriveVLMs), considering diverse perspectives, including trustfulness, safety, robustness, privacy, and fairness. We constructed the largest visual question-answering dataset for investigating trustworthiness issues in driving scenarios, comprising over 10k unique scenes and 18k queries. We evaluated six publicly available VLMs, ranging from generalist to specialist and from open-source to commercial models. Our exhaustive evaluations have unveiled previously undiscovered vulnerabilities of DriveVLMs to trustworthiness threats. Specifically, we found that general VLMs such as LLaVA-v1.6 and GPT-4o-mini surprisingly outperform specialized models fine-tuned for driving in terms of overall trustworthiness. DriveVLMs such as DriveLM-Agent are particularly vulnerable to disclosing sensitive information. Additionally, both generalist and specialist VLMs remain susceptible to adversarial attacks and struggle to ensure unbiased decision-making across diverse environments and populations. Our findings call for immediate and decisive action to address the trustworthiness of DriveVLMs, an issue of critical importance to public safety and the welfare of all citizens relying on autonomous transportation systems. We release all code and datasets at https://github.com/taco-group/AutoTrust.

NeurIPS Conference 2025 Conference Paper

Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning

  • Wenlin Zhang
  • Xiangyang Li
  • Kuicai Dong
  • Yichao Wang
  • Pengyue Jia
  • Xiaopeng Li
  • Yingyi Zhang
  • Derong Xu

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge, yet traditional RAG systems struggle with static workflows and limited adaptability for complex, multistep reasoning tasks. Agentic RAG systems, such as DeepResearch, address these issues through dynamic retrieval, iterative context refinement, and adaptive workflows. However, recent methods like Search-R1, which rely on outcome-based reinforcement learning, face challenges such as low exploration efficiency, gradient conflict, and sparse reward signals. To tackle these limitations, we introduce ReasonRAG, a novel method that leverages RAG-ProGUIDE, a high-quality dataset providing fine-grained, process-level rewards for query generation, evidence extraction, and answer generation. By employing process-supervised reinforcement learning, ReasonRAG enhances LLMs' autonomous capabilities in search, query generation, evidence extraction, and answer synthesis. Experimental results show that ReasonRAG, utilizing RAG-ProGUIDE, outperforms existing approaches like Search-R1 and traditional RAG systems, achieving superior performance on five benchmark datasets with only 5k training instances, significantly fewer than the 90k required by Search-R1. Our code is available at https://github.com/Applied-Machine-Learning-Lab/ReasonRAG.

NeurIPS Conference 2025 Conference Paper

Rethinking Residual Distribution in Locate-then-Edit Model Editing

  • Xiaopeng Li
  • Shangwen Wang
  • Shasha Li
  • Shezheng Song
  • Bin Ji
  • Ma Jun
  • Jie Yu

Model editing enables targeted updates to the knowledge of large language models (LLMs) with minimal retraining. Among existing approaches, locate-then-edit methods constitute a prominent paradigm: they first identify critical layers, then compute residuals at the final critical layer based on the target edit, and finally apply least-squares-based multi-layer updates via residual distribution. While these methods are empirically effective, we identify a counterintuitive failure mode: residual distribution, a core mechanism in these methods, introduces weight shift errors that undermine editing precision. Through theoretical and empirical analysis, we show that such errors increase with the distribution distance, batch size, and edit sequence length, ultimately leading to inaccurate or suboptimal edits. To address this, we propose the Boundary Layer UpdatE (BLUE) strategy to enhance locate-then-edit methods. Sequential batch editing experiments on three LLMs and two datasets demonstrate that BLUE not only delivers an average performance improvement of 35.59%, significantly advancing the state of the art in model editing, but also better preserves LLMs' general capabilities. Our code is available at https://github.com/xpq-tech/BLUE.

AAAI Conference 2025 Conference Paper

SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

  • Xiaopeng Li
  • Shasha Li
  • Shezheng Song
  • Huijun Liu
  • Bin Ji
  • Xi Wang
  • Jun Ma
  • Jie Yu

The general capabilities of large language models (LLMs) make them the infrastructure for various AI applications, but updating their internal knowledge requires significant resources. Model editing has recently emerged as a promising technique for efficiently updating a small amount of knowledge in LLMs and has attracted much attention. In particular, local editing methods, which directly update model parameters, have proven suitable for updating small amounts of knowledge. These methods update weights by computing closed-form least-squares solutions and identify edited knowledge by vector-level matching at inference time, achieving promising results. However, they still require substantial time and resources to complete the computation. Moreover, vector-level matching lacks reliability, and such updates disrupt the original organization of the model's parameters. To address these issues, we propose a detachable and expandable Subject Word Embedding Altering (SWEA) framework, which finds the editing embeddings through token-level matching and adds them to the subject word embeddings in the Transformer input. To obtain these editing embeddings, we propose an optimizing-then-suppressing fusion method, which first optimizes learnable embedding vectors for the editing target and then suppresses the Knowledge Embedding Dimensions (KEDs) to obtain the final editing embeddings. We thus propose the SWEAOS method for editing factual knowledge in LLMs. We demonstrate the overall state-of-the-art (SOTA) performance of SWEAOS on the CounterFact and zsRE datasets. To further validate the reasoning ability of SWEAOS in editing knowledge, we evaluate it on the more complex RippleEdits benchmark, where the results show that SWEAOS possesses SOTA reasoning ability.
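
The input-side mechanism described above (token-level matching of a subject, then adding stored vectors to that subject's input embeddings) can be sketched in a few lines. This is a toy illustration under assumed data layouts: the function name `apply_editing_embeddings` and the dictionary mapping a subject's token ids to one editing vector per token are hypothetical, and the optimizing-then-suppressing step that produces the vectors is not shown.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def apply_editing_embeddings(
    token_ids: Sequence[int],
    embeddings: Sequence[Vector],
    edits: Dict[Tuple[int, ...], List[Vector]],
) -> List[Vector]:
    """Add stored editing vectors to the input embeddings of matched subjects.

    `edits` maps a subject's token-id sequence to one editing vector per
    subject token (a hypothetical storage layout). The inputs are never
    modified, so an edit is detached simply by deleting its entry.
    """
    for subject, deltas in edits.items():
        if not subject or len(deltas) != len(subject):
            raise ValueError(f"need one editing vector per token of {subject}")
    # Try longer subjects first so a longer name wins over its prefix.
    subjects = sorted(edits, key=len, reverse=True)
    out = [list(vec) for vec in embeddings]
    i = 0
    while i < len(token_ids):
        for subject in subjects:
            n = len(subject)
            if tuple(token_ids[i:i + n]) == subject:
                # exact token-level match: shift each subject token's embedding
                for k, delta in enumerate(edits[subject]):
                    out[i + k] = [x + dx for x, dx in zip(out[i + k], delta)]
                i += n
                break
        else:
            i += 1  # no stored subject starts at this position
    return out
```

Because the model weights are untouched and edits live in a separate table, adding or removing an entry changes behaviour only for inputs containing that exact subject token sequence, which is one reading of the "detachable and expandable" property claimed in the abstract.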

AAAI Conference 2024 Conference Paper

D3: A Methodological Exploration of Domain Division, Modeling, and Balance in Multi-Domain Recommendations

  • Pengyue Jia
  • Yichao Wang
  • Shanru Lin
  • Xiaopeng Li
  • Xiangyu Zhao
  • Huifeng Guo
  • Ruiming Tang

To enhance the efficacy of multi-scenario services in industrial recommendation systems, multi-domain recommendation has become prominent; it models all domains simultaneously through a unified model, capturing the commonalities and differences among them. However, current methods rely on manual domain partitioning and overlook both the intricate relationships among domains and their heterogeneity during joint optimization, hindering the integration of domain commonalities and differences. To address these challenges, this paper proposes D3, a universal and flexible framework that optimizes the multi-domain recommendation pipeline from three key aspects. First, an attention-based domain adaptation module automatically identifies and incorporates domain-sensitive features during training. Second, a fusion gate module seamlessly integrates commonalities and diversities among domains, implicitly characterizing intricate domain relationships. Lastly, we tackle the issue of joint optimization by deriving loss weights from two complementary viewpoints, domain complexity and domain specificity, alleviating inconsistencies among domains during training. Experiments on three public datasets demonstrate the effectiveness and superiority of the proposed framework. In addition, D3 has been implemented on a real-life, high-traffic internet platform serving millions of users daily.

NeurIPS Conference 2024 Conference Paper

G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

  • Pengyue Jia
  • Yiding Liu
  • Xiaopeng Li
  • Yuhao Wang
  • Yantong Du
  • Xiao Han
  • Xuetao Wei
  • Shuaiqiang Wang

Worldwide geolocalization aims to predict the precise, coordinate-level location of photos taken anywhere on Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context: they may confuse distant images with similar visual content, or fail to adapt to locations worldwide that have different amounts of relevant data. To resolve these limitations, we propose G3, a novel framework based on Retrieval-Augmented Generation (RAG). In particular, G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification, to optimize both the retrieval and generation phases of worldwide geolocalization. During Geo-alignment, our solution jointly learns expressive multi-modal representations for images, GPS coordinates, and textual descriptions, which allows us to capture location-aware semantics for retrieving nearby images for a given query. During Geo-diversification, we leverage a prompt ensembling method that is robust to inconsistent retrieval performance across different image queries. Finally, in Geo-verification, we combine both retrieved and generated GPS candidates for location prediction. Experiments on two well-established datasets, IM2GPS3k and YFCC4k, verify the superiority of G3 over other state-of-the-art methods. Our code is available online at https://github.com/Applied-Machine-Learning-Lab/G3 for reproduction.

NeurIPS Conference 2024 Conference Paper

LeDex: Training LLMs to Better Self-Debug and Explain Code

  • Nan Jiang
  • Xiaopeng Li
  • Shiqi Wang
  • Qiang Zhou
  • Soneya B. Hossain
  • Baishakhi Ray
  • Varun Kumar
  • Xiaofei Ma

In the domain of code generation, self-debugging is crucial: it allows LLMs to refine their generated code based on execution feedback. This is particularly important because generating correct solutions in one attempt is challenging for complex tasks. Prior work on self-debugging mostly focuses on prompting methods that provide LLMs with few-shot examples, which work poorly on small open-source LLMs. In this work, we propose LeDex, a training framework that significantly improves the self-debugging capability of LLMs. Intuitively, we observe that a chain of explanations of the wrong code, followed by code refinement, helps LLMs better analyze the wrong code and refine it. We thus propose an automated pipeline to collect a high-quality dataset for code explanation and refinement by generating a number of explanations and refinement trajectories from the LLM itself or a larger teacher model and filtering them via execution verification. We perform supervised fine-tuning (SFT) and further reinforcement learning (RL) on both success and failure trajectories, with a novel reward design that considers code explanation and refinement quality. SFT improves pass@1 by up to 15.92% and pass@10 by 9.30% over four benchmarks. RL training brings an additional improvement of up to 3.54% on pass@1 and 2.55% on pass@10. The trained LLMs show iterative refinement ability and can keep refining code continuously. Lastly, our human evaluation shows that the LLMs trained with our framework generate more useful code explanations and help developers better understand bugs in source code.

EAAI Journal 2024 Journal Article

Neural network model identification control of dual-inertia system with a flexible load considering payload mass variation and nonlinear deformation

  • Dongyang Shang
  • Xiaopeng Li
  • Meng Yin
  • Fanjie Li

The dual-inertia system with a flexible load (DSFL) is a complex nonlinear system whose nonlinearity originates from flexible load deformation and payload mass variation. In dynamic modeling, it is impossible to consider all nonlinear factors, including high-order modes and inaccurate friction torque, which leaves uncertain components in the DSFL model. Consequently, a dynamic model of the DSFL that is completely consistent with the real physical system cannot be built, and these modeling errors increase the tracking errors of the DSFL. In this study, neural networks are employed to identify the lumped uncertain components of the DSFL dynamic model, so as to reduce the flexible load's tracking error. First, based on a comparison of deformation accuracy, a relatively high-precision model simplification method is proposed that neglects the second-order mode. Next, the control law of the neural network identification control strategy is designed according to the stability theorem. Finally, physical control experiments and simulation analysis show that this control strategy enhances the motion accuracy of the DSFL. The experimental results show that this control strategy, with uncertainties predicted by a radial basis function (RBF) neural network, can reduce the rotation angle error by 15.86%.
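
For readers unfamiliar with the estimator named above, a Gaussian RBF network outputs a weighted sum of bell-shaped basis functions, each centred on a point in input space. The sketch below shows only this forward pass. The choice of inputs (tracking error and its rate), the number of centres, the widths, and the fixed weights are all hypothetical; the abstract does not state them, and in the paper the weights would be governed by an adaptation law derived from the stability analysis, which is not reproduced here.

```python
import math
from typing import Sequence


def rbf_network(
    x: Sequence[float],
    centers: Sequence[Sequence[float]],
    widths: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Output of a single-output Gaussian RBF network.

    f(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 * b_j^2))
    """
    total = 0.0
    for c, b, w in zip(centers, widths, weights):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += w * math.exp(-sq_dist / (2.0 * b * b))
    return total


# Hypothetical setup: input = (tracking error, error rate), five centers on a diagonal.
centers = [(-1.0, -1.0), (-0.5, -0.5), (0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
widths = [0.5] * 5
weights = [0.2, -0.1, 0.4, 0.0, 0.3]  # fixed here; adaptive controllers update them online
print(rbf_network((0.1, -0.05), centers, widths, weights))
```

Each basis function responds strongly only near its centre, which is what lets such a network approximate a smooth unknown term (here, the lumped uncertainty) from local contributions.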

AAAI Conference 2024 Conference Paper

PMET: Precise Model Editing in a Transformer

  • Xiaopeng Li
  • Shasha Li
  • Shezheng Song
  • Jing Yang
  • Jun Ma
  • Jie Yu

Model editing techniques modify a minor proportion of knowledge in Large Language Models (LLMs) at a relatively low cost and have demonstrated notable success. Existing methods assume Transformer Layer (TL) hidden states are values of key-value memories of the Feed-Forward Network (FFN). They usually optimize the TL hidden states to memorize target knowledge and use them to update the weights of the FFN in LLMs. However, the information flow of TL hidden states comes from three parts: Multi-Head Self-Attention (MHSA), FFN, and residual connections. Existing methods neglect the fact that the TL hidden states contain information not specifically required for the FFN. Consequently, the performance of model editing decreases. To achieve more precise model editing, we analyze the hidden states of MHSA and FFN and find that MHSA encodes certain general knowledge extraction patterns. This implies that MHSA weights do not require updating when new knowledge is introduced. Based on these findings, we introduce PMET, which simultaneously optimizes Transformer Component (TC, namely MHSA and FFN) hidden states, while using only the optimized TC hidden states of FFN to precisely update FFN weights. Our experiments demonstrate that PMET achieves state-of-the-art performance on both the CounterFact and zsRE datasets. Our ablation experiments substantiate the effectiveness of our enhancements, further reinforcing the finding that MHSA encodes certain general knowledge extraction patterns and indicating that it stores a small amount of factual knowledge. Our code is available at https://github.com/xpq-tech/PMET.

AAAI Conference 2020 Conference Paper

Not All Attention Is Needed: Gated Attention Network for Sequence Data

  • Lanqing Xue
  • Xiaopeng Li
  • Nevin L. Zhang

Although deep neural networks generally have fixed network structures, dynamic mechanisms have drawn increasing attention in recent years. Attention mechanisms compute input-dependent dynamic attention weights for aggregating a sequence of hidden states. Dynamic network configuration in convolutional neural networks (CNNs) selectively activates only part of the network at a time for different inputs. In this paper, we combine the two dynamic mechanisms for text classification tasks. Traditional attention mechanisms attend to the whole sequence of hidden states for an input sentence, whereas in most cases not all attention is needed, especially for long sequences. We propose a novel method called Gated Attention Network (GA-Net) that dynamically selects a subset of elements to attend to using an auxiliary network and computes attention weights to aggregate the selected elements. It avoids a significant amount of unnecessary computation on unattended elements and allows the model to focus on important parts of the sequence. Experiments on various datasets show that the proposed method achieves better performance than all baseline models with global or local attention while requiring less computation and offering better interpretability. The idea is also promising to extend to more complex attention-based models, such as transformers and seq-to-seq models.
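
The core idea of attending only to a gated subset can be sketched as follows. Several details here are assumptions rather than GA-Net's design: gate probabilities are passed in directly instead of being produced by an auxiliary network, selection uses a fixed threshold of 0.5, and attention scores are plain dot products with a query vector. The abstract does not say how the gate is trained.

```python
import math
from typing import List, Sequence, Tuple


def gated_attention(
    hidden: Sequence[Sequence[float]],
    query: Sequence[float],
    gate_probs: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[float], List[float]]:
    """Attend only to elements whose gate probability exceeds `threshold`.

    Scores are computed only for selected elements, so unselected hidden
    states are skipped entirely. Returns (context vector, attention weights),
    with weight 0 for every unselected element.
    """
    selected = [i for i, p in enumerate(gate_probs) if p > threshold]
    if not selected:
        raise ValueError("the gate selected no elements")
    # dot-product scores for the selected elements only
    scores = {i: sum(q * h for q, h in zip(query, hidden[i])) for i in selected}
    top = max(scores.values())  # subtract the max for a numerically stable softmax
    exps = {i: math.exp(s - top) for i, s in scores.items()}
    total = sum(exps.values())
    weights = [exps.get(i, 0.0) / total for i in range(len(hidden))]
    dim = len(hidden[0])
    context = [sum(weights[i] * hidden[i][d] for i in selected) for d in range(dim)]
    return context, weights


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = gated_attention(hidden, query=[0.0, 0.0], gate_probs=[0.9, 0.1, 0.8])
print(weights)  # [0.5, 0.0, 0.5]: the middle element is gated out
print(context)  # [1.0, 0.5]
```

With equal scores, the softmax splits weight evenly over the two selected elements and the gated-out element contributes nothing, which is where the computational saving comes from on long sequences.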

IJCAI Conference 2018 Conference Paper

Building Sparse Deep Feedforward Networks using Tree Receptive Fields

  • Xiaopeng Li
  • Zhourong Chen
  • Nevin L. Zhang

Sparse connectivity is an important factor behind the success of convolutional neural networks and recurrent neural networks. In this paper, we consider the problem of learning sparse connectivity for feedforward neural networks (FNNs). The key idea is that a unit should be connected to a small number of strongly correlated units at the level immediately below. We use Chow-Liu's algorithm to learn a tree-structured probabilistic model for the units at the current level, use the tree to identify subsets of strongly correlated units, and introduce new units with receptive fields over those subsets. The procedure is repeated on the new units to build multiple layers of hidden units. The resulting model is called a TRF-net. Empirical results show that, compared to dense FNNs, TRF-nets achieve better or comparable classification performance with far fewer parameters and sparser structures. They are also more interpretable.
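
Chow-Liu's algorithm, the first step described above, fits a tree-structured model by computing the mutual information between every pair of variables and keeping a maximum-weight spanning tree over those values. The sketch below implements that standard algorithm with Prim's method. Treating units as discrete variables estimated from samples is an assumption made for simplicity, and the later TRF-net steps (grouping strongly correlated units and adding new units over them) are not shown.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information (in nats) between two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y))), written with counts
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def chow_liu_tree(samples: Sequence[Sequence[Hashable]]) -> List[Tuple[int, int]]:
    """Chow-Liu: maximum spanning tree over pairwise mutual information.

    `samples` holds one row per observation and one column per variable.
    Returns the tree edges as (i, j) pairs, built with Prim's algorithm.
    """
    columns = list(zip(*samples))
    d = len(columns)
    weight = {}
    for i, j in combinations(range(d), 2):
        weight[i, j] = weight[j, i] = mutual_information(columns[i], columns[j])
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        # heaviest edge connecting the current tree to a variable outside it
        i, j = max(
            ((i, j) for i in in_tree for j in range(d) if j not in in_tree),
            key=lambda edge: weight[edge],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

In a toy dataset where variable 1 copies variable 0 and variable 3 copies variable 2, the tree links each copy to its source, exposing the two strongly correlated pairs that a TRF-net-style construction would group under new hidden units.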