Arrow Research search

Author name cluster

Hao Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

116 papers
2 author rows

Possible papers (116)

AAAI Conference 2026 Conference Paper

Agentmandering: A Game-Theoretic Framework for Fair Redistricting via Large Language Model Agents

  • Hao Li
  • Haotian Chen
  • Ruoyuan Gong
  • Juanjuan Wang
  • Hao Jiang

Redistricting plays a central role in shaping how votes are translated into political power. While existing computational methods primarily aim to generate large ensembles of legally valid districting plans, they often neglect the strategic dynamics involved in the selection process. This oversight creates opportunities for partisan actors to cherry-pick maps that, while technically compliant, are politically advantageous. Simply satisfying formal constraints does not ensure fairness when the selection process itself can be manipulated. We propose Agentmandering, a framework that reimagines redistricting as a turn-based negotiation between two agents representing opposing political interests. Drawing inspiration from game-theoretic ideas, particularly the Choose-and-Freeze protocol, our method embeds strategic interaction into the redistricting process via large language model (LLM) agents. Agents alternate between selecting and freezing districts from a small set of candidate maps, gradually partitioning the state through constrained and interpretable choices. Evaluation on post-2020 U.S. Census data across all states shows that Agentmandering significantly reduces partisan bias and unfairness, while achieving 2 to 3 orders of magnitude lower variance than standard baselines. These results demonstrate both fairness and stability, especially in swing-state scenarios.
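
The turn-based protocol lends itself to a compact sketch. The toy loop below illustrates the alternating choose-and-freeze structure described above; `propose_maps` and the agent policies are invented placeholders for the paper's redistricting sampler and LLM agents, not its actual implementation.

```python
import random

def choose_and_freeze(num_districts, propose_maps, agent_policies):
    """Toy version of a turn-based choose-and-freeze negotiation.

    propose_maps(frozen) -> small list of candidate districts for the
    remaining territory; agent_policies[i](candidates) -> chosen index.
    Both are placeholders for the sampler and LLM agents in the paper.
    """
    frozen = []
    turn = 0
    while len(frozen) < num_districts:
        candidates = propose_maps(frozen)            # small candidate set
        pick = agent_policies[turn % 2](candidates)  # agents alternate turns
        frozen.append(candidates[pick])              # the choice is frozen
        turn += 1
    return frozen

# Hypothetical usage: districts are just labels, agents pick at random.
random.seed(0)
plan = choose_and_freeze(
    num_districts=4,
    propose_maps=lambda frozen: [f"district-{len(frozen)}-opt{i}" for i in range(3)],
    agent_policies=[lambda c: random.randrange(len(c))] * 2,
)
print(plan)
```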

AAAI Conference 2026 Conference Paper

CADTrack: Learning Contextual Aggregation with Deformable Alignment for Robust RGBT Tracking

  • Hao Li
  • Yuhao Wang
  • Xiantao Hu
  • Wenning Hao
  • Pingping Zhang
  • Dong Wang
  • Huchuan Lu

RGB-Thermal (RGBT) tracking aims to exploit visible and thermal infrared modalities for robust all-weather object tracking. However, existing RGBT trackers struggle to resolve modality discrepancies, which poses great challenges for robust feature representation. This limitation hinders effective cross-modal information propagation and fusion, which significantly reduces tracking accuracy. To address this limitation, we propose a novel Contextual Aggregation with Deformable Alignment framework, called CADTrack, for RGBT tracking. Specifically, we first deploy a Mamba-based Feature Interaction (MFI) module that establishes efficient feature interaction via state space models. This interaction module operates with linear complexity, reducing computational cost and improving feature discrimination. Then, we propose the Contextual Aggregation Module (CAM), which dynamically activates backbone layers through sparse gating based on a Mixture-of-Experts (MoE) design and encodes complementary contextual information from cross-layer features. Finally, we propose the Deformable Alignment Module (DAM), which integrates deformable sampling and temporal propagation to mitigate spatial misalignment and localization drift. With these components, CADTrack achieves robust and accurate tracking in complex scenarios. Extensive experiments on five RGBT tracking benchmarks verify the effectiveness of the proposed method.
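
The sparse-gating idea behind the CAM can be shown generically. Below is a minimal top-k mixture-of-experts gate in NumPy; the random gating matrix and identity-scaled experts are stand-ins for illustration, not CADTrack's learned components.

```python
import numpy as np

def sparse_moe_gating(x, expert_weights, k=2, seed=0):
    """Toy top-k sparse gating in the spirit of an MoE-based module.

    x: (d,) feature vector; expert_weights: list of (d, d) expert matrices.
    Only the k experts with the highest gate scores are evaluated, and their
    outputs are combined with renormalized softmax weights. The gating
    matrix is random here; a real module would learn it.
    """
    rng = np.random.default_rng(seed)
    gate = rng.standard_normal((len(expert_weights), x.shape[0])) @ x  # gate logits
    top = np.argsort(gate)[-k:]                 # indices of the k active experts
    w = np.exp(gate[top] - gate[top].max())
    w /= w.sum()                                # renormalized softmax over top-k
    return sum(wi * (expert_weights[i] @ x) for wi, i in zip(w, top))

d = 8
experts = [np.eye(d) * (i + 1) for i in range(4)]   # placeholder experts
print(sparse_moe_gating(np.ones(d), experts).shape)  # (8,)
```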

AAAI Conference 2026 Conference Paper

DcSplat: Dual-Constraint Human Gaussian Splatting with Latent Multi-View Consistency

  • Tengfei Xiao
  • Yue Wu
  • Zhigang Gao
  • Yongzhe Yuan
  • Can Qin
  • Hao Li
  • Mingyang Zhang

Human Novel View Synthesis (HNVS) aims to synthesize photorealistic human images from novel viewpoints given observations from known views. Despite significant advances achieved by existing methods such as NeRF, diffusion models, and 3DGS, they still face substantial challenges in achieving stable modeling from a single image. In this paper, we introduce Dual-Constraint Human Gaussian Splatting (DcSplat), a novel, simple, and efficient 3D Gaussian-based framework for single-view 3D human reconstruction. To address occlusion-induced texture missing and depth ambiguities, we introduce two key components: a Latent Multi-View Consistency Constraint Mechanism and a Geometric Constraint Module. The former employs a Latent-space Appearance Transformer (LatentFormer) to learn semantically coherent, view-consistent appearance priors via SMPL-guided pseudo-view fusion. The latter refines noisy SMPL-based depth through a U-Net-like structure conditioned on latent appearance features. These two modules are jointly optimized to generate high-quality Gaussian parameters in a unified latent space. Extensive experiments demonstrate that DcSplat outperforms existing SOTA methods in both geometry and texture quality, while achieving fast inference and lower computational cost.

AAAI Conference 2026 Conference Paper

FDP: A Frequency-Decomposition Preprocessing Pipeline for Unsupervised Anomaly Detection in Brain MRI

  • Hao Li
  • Zhenfeng Zhuang
  • Jingyu Lin
  • Yu Liu
  • Yifei Chen
  • Qiong Peng
  • Lequan Yu
  • Liansheng Wang

Due to the diversity of brain anatomy and the scarcity of annotated data, supervised anomaly detection for brain MRI remains challenging, driving the development of unsupervised anomaly detection (UAD) approaches. Current UAD methods typically utilize synthetically generated noise perturbations on healthy MRIs to train generative models for normal anatomy reconstruction, enabling anomaly detection via residual maps. However, such simulated anomalies lack the biophysical fidelity and morphological complexity characteristic of true clinical lesions. To advance UAD in brain MRI, we conduct the first systematic frequency-domain analysis of pathological signatures, revealing two key properties: (1) anomalies exhibit unique frequency patterns distinguishable from normal anatomy, and (2) low-frequency signals maintain consistent representations across healthy scans. These insights motivate our Frequency-Decomposition Preprocessing (FDP) framework—the first UAD method to leverage frequency-domain reconstruction for simultaneous pathology suppression and anatomical preservation. FDP can integrate seamlessly with existing anomaly simulation techniques, consistently enhancing detection performance across diverse architectures while maintaining diagnostic fidelity. Experimental results demonstrate that FDP consistently improves anomaly detection performance when integrated with existing methods. Notably, FDP achieves a 17.63% increase in DICE score with LDM while maintaining robust improvements across multiple baselines.
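
The frequency split at the heart of FDP can be approximated with a plain Fourier low-pass. The sketch below separates a 2D slice into low- and high-frequency bands with a radial mask; the paper's actual filter design and cutoff may differ.

```python
import numpy as np

def frequency_decompose(image, radius=8):
    """Split a 2D slice into low- and high-frequency components with a
    radial mask in Fourier space. A generic illustration of frequency
    decomposition, not the paper's exact preprocessing pipeline.
    """
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    low = np.fft.ifft2(np.fft.ifftshift(f * mask)).real
    high = image - low
    return low, high

slice_ = np.random.rand(64, 64)          # stand-in for a brain MRI slice
low, high = frequency_decompose(slice_)
print(np.allclose(low + high, slice_))   # True: the two bands sum to the input
```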

JBHI Journal 2026 Journal Article

Few-Shot Class-Incremental Learning With Dynamic Prototype Refinement for Brain Activity Classification

  • Lei Cao
  • Hao Li
  • Yilin Dong
  • Tianyu Liu
  • Jie Li

The brain-computer interface (BCI) system facilitates efficient communication and control, with Electroencephalography (EEG) signals as a vital component. Traditional EEG signal classification, based on static deep-learning models, presents a challenge when new classes of the subject’s brain activity emerge. The goal is to develop a model that can recognize new few-shot classes while preserving its ability to discriminate between existing ones. This scenario is referred to as Few-Shot Class-Incremental Learning (FSCIL). This work introduces IncrementEEG, a novel framework meticulously designed to tackle the distinct challenges of FSCIL in EEG-based brain activity classification, focusing specifically on emotion recognition and steady-state visual evoked potential (SSVEP). Our work analyzes the role of additive angular margin loss in improving the model’s discrimination capabilities. The proposed method is designed to demonstrate robustness in open-world conditions and adaptability to new tasks. Furthermore, we introduce a prototype refinement module comprising a prototype augmentation block and an update block. The prototype augmentation block in the deep feature space preserves the decision boundary for prior tasks, and the prototype update block utilizes a shared embedding space to compute the relation matrix for bootstrapping prototype updates. Extensive experiments conducted across multiple datasets show the superior performance of the IncrementEEG framework compared to state-of-the-art methods. The proposed method advances FSCIL brain activity classification, offering promising potential for applications in Brain-Computer Interface systems.
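
For readers unfamiliar with the additive angular margin loss the abstract analyzes, a minimal NumPy version of the logit computation is shown below (ArcFace-style); the margin and scale values are illustrative defaults, not the paper's settings.

```python
import numpy as np

def angular_margin_logits(features, class_centers, labels, margin=0.5, scale=30.0):
    """Additive angular margin logits. Rows of both inputs are L2-normalized
    so their dot products are cosines; the margin is added to the angle of
    each sample's ground-truth class before rescaling. The result is fed
    into a standard softmax cross-entropy loss.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = class_centers / np.linalg.norm(class_centers, axis=1, keepdims=True)
    cos = np.clip(f @ c.T, -1.0, 1.0)
    idx = np.arange(len(labels))
    cos[idx, labels] = np.cos(np.arccos(cos[idx, labels]) + margin)
    return scale * cos

rng = np.random.default_rng(0)
logits = angular_margin_logits(rng.standard_normal((4, 16)),
                               rng.standard_normal((3, 16)),
                               labels=np.array([0, 1, 2, 0]))
print(logits.shape)  # (4, 3)
```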

AAAI Conference 2026 Conference Paper

From Sampling to Cognition: Modeling Internal Cognitive Confidence in Language Models for Robust Uncertainty Calibration

  • Hao Li
  • Tao He
  • Jiafeng Liang
  • Zheng Chu
  • Ming Liu

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet they generally lack self-awareness, often displaying overconfidence when confronted with questions beyond their knowledge boundaries. This limitation severely hinders their trustworthiness in high-stakes scenarios. Existing calibration methods typically rely on sampling accuracy, derived from multiple outputs, as a proxy for model confidence. However, this coarse-grained metric fails to capture the model's internal cognitive states, such as confusion, hallucination, or persistent belief in false knowledge. To address this, we propose CogConf (Cognitive Confidence), a cognitively grounded uncertainty signal that extends sampling accuracy by incorporating the semantic diversity of incorrect answers and the model's abstention behaviors. By shifting the focus from sampling-based to cognition-oriented uncertainty modeling, CogConf offers a more faithful reflection of the model's internal beliefs. Building on this signal, we introduce CogAlign, a simple yet effective alignment framework that explicitly aligns the model's verbalized confidence with CogConf, thereby producing uncertainty estimates that better reflect the model's internal cognition. Experimental results on six knowledge-intensive in-domain and out-of-domain QA datasets demonstrate that CogConf robustly characterizes the model's internal uncertainty. Building on this foundation, CogAlign guides the model's expression to significantly enhance the trustworthiness and utility of its uncertainty calibration without compromising its underlying QA capabilities, while also demonstrating strong cross-task generalization and output stability, offering a new pathway toward building more trustworthy LLMs.
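
As a loose illustration of the idea (not the paper's formula), the sketch below extends plain sampling accuracy with the two extra signals the abstract names: the diversity of incorrect answers and the abstention rate. All weights are arbitrary assumptions.

```python
def cognitive_confidence(samples, reference, abstain="[IDK]"):
    """Illustrative cognition-oriented confidence: start from sampling
    accuracy, down-weight it when wrong answers are concentrated (a sign of
    entrenched false belief), and treat abstentions as explicit uncertainty
    rather than ordinary errors. Weights below are made up.
    """
    n = len(samples)
    acc = sum(s == reference for s in samples) / n
    wrong = [s for s in samples if s not in (reference, abstain)]
    # diversity in [0, 1]: 1 = every wrong answer distinct (confusion),
    # 0 = one repeated wrong answer (persistent false knowledge).
    diversity = (len(set(wrong)) - 1) / (len(wrong) - 1) if len(wrong) > 1 else 1.0
    abstain_rate = sum(s == abstain for s in samples) / n
    return acc * (0.5 + 0.5 * diversity) + 0.25 * abstain_rate * (1 - acc)

print(cognitive_confidence(["Paris", "Paris", "Lyon", "[IDK]"], "Paris"))
```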

AAAI Conference 2026 Conference Paper

Hybrid Vector-Occupancy Field for Robust Implicit 3D Surface Reconstruction

  • Yue Wu
  • Zhigang Gao
  • Tengfei Xiao
  • Can Qin
  • Yongzhe Yuan
  • Hao Li
  • Kaiyuan Feng
  • Wenping Ma

We introduce the Hybrid Vector-Occupancy Field (HVOF), a new implicit 3D representation for reconstructing both open and closed surfaces from sparse point clouds. Existing approaches face severe limitations: occupancy fields and signed distance fields struggle with open surfaces, while unsigned distance fields and neural vector fields exhibit directional instability in complex topologies and ridge regions. HVOF addresses these challenges by incorporating a smoothly decaying occupancy field around the surface while capturing precise local geometry using truncated displacement vectors, naturally mitigating direction-field ambiguities near ridge regions. This unified design forms a robust hybrid representation that leverages both occupancy and vector fields. To realize it, we design a Hybrid Field variational autoencoder comprising a hierarchical cross-attention encoder and a dual-branch decoder that jointly learn the occupancy and vector fields through continuous weighting. Extensive experiments demonstrate that HVOF consistently outperforms state-of-the-art methods across the ShapeNet, ABC, and MGN datasets, accurately reconstructing both open and closed surfaces while preserving fine geometric details in complex regions.

AAAI Conference 2026 Conference Paper

ICL-Router: In-Context Learned Model Representations for LLM Routing

  • Chenxu Wang
  • Hao Li
  • Yiqun Zhang
  • Linyao Chen
  • Jianhao Chen
  • Ping Jian
  • Qiaosheng Zhang
  • Shuyue Hu

Large language models (LLMs) often exhibit complementary strengths. Model routing harnesses these strengths by dynamically directing each query to the most suitable model, given a candidate model pool. However, routing performance relies on accurate model representations, and adding new models typically requires retraining, limiting scalability. To address these challenges, we propose a novel routing method using in-context vectors to represent model capabilities. The method proceeds in two stages. First, queries are embedded and projected into vectors, with a projector and LLM-based router trained to reconstruct the original queries, aligning vector representations with the router’s semantic space. Second, each candidate model is profiled on a query set, and the router learns---based on in-context vectors of query and model performance---to predict whether each model can correctly answer new queries. Extensive experiments demonstrate that our method achieves state-of-the-art routing performance in both in-distribution and out-of-distribution tasks. Moreover, our method allows for seamless integration of new models without retraining the router.

AAAI Conference 2026 Conference Paper

Identity-Aware Vision-Language Model for Explainable Face Forgery Detection

  • Junhao Xu
  • Jingjing Chen
  • Yang Jiao
  • Jiacheng Zhang
  • Zhiyu Tan
  • Hao Li
  • Yu-Gang Jiang

Recent advances in generative artificial intelligence have enabled the creation of highly realistic image forgeries, raising significant concerns about digital media authenticity. While existing detection methods demonstrate promising results on benchmark datasets, they face critical limitations in real-world applications. First, existing detectors typically fail to detect semantic inconsistencies with the person’s identity, such as implausible behaviors or incompatible environmental contexts in given images. Second, these methods rely heavily on low-level visual cues, making them effective for known forgeries but less reliable against new or unseen manipulation techniques. To address these challenges, we present a novel personalized vision-language model (VLM) that integrates low-level visual artifact analysis and high-level semantic inconsistency detection. Unlike previous VLM-based methods, our approach avoids resource-intensive supervised fine-tuning that often struggles to preserve distinct identity characteristics. Instead, we employ a lightweight method that dynamically encodes identity-specific information into specialized identifier tokens. This design enables the model to learn distinct identity characteristics while maintaining robust generalization capabilities. We further enhance detection capabilities through a lightweight detection adapter that extracts fine-grained information from shallow features of the vision encoder, preserving critical low-level evidence. Comprehensive experiments demonstrate that our approach achieves 94.25% accuracy and 94.08% F1 score, outperforming both traditional forgery detectors and general VLMs while requiring only 10 extra tokens.

AAAI Conference 2026 Conference Paper

MetaDiT: Enabling Fine-grained Constraints in High-Degree-of-Freedom Metasurface Design

  • Hao Li
  • Andrey Bogdanov

Metasurfaces are ultrathin, engineered materials composed of nanostructures that manipulate light in ways unattainable by natural materials. Recent advances have leveraged computational optimization, machine learning, and deep learning to automate their design. However, existing approaches exhibit two fundamental limitations: (1) they often restrict the model to generating only a subset of design parameters, and (2) they rely on heavily downsampled spectral targets, which compromises both the novelty and accuracy of the resulting structures. The core challenge lies in developing a generative model capable of exploring a large, unconstrained design space while precisely capturing the intricate physical relationships between material parameters and their high-resolution spectral responses. In this paper, we introduce MetaDiT, a novel framework for high-fidelity metasurface design that addresses these limitations. Our approach leverages a robust spectrum encoder pretrained with contrastive learning, providing strong conditional guidance to a Diffusion Transformer-based backbone. Experiments demonstrate that MetaDiT outperforms existing baselines in spectral accuracy, and we further validate our method through extensive ablation studies.

JBHI Journal 2026 Journal Article

Multimodal Integration of a Novel Gait State Time Interval Signal Generation Method and Insole Sensor Data-based Body Intelligence: Application in Parkinson's Disease

  • Hao Li
  • Illa Baryskievic
  • Anatoliy Baryskievic
  • Viktar Tsviatkou

Multidomain and multimodal identification of walking gait-cycle states is important for detecting and monitoring locomotion disorders such as Parkinson's disease (PD). We propose a novel multizonal clustering and multi-level thresholding method, based on analyzing the multizonal plantar load distribution, for generating a discrete gait-state time-interval (GSTI) signal to improve PD diagnosis accuracy and the effectiveness of rehabilitation through personalized strategies. Multidomain analysis of the GSTI signal reveals a novel coupled I. Baryskievic-H. Li bio-oscillator, interpreted as a GSTI-derived signal-level oscillatory signature that may be associated with central nervous system (CNS)-related locomotor rhythm organization. The bio-oscillator consists of two interconnected oscillations with distinct resonant spectral peaks at specific natural frequencies and phase coupling (nonlinearity) between the two frequency components. We propose a multidomain feature level of a layered Integrative Body Intelligence (IBI) framework to identify lower- and higher-order interactions between gait-cycle states. The proposed multimodal data level of IBI involves acoustic and visual biofeedback based on a novel acoustic harmonic plantar pressure model and a 3D gait-state portrait of the GSTI signal, used for walking gait monitoring and personalized rehabilitation assessment in PD. Experiments on a publicly available PD plantar-insole dataset show that a Multilayer Perceptron (MLP) model based on the selected multidomain (time-interval, spectral, and bispectral) feature subset achieves a classification accuracy of 94.44% and offers a trade-off between model complexity and performance for PD recognition. This result suggests that early-stage PD can be diagnosed accurately merely by testing a patient's GSTI signal.

AAAI Conference 2026 Conference Paper

The Avengers: A Routing Recipe for Collective Intelligence in Language Models

  • Yiqun Zhang
  • Hao Li
  • Chenxu Wang
  • Linyao Chen
  • Qiaosheng Zhang
  • Peng Ye
  • Shi Feng
  • Xinrun Wang

Proprietary models are increasingly dominating the race for ever-larger language models. Can open-source, smaller models remain competitive across a broad range of tasks? In this paper, we present the Avengers, a lightweight framework that leverages the collective intelligence of these smaller models. The Avengers builds upon four lightweight operations: (i) embedding: encode queries using a text embedding model; (ii) clustering: group queries based on their semantic similarity; (iii) scoring: score each model's performance within each cluster; and (iv) voting: improve outputs via repeated sampling and voting. At inference time, each query is embedded and assigned to its nearest cluster. The top-performing model(s) within that cluster are selected to generate the response with repeated sampling. Remarkably, with 10 open-source models (~7B parameters each), the Avengers surpasses GPT-4o, 4.1, and 4.5 in average performance across 15 diverse datasets spanning mathematics, coding, logical reasoning, general knowledge, and affective tasks. In particular, it surpasses GPT-4.1 on mathematics tasks by 18.21% and on code tasks by 7.46%. Furthermore, the Avengers delivers superior out-of-distribution generalization, and remains robust across various embedding models, clustering algorithms, ensemble strategies, data efficiency, and values of its sole parameter, the number of clusters.
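
The four operations map onto a very small amount of code. The sketch below uses random vectors in place of a text embedding model and a random score table in place of measured per-cluster accuracy, and assumes scikit-learn for the clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy illustration of embed -> cluster -> score -> route. Real usage would
# embed queries with a text embedding model and fill model_scores with each
# candidate model's measured accuracy per cluster.
rng = np.random.default_rng(0)
train_emb = rng.standard_normal((200, 32))                      # (i) embeddings
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(train_emb)  # (ii)
model_scores = rng.random((8, 3))                               # (iii) 3 models

def route(query_emb):
    """Assign the query to its nearest cluster and pick the best-scoring
    model there; step (iv) would repeatedly sample that model and vote."""
    cluster = kmeans.predict(query_emb[None])[0]
    return int(np.argmax(model_scores[cluster]))

print(route(rng.standard_normal(32)))   # index of the selected model
```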

AAAI Conference 2026 Conference Paper

Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling

  • Hao Li
  • Shuai Yang
  • Yilun Chen
  • Xinyi Chen
  • Xiaoda Yang
  • Yang Tian
  • Hanqing Wang
  • Tai WANG

Recent vision-language-action (VLA) models built on pretrained vision-language models (VLMs) have demonstrated strong performance in robotic manipulation. However, these models remain constrained by the single-frame image paradigm and fail to fully leverage the temporal information offered by multi-frame histories, as directly feeding multiple frames into VLM backbones incurs substantial computational overhead and inference latency. We propose CronusVLA, a unified framework that extends single-frame VLA models to the multi-frame paradigm. CronusVLA follows a two-stage process: (1) Single-frame pretraining on large-scale embodied datasets with autoregressive prediction of action tokens, establishing an effective embodied vision-language foundation; (2) Multi-frame post-training, which adapts the prediction of the vision-language backbone from discrete tokens to learnable features, and aggregates historical information via feature chunking. CronusVLA effectively addresses the existing challenges of multi-frame modeling while enhancing performance. To evaluate the robustness under temporal and spatial disturbances, we introduce SimplerEnv-OR, a novel benchmark featuring 24 types of observational disturbances and 120 severity levels. Experiments across three embodiments in simulated and real-world environments demonstrate that CronusVLA achieves leading performance and superior robustness, with a 70.9% success rate on SimplerEnv, a 26.8% improvement over OpenVLA on LIBERO, and the highest robustness score on SimplerEnv-OR, showing the promise of efficient multi-frame adaptation for real-world VLA deployment.

AAAI Conference 2025 Conference Paper

AdvDisplay: Adversarial Display Assembled by Thermoelectric Cooler for Fooling Thermal Infrared Detectors

  • Hao Li
  • Fanggao Wan
  • Yue Su
  • Yue Wu
  • Mingyang Zhang
  • Maoguo Gong

When current physical adversarial patches fail to deceive thermal infrared detectors, existing techniques must re-implement the attack from scratch: digital patch generation, material production, and physical deployment. Moreover, it is difficult for such patches to finely regulate infrared radiation. To address these issues, this paper designs an adversarial thermal display (AdvDisplay) by assembling thermoelectric coolers (TECs) into an array. Specifically, to reduce the gap between patches in the physical and digital worlds and to decrease the power consumption of the AdvDisplay device, a heat transfer loss and an electric power loss are designed to guide the patch optimization. In addition, a precise temperature control scheme for AdvDisplay is proposed based on proportional-integral-derivative (PID) control. Owing to the accurate temperature regulation and the reusability of AdvDisplay, our method improves both the attack success rate and the efficiency of physical deployment. Extensive experimental results indicate that the proposed method possesses superior adversarial effectiveness compared to other methods and demonstrates strong robustness in physical attacks.
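
The temperature-control component is standard PID. A minimal discrete-time loop is sketched below against a crude first-order thermal model; the gains and plant constants are illustrative, not the paper's tuned values.

```python
def pid_step(setpoint, measured, state, kp=2.0, ki=0.5, kd=0.1, dt=0.1):
    """One update of a discrete PID loop, of the kind used for per-TEC
    temperature regulation. state = (integral, previous_error)."""
    integral, prev_err = state
    err = setpoint - measured
    integral += err * dt
    derivative = (err - prev_err) / dt
    u = kp * err + ki * integral + kd * derivative   # drive signal for the TEC
    return u, (integral, err)

# Toy plant: temperature relaxes toward 25 C ambient plus the control input.
temp, state = 25.0, (0.0, 0.0)
for _ in range(100):
    u, state = pid_step(setpoint=40.0, measured=temp, state=state)
    temp += 0.1 * (u - 0.2 * (temp - 25.0))          # crude first-order model
print(round(temp, 2))                                # approaches the 40.0 setpoint
```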

JBHI Journal 2025 Journal Article

Cognitive Load Prediction From Multimodal Physiological Signals Using Multiview Learning

  • Yingxin Liu
  • Yang Yu
  • Hong Tao
  • Zeqi Ye
  • Si Wang
  • Hao Li
  • Dewen Hu
  • Zongtan Zhou

Predicting cognitive load is a crucial issue in the emerging field of human-computer interaction and holds significant practical value, particularly in flight scenarios. Although previous studies have realized efficient cognitive load classification, new research is still needed to adapt the current state-of-the-art multimodal fusion methods. Here, we proposed a feature selection framework based on multiview learning to address the challenges of information redundancy and reveal the common physiological mechanisms underlying cognitive load. Specifically, multimodal signal features [electroencephalogram (EEG), electrodermal activity (EDA), electrocardiogram (ECG), electrooculogram (EOG), and eye movements] at three cognitive load levels were estimated during multiattribute task battery (MATB) tasks performed by 22 healthy participants and fed into a feature selection-multiview classification with cohesion and diversity (FS-MCCD) framework. The optimized feature set was extracted from the original feature set by integrating the weight of each view and the feature weights to formulate the ranking criteria. The cognitive load prediction model, evaluated using real-time classification results, achieved an average accuracy of 81.08% and an average F1-score of 80.94% for three-class classification among 22 participants. Furthermore, the weights of the physiological signal features revealed the physiological mechanisms related to cognitive load. Specifically, heightened cognitive load was linked to amplified $\delta$ and $\theta$ power in the frontal lobe, reduced $\alpha$ power in the parietal lobe, and an increase in pupil diameter. Thus, the proposed multimodal feature fusion framework emphasizes the effectiveness and efficiency of using these features to predict cognitive load.

AAAI Conference 2025 Conference Paper

Deconfound Semantic Shift and Incompleteness in Incremental Few-shot Semantic Segmentation

  • Yirui Wu
  • Yuhang Xia
  • Hao Li
  • Lixin Yuan
  • Junyang Chen
  • Jun Liu
  • Tong Lu
  • Shaohua Wan

Incremental few-shot semantic segmentation (IFSS) expands the segmentation capacity of a trained model to new-class images with only a few samples. However, semantic meanings may shift from background to object class, or vice versa, during incremental learning. Moreover, new-class samples often lack representative attribute features when the new class differs greatly from the pre-learned old classes. In this paper, we propose a causal framework to analyze the causes of semantic shift and incompleteness in IFSS, and we deconfound the revealed causal effects from two aspects. First, we propose a Causal Intervention Module (CIM) to resist semantic shift. CIM progressively and adaptively updates old-class prototypes and removes the confounder in an intervention manner. Second, a Prototype Refinement Module (PRM) is proposed to complete the missing semantics. In PRM, knowledge gained from the episode learning scheme assists in fusing features of new-class and old-class prototypes. Experiments on both the PASCAL-VOC 2012 and ADE20k benchmarks demonstrate the outstanding performance of our method.

NeurIPS Conference 2025 Conference Paper

DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents

  • Hao Li
  • Xiaogeng Liu
  • CHIU Chun
  • Dianqi Li
  • Ning Zhang
  • Chaowei Xiao

Large Language Models (LLMs) are increasingly central to agentic systems due to their strong reasoning and planning capabilities. By interacting with external environments through predefined tools, these agents can carry out complex user tasks. Nonetheless, this interaction also introduces the risk of prompt injection attacks, where malicious inputs from external sources can mislead the agent's behavior, potentially resulting in economic loss, privacy leakage, or system compromise. System-level defenses have recently shown promise by enforcing static or predefined policies, but they still face two key challenges: the ability to dynamically update security rules and the need for memory stream isolation. To address these challenges, we propose DRIFT, a Dynamic Rule-based Isolation Framework for Trustworthy agentic systems, which enforces both control- and data-level constraints. A Secure Planner first constructs a minimal function trajectory and a JSON-schema-style parameter checklist for each function node based on the user query. A Dynamic Validator then monitors deviations from the original plan, assessing whether changes comply with privilege limitations and the user's intent. Finally, an Injection Isolator detects and masks any instructions that may conflict with the user query from the memory stream to mitigate long-term risks. We empirically validate the effectiveness of DRIFT on the AgentDojo and ASB benchmarks, demonstrating strong security performance while maintaining high utility across diverse models, showcasing both its robustness and adaptability. The code is released at https://github.com/SaFoLab-WISC/DRIFT.
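
The "JSON-schema-style parameter checklist" can be pictured as ordinary schema validation applied before each tool call runs. The sketch below assumes the `jsonschema` package; the tool name, pinned values, and limits are invented examples, not DRIFT's actual policies.

```python
from jsonschema import ValidationError, validate

# Hypothetical checklist derived from a user query such as
# "pay alice at most $100": arguments outside the plan are rejected.
checklist = {
    "send_payment": {
        "type": "object",
        "properties": {
            "recipient": {"const": "alice@example.com"},  # pinned by the plan
            "amount": {"type": "number", "maximum": 100},
        },
        "required": ["recipient", "amount"],
        "additionalProperties": False,
    }
}

def validate_call(tool, args):
    """Reject any tool call that deviates from the planned checklist."""
    try:
        validate(instance=args, schema=checklist[tool])
        return True
    except (KeyError, ValidationError):
        return False

print(validate_call("send_payment", {"recipient": "alice@example.com", "amount": 50}))  # True
print(validate_call("send_payment", {"recipient": "mallory@evil.com", "amount": 50}))   # False
```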

NeurIPS Conference 2025 Conference Paper

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

  • Xiangyu Zhao
  • Peiyuan Zhang
  • Kexian Tang
  • Xiaorong Zhu
  • Hao Li
  • Wenhao Chai
  • Zicheng Zhang
  • Renqiu Xia

Large Multi-modality Models (LMMs) have made significant progress in visual understanding and generation, but they still face challenges in General Visual Editing, particularly in following complex instructions, preserving appearance consistency, and supporting flexible input formats. To study this gap, we introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE). RISEBench focuses on four key reasoning categories: Temporal, Causal, Spatial, and Logical Reasoning. We curate high-quality test cases for each category and propose a robust evaluation framework that assesses Instruction Reasoning, Appearance Consistency, and Visual Plausibility with both human judges and the LMM-as-a-judge approach. We conducted experiments evaluating nine prominent visual editing models, comprising both open-source and proprietary models. The evaluation results demonstrate that current models face significant challenges in reasoning-based editing tasks. Even the most powerful model evaluated, GPT-image-1, achieves an accuracy of merely 28.8%. RISEBench effectively highlights the limitations of contemporary editing models, provides valuable insights, and indicates potential future directions for the field of reasoning-aware visual editing. Our code and data have been released at https://github.com/PhoenixZ810/RISEBench.

NeurIPS Conference 2025 Conference Paper

EverybodyDance: Bipartite Graph–Based Identity Correspondence for Multi-Character Animation

  • Haotian Ling
  • Zequn Chen
  • Qiuying Chen
  • Donglin Di
  • Yongjia Ma
  • Hao Li
  • Chen Wei
  • Zhulin Tao

Consistent pose‐driven character animation has achieved remarkable progress in single‐character scenarios. However, extending these advances to multi‐character settings is non‐trivial, especially when position swap is involved. Beyond mere scaling, the core challenge lies in enforcing correct Identity Correspondence (IC) between characters in reference and generated frames. To address this, we introduce EverybodyDance, a systematic solution targeting IC correctness in multi-character animation. EverybodyDance is built around the Identity Matching Graph (IMG), which models characters in the generated and reference frames as two node sets in a weighted complete bipartite graph. Edge weights, computed via our proposed Mask–Query Attention (MQA), quantify the affinity between each pair of characters. Our key insight is to formalize IC correctness as a graph structural metric and to optimize it during training. We also propose a series of targeted strategies tailored for multi-character animation, including identity-embedded guidance, a multi-scale matching strategy, and pre-classified sampling, which work synergistically. Finally, to evaluate IC performance, we curate the Identity Correspondence Evaluation benchmark, dedicated to multi‐character IC correctness. Extensive experiments demonstrate that EverybodyDance substantially outperforms state‐of‐the‐art baselines in both IC and visual fidelity.
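
Once the MQA affinities are computed, recovering identity correspondence is a maximum-weight bipartite matching problem, which the standard Hungarian solver handles directly. The affinity values below are made up for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: characters in the reference frame; columns: characters in the
# generated frame. Entries stand in for Mask-Query Attention affinities.
affinity = np.array([
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.3, 0.7],
])
rows, cols = linear_sum_assignment(affinity, maximize=True)
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 0), (1, 1), (2, 2)]
# IC correctness can then be scored as the fraction of matched pairs whose
# assignment agrees with the ground-truth identity labels.
```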

JBHI Journal 2025 Journal Article

Explicit Abnormality Extraction for Unsupervised Motion Artifact Reduction in Magnetic Resonance Imaging

  • Yusheng Zhou
  • Hao Li
  • Jianan Liu
  • Zhengmin Kong
  • Tao Huang
  • Euijoon Ahn
  • Zhihan Lv
  • Jinman Kim

Motion artifacts compromise the quality of magnetic resonance imaging (MRI) and pose challenges to achieving diagnostic outcomes and image-guided therapies. In recent years, supervised deep learning approaches have emerged as successful solutions for motion artifact reduction (MAR). One disadvantage of these methods is their dependency on acquiring paired sets of motion artifact-corrupted (MA-corrupted) and motion artifact-free (MA-free) MR images for training purposes. Obtaining such image pairs is difficult and therefore limits the application of supervised training. In this paper, we propose a novel UNsupervised Abnormality Extraction Network (UNAEN) to alleviate this problem. Our network is capable of working with unpaired MA-corrupted and MA-free images. It converts MA-corrupted images to MA-reduced images by extracting abnormalities from the MA-corrupted images using a proposed artifact extractor, which explicitly intercepts the residual artifact maps from the MA-corrupted MR images, and a reconstructor to restore the original input from the MA-reduced images. The performance of UNAEN was assessed on various publicly available MRI datasets and compared with state-of-the-art methods. The quantitative evaluation demonstrates the superiority of UNAEN over alternative MAR methods, and it visually exhibits fewer residual artifacts. Our results substantiate the potential of UNAEN as a promising solution applicable in real-world clinical environments, with the capability to enhance diagnostic accuracy and facilitate image-guided therapies.

NeurIPS Conference 2025 Conference Paper

FuXi-Ocean: A Global Ocean Forecasting System with Sub-Daily Resolution

  • Qiusheng Huang
  • Yuan Niu
  • Xiaohui Zhong
  • Lei Chen
  • Dianjun Zhang
  • Xuefeng Zhang
  • Hao Li

Accurate, high-resolution ocean forecasting is crucial for maritime operations and environmental monitoring. While traditional numerical models are capable of producing sub-daily, eddy-resolving forecasts, they are computationally intensive and face challenges in maintaining accuracy at fine spatial and temporal scales. In contrast, recent data-driven approaches offer improved computational efficiency and emerging potential, yet typically operate at daily resolution and struggle with sub-daily predictions due to error accumulation over time. We introduce FuXi-Ocean, the first data-driven global ocean forecasting model achieving six-hourly predictions at eddy-resolving 1/12° spatial resolution, reaching depths of up to 1500 meters. The model architecture integrates a context-aware feature extraction module with a predictive network employing stacked attention blocks. The core innovation is the Mixture-of-Time (MoT) module, which adaptively integrates predictions from multiple temporal contexts by learning variable-specific reliability, mitigating cumulative errors in sequential forecasting. Through comprehensive experimental evaluation, FuXi-Ocean demonstrates superior skill in predicting key variables, including temperature, salinity, and currents, across multiple depths.

NeurIPS Conference 2025 Conference Paper

GoT: Unleashing Reasoning Capability of MLLM for Visual Generation and Editing

  • Rongyao Fang
  • Chengqi Duan
  • Kun Wang
  • Linjiang Huang
  • Hao Li
  • Hao Tian
  • Shilin Yan
  • Weihao Yu

Current image generation and editing methods primarily process textual prompts as direct inputs without explicit reasoning about visual composition or operational steps. We present Generation Chain-of-Thought (GoT), a novel paradigm that empowers a Multimodal Large Language Model (MLLM) to first generate an explicit, structured reasoning chain in natural language—detailing semantic relationships, object attributes, and, crucially, precise spatial coordinates—before any image synthesis occurs. This intermediate reasoning output directly guides the subsequent visual generation or editing process. This approach transforms conventional text-to-image generation and editing into a reasoning-guided framework that analyzes semantic relationships and spatial arrangements. We define the formulation of GoT and construct large-scale GoT datasets containing over 9M samples with detailed reasoning chains capturing semantic-spatial relationships. To leverage the advantages of GoT, we implement a unified framework that integrates Qwen2.5-VL for reasoning chain generation with an end-to-end diffusion model enhanced by our novel Semantic-Spatial Guidance Module. Experiments show our GoT framework achieves excellent performance on both generation and editing tasks, with significant improvements over baselines. Additionally, our approach enables interactive visual generation, allowing users to explicitly modify reasoning steps for precise image adjustments. GoT pioneers a new direction for reasoning-driven visual generation and editing, producing images that better align with human intent. We will release our datasets and models to facilitate future research.

AAAI Conference 2025 Conference Paper

GRPose: Learning Graph Relations for Human Image Generation with Pose Priors

  • Xiangchen Yin
  • Donglin Di
  • Lei Fan
  • Hao Li
  • Wei Chen
  • Gouxiaofei
  • Yang Song
  • Xiao Sun

Recent methods using diffusion models have made significant progress in human image generation with various control signals such as pose priors. However, existing efforts are still struggling to generate high-quality images with consistent pose alignment, resulting in unsatisfactory output. In this paper, we propose a framework that delves into the graph relations of pose priors to provide control information for human image generation. The main idea is to establish a graph topological structure between the pose priors and latent representation of diffusion models to capture the intrinsic associations between different pose parts. A Progressive Graph Integrator (PGI) is designed to learn the spatial relationships of the pose priors with the graph structure, adopting a hierarchical strategy within an Adapter to gradually propagate information across different pose parts. Besides, a pose perception loss is introduced based on a pretrained pose estimation network to minimize the pose differences. Extensive qualitative and quantitative experiments conducted on the Human-Art and LAION-Human datasets clearly demonstrate that our model can achieve significant performance improvement over the latest benchmark models.

AAAI Conference 2025 Conference Paper

HieraFashDiff: Hierarchical Fashion Design with Multi-stage Diffusion Models

  • Zhifeng Xie
  • Hao Li
  • Huiming Ding
  • Mengtian Li
  • Xinhan Di
  • Ying Cao

Fashion design is a challenging and complex process. Recent works on fashion generation and editing are all agnostic of the actual fashion design process, which limits their usage in practice. In this paper, we propose a novel hierarchical diffusion-based framework tailored for fashion design, coined HieraFashDiff. Our model is designed to mimic the practical fashion design workflow by unraveling the denoising process into two successive stages: 1) an ideation stage that generates design proposals given high-level concepts and 2) an iteration stage that continuously refines the proposals using low-level attributes. Our model supports fashion design generation and fine-grained local editing in a single framework. To train our model, we contribute a new dataset of full-body fashion images annotated with hierarchical text descriptions. Extensive evaluations show that, compared to prior approaches, our method can generate fashion designs and edited results with higher fidelity and better prompt adherence, showing its promising potential to augment the practical fashion design workflow.

NeurIPS Conference 2025 Conference Paper

Learning Crossmodal Interaction Patterns via Attributed Bipartite Graphs for Single-Cell Omics

  • Xiaotang Wang
  • Xuanwei Lin
  • Yun Zhu
  • Hao Li
  • Yongqi Zhang

Crossmodal matching in single-cell omics is essential for explaining biological regulatory mechanisms and enhancing downstream analyses. However, current single-cell crossmodal models often suffer from three limitations: sparse modality signals, underutilization of biological attributes, and insufficient modeling of regulatory interactions. These challenges hinder generalization in data-scarce settings and restrict the ability to uncover fine-grained, biologically meaningful crossmodal relationships. Here, we present a novel framework that reformulates crossmodal matching as a graph classification task on Attributed Bipartite Graphs (ABGs). It models single-cell ATAC-RNA data as an ABG, where each expressed ATAC and RNA is treated as a distinct node with a unique ID and biological features. To model crossmodal interaction patterns on the constructed ABG, we propose $\text{Bi}^2\text{Former}$, a biologically driven bipartite graph transformer that learns interpretable attention over ATAC-RNA pairs. This design enables the model to effectively learn and explain biological regulatory relationships between the ATAC and RNA modalities. Extensive experiments demonstrate that $\text{Bi}^2\text{Former}$ achieves state-of-the-art performance in crossmodal matching across diverse datasets, remains robust under sparse training data, generalizes to unseen cell types and datasets, and reveals biologically meaningful regulatory patterns. This work pioneers an ABG-based approach for single-cell crossmodal matching, offering a powerful framework for uncovering regulatory interactions in single-cell omics. Our code is available at: https://github.com/wangxiaotang0906/Bi2Former.

NeurIPS Conference 2025 Conference Paper

MIRA: Medical Time Series Foundation Model for Real-World Health Data

  • Hao Li
  • Bowen Deng
  • Chang Xu
  • ZhiYuan Feng
  • Viktor Schlegel
  • Yu-Hao Huang
  • Yizheng Sun
  • Jingyuan Sun

A unified foundation model for medical time series—pretrained on open access and ethically reviewed medical corpora—offers the potential to reduce annotation burdens, minimize model customization, and enable robust transfer across clinical institutions, modalities, and tasks, particularly in data-scarce or privacy-constrained environments. However, existing time series foundation models struggle to handle medical time series data due to its inherent challenges, including irregular intervals, heterogeneous sampling rates, and frequent missingness. To address these challenges, we introduce MIRA, a unified foundation model specifically designed for medical time series forecasting. MIRA incorporates a Continuous-Time Rotary Positional Encoding that enables fine-grained modeling of variable time intervals, a frequency-specific mixture-of-experts layer that routes computation across latent frequency regimes to further promote temporal specialization, and a Continuous Dynamics Extrapolation Block based on Neural ODEs that models the continuous trajectory of latent states, enabling accurate forecasting at arbitrary target timestamps. Pretrained on a large-scale and diverse medical corpus comprising over 454 billion time points collected from publicly available datasets, MIRA achieves reductions in forecasting errors by an average of 8% and 6% in out-of-distribution and in-distribution scenarios, respectively. We also introduce a comprehensive benchmark spanning multiple downstream clinical tasks, establishing a foundation for future research in medical time series modeling.
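
One plausible reading of the Continuous-Time Rotary Positional Encoding is ordinary RoPE with the integer position replaced by a real-valued timestamp, so that irregular sampling intervals map to proportional rotation angles. The sketch below implements that reading; MIRA's exact parameterization may differ.

```python
import numpy as np

def continuous_time_rope(x, t, base=10000.0):
    """Rotary positional encoding evaluated at a real-valued timestamp t.
    x: (d,) with d even; each feature pair is rotated by an angle
    t * omega_i, so uneven time gaps produce proportional rotations.
    """
    d = x.shape[0]
    omega = base ** (-np.arange(0, d, 2) / d)   # per-pair frequencies
    cos, sin = np.cos(t * omega), np.sin(t * omega)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin
    out[1::2] = x1 * sin + x2 * cos
    return out

x = np.ones(8)
# Unevenly sampled measurements (t = 0.0, 0.7, 3.2 hours) get encodings that
# depend only on elapsed time, not on a sample index.
print([continuous_time_rope(x, t)[:2].round(3).tolist() for t in (0.0, 0.7, 3.2)])
```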

AAAI Conference 2025 Conference Paper

MUCD: Unsupervised Point Cloud Change Detection via Masked Consistency

  • Yue Wu
  • Zhipeng Wang
  • Yongzhe Yuan
  • Maoguo Gong
  • Hao Li
  • Mingyang Zhang
  • Wenping Ma
  • Qiguang Miao

3D Change Detection (3DCD) has gradually become another research hotspot after image change detection. Recent works focus on using artificial labels for supervised or weakly-supervised training of siamese networks to segment changed points. However, labeling every point of multi-temporal point clouds is very expensive and time-consuming. In addition, these works lack effective self-supervised signals, and existing self-supervised signals often fail to capture sufficiently rich change information. To solve this problem, we assume that a powerful representation of 3D objects should model the consistency information of unchanged regions and distinguish different objects. Based on this assumption, we propose a new unsupervised framework called MUCD that learns change information of multi-temporal point clouds through bidirectional optimization of a change segmentor and a feature extractor. The training of the network is divided into two stages. We first design a foreknowledge point contrastive loss, based on the characteristics of the 3DCD task, to initialize the feature extractor, and then propose a masked consistency loss to further learn the shared geometric information of unchanged regions in the multi-temporal point clouds, utilizing it as a free and powerful supervised signal to train the change segmentor. In the inference stage, only the segmentor is used: it takes multi-temporal point clouds as input and produces the change segmentation result. Extensive experiments on SLPCCD and Urb3DCD, two real-world datasets of streets and urban buildings, verify that our unsupervised method is highly competitive and even outperforms supervised methods in scenes where semantic changes occur, exhibiting better generalization ability and robustness.

NeurIPS Conference 2025 Conference Paper

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

  • Changyao Tian
  • Hao Li
  • Gen Luo
  • Xizhou Zhu
  • Weijie Su
  • Hanming Deng
  • Jinguo Zhu
  • Jie Shao

Compositional training has been the de-facto paradigm in existing Multimodal Large Language Models (MLLMs), where pre-trained vision encoders are connected with pre-trained LLMs through continuous multimodal pre-training. However, the multimodal scaling property of this paradigm remains difficult to explore due to the separated training. In this paper, we focus on the native training of MLLMs in an end-to-end manner and systematically study its design space and scaling properties under a practical setting, i.e., data constraints. Through careful study of various choices in MLLM design, we obtain the optimal meta-architecture that best balances performance and training cost. After that, we further explore the scaling properties of native MLLMs and identify a positively correlated scaling relationship between visual encoders and LLMs. Based on these findings, we propose a native MLLM called NaViL, combined with a simple and cost-effective recipe. Experimental results on 14 multimodal benchmarks confirm the competitive performance of NaViL against existing MLLMs. Beyond that, our findings and results provide in-depth insights for the future study of native MLLMs.

NeurIPS Conference 2025 Conference Paper

Omni-Mol: Multitask Molecular Model for Any-to-any Modalities

  • Chengxin Hu
  • Hao Li
  • Yihe Yuan
  • Zezheng Song
  • Chenyang Zhao
  • Haixin Wang

In the molecular domain, numerous studies have explored the use of multimodal large language models (LLMs) to construct a general-purpose, multi-task molecular model. However, these efforts are still far from achieving a truly universal molecular model. We identify three key challenges in this endeavor: (1) Existing molecular task datasets are typically small in scale and lack comprehensive domain coverage. (2) Tasks from different molecular subfields are difficult to effectively learn jointly through LLMs due to significant distributional shifts and competition among tasks, which introduces instability in the learning process. (3) Both inter-task and intra-task molecular representations demand different intrinsic dimensions in the language space, making it challenging to balance between redundancy and insufficiency in language model representations. To address these challenges, we innovatively categorize existing small-molecule tasks into four types: Mol2Mol, Mol2Text, Mol2Num, and Text2Mol. We then collect a dataset encompassing over 16 tasks with more than 1.4 million samples, making it the largest molecular instruction-tuning dataset to date. Leveraging the extensive pretraining of LLMs on existing chemical literature, we propose a novel multimodal LLM framework, named Omni-Mol, which unifies all small-molecule tasks and supports both molecular generation and understanding. The core of Omni-Mol is our proposed MoGE, which dynamically adapts to the intrinsic rank of different tasks. This mixture-of-experts architecture enhances the model's ability to handle diverse tasks and modalities effectively. Our model achieves unified instruction tuning across 16 tasks and attains state-of-the-art performance on 13 of them. Extensive experiments further demonstrate the scalability and versatility of Omni-Mol.

AAAI Conference 2025 Conference Paper

Partial Point Cloud Registration with Multi-view 2D Image Learning

  • Yue Zhang
  • Yue Wu
  • Wenping Ma
  • Maoguo Gong
  • Hao Li
  • Biao Hou

Learning representations from large-scale 2D image data has shown promising performance, yet very few works apply these representations to point cloud registration. In this paper, we explore how to leverage 2D information to assist point cloud registration, and propose IAPReg, an Image-Assisted Partial 3D point cloud Registration framework built on multi-view images generated from the input point cloud. The goal is to enrich 3D information with 2D knowledge and to leverage that knowledge during registration. Specifically, we create multi-view depth maps by projecting the input point cloud from several specific views, and then extract 2D and 3D features using well-established models. To fuse the information learned from the 2D and 3D modalities, an inter-modality multi-view learning module is proposed to enhance geometric information and complement semantic information. Weighted SVD is a common method for reducing the impact of inaccurate correspondences on registration; however, determining the correspondence weights is not trivial. Therefore, we design a 2D-weighted SVD method, in which 2D knowledge is employed to provide the weight of each correspondence. Extensive experiments show that our method outperforms state-of-the-art methods without additional 2D training data.
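
Weighted SVD itself is the classical weighted Kabsch solver; what is new in IAPReg is the source of the weights. A self-contained NumPy version, with the weights passed in rather than predicted from 2D features, looks like this:

```python
import numpy as np

def weighted_svd_transform(src, dst, w):
    """Weighted Kabsch/SVD alignment: recover rotation R and translation t
    minimizing sum_i w_i * ||R @ src_i + t - dst_i||^2. In IAPReg the
    weights would come from 2D knowledge; here they are simply given.
    """
    w = w / w.sum()
    mu_s = (w[:, None] * src).sum(0)
    mu_d = (w[:, None] * dst).sum(0)
    H = (src - mu_s).T @ (w[:, None] * (dst - mu_d))    # weighted cross-covariance
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflection
    R = Vt.T @ S @ U.T
    return R, mu_d - R @ mu_s

# Check on synthetic correspondences with one down-weighted outlier.
rng = np.random.default_rng(0)
src = rng.standard_normal((50, 3))
a = 0.3
R_true = np.array([[np.cos(a), -np.sin(a), 0],
                   [np.sin(a),  np.cos(a), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([1.0, -2.0, 0.5])
dst[0] += 10.0                           # corrupt one correspondence...
w = np.ones(50); w[0] = 1e-6             # ...and give it near-zero weight
R, t = weighted_svd_transform(src, dst, w)
print(np.allclose(R, R_true, atol=1e-4), np.round(t, 3))  # True [ 1. -2.  0.5]
```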

AAAI Conference 2025 Conference Paper

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

  • Li Sun
  • Ziheng Zhang
  • Zixi Wang
  • Yujie Wang
  • Qiqi Wan
  • Hao Li
  • Hao Peng
  • Philip S. Yu

Dynamic interacting system modeling is important for understanding and simulating real-world systems, e.g., meteorology and the spread of COVID. The system is typically described as a graph, where multiple objects dynamically interact with each other and evolve over time. In recent years, graph Ordinary Differential Equations (ODEs) have received increasing research attention. While achieving encouraging results, existing solutions prioritize the traditional Euclidean space and neglect the intrinsic geometry of the system and the laws of physics, e.g., the principle of increasing entropy. These limitations motivate us to rethink system dynamics from a fresh perspective of Riemannian geometry, and to pose a more realistic problem of physics-informed dynamic system modeling that, for the first time, considers the underlying geometry and physics laws. In this paper, we present a novel physics-informed Riemannian graph ODE for a wide range of entropy-increasing dynamic systems (termed Pioneer). In particular, we formulate a differential system on the Riemannian manifold, where a manifold-valued graph ODE is governed by the proposed constrained Ricci flow, together with a manifold-preserving Gyro-transform aware of system geometry. Theoretically, we prove that our formulation is entropy non-decreasing, obeying the physics laws. Empirical results show the superiority of Pioneer on real datasets.

NeurIPS Conference 2025 Conference Paper

PointTruss: K-Truss for Point Cloud Registration

  • Yue Wu
  • Jun Jiang
  • Yongzhe Yuan
  • Maoguo Gong
  • Qiguang Miao
  • Hao Li
  • Mingyang Zhang
  • Wenping Ma

Point cloud registration is a fundamental task in 3D computer vision. Recent advances have shown that graph-based methods are effective for outlier rejection in this context. However, existing clique-based methods impose overly strict constraints and are NP-hard, making it difficult to achieve both robustness and efficiency. The k-core relaxation reduces computational complexity but considers only node degree, ignoring higher-order topological structures such as triangles, which limits its effectiveness in complex scenarios. To overcome these limitations, we introduce the $k$-truss from graph theory into point cloud registration, leveraging triangle support as a constraint for inlier selection. We further propose a consensus-voting-based low-scale sampling strategy to efficiently extract the structural skeleton of the point cloud prior to $k$-truss decomposition. Additionally, we design a spatial distribution score that balances coverage and uniformity of inliers, preventing selections that concentrate on sparse local clusters. Extensive experiments on KITTI, 3DMatch, and 3DLoMatch demonstrate that our method consistently outperforms both traditional and learning-based approaches in various indoor and outdoor scenarios, achieving state-of-the-art results.
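
The k-truss constraint is easy to demonstrate on a toy compatibility graph, where nodes are putative correspondences and edges connect geometrically consistent pairs. The sketch below uses networkx's built-in decomposition; the graph is hand-made for illustration.

```python
import networkx as nx

# Dense cluster {0,1,2,3}: mutually consistent correspondences (likely inliers).
# Nodes 4 and 5 attach sparsely, with no triangle support (likely outliers).
G = nx.Graph()
G.add_edges_from([
    (0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3),
    (4, 5), (2, 4),
])
# k = 3: every surviving edge must lie in at least k - 2 = 1 triangle.
truss = nx.k_truss(G, 3)
print(sorted(truss.nodes()))   # [0, 1, 2, 3]; nodes 4 and 5 are pruned
```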

AAAI Conference 2025 Conference Paper

Political Actor Agent: Simulating Legislative System for Roll Call Votes Prediction with Large Language Models

  • Hao Li
  • Ruoyuan Gong
  • Hao Jiang

Predicting roll call votes through modeling political actors has emerged as a focus in quantitative political science and computer science. Widely used embedding-based methods generate vectors for legislators from diverse data sets to predict legislative behaviors. However, these methods often contend with challenges such as the need for manually predefined features, reliance on extensive training data, and a lack of interpretability. Achieving more interpretable predictions under flexible conditions remains an unresolved issue. This paper introduces the Political Actor Agent (PAA), a novel agent-based framework that utilizes Large Language Models to overcome these limitations. By employing role-playing architectures and simulating the legislative system, PAA provides a scalable and interpretable paradigm for predicting roll-call votes. Our approach not only enhances the accuracy of predictions but also offers multi-view, human-understandable decision reasoning, providing new insights into political actor behaviors. We conducted comprehensive experiments using voting records from the 117th-118th U.S. House of Representatives, validating the superior performance and interpretability of PAA. This study demonstrates not only PAA's effectiveness but also its potential in political science research.

IROS Conference 2025 Conference Paper

Robust Stabilization of an Autonomous Underwater Vehicle in Specified Finite-time with Disturbance Rejection

  • Hongjiao Niu
  • Zhiyong Geng
  • Zhiyu Li
  • Hao Li
  • Xiayang Li

This study investigates the robust finite-time stabilization of an autonomous underwater vehicle (AUV) with disturbance rejection, where the finite time can be predetermined. The AUV is modeled as a rigid body moving within fluids, and the system dynamics involve uncertain parameters arising from the hydrodynamic coupling between the AUV and the fluid, along with unknown external disturbances. Direct compensation for the system's dynamics, a common approach in controller design, is ineffective for AUVs with uncertainties. Therefore, a robust anti-disturbance control method without compensation for unknown dynamics is proposed. To begin, a time-rescaling method is introduced to convert the specified finite-time stabilization of the system into asymptotic stabilization of a time-rescaled system. Then, an exponential PID controller is designed for the time-rescaled system to handle unknown constant disturbances, while a high-gain control strategy is used to suppress the uncertain dynamics, which also enhances robustness against them. The specified finite-time stabilization control law is ultimately derived from the designed exponential control law of the time-rescaled system. Numerical simulations are conducted to verify the results.

ICML Conference 2025 Conference Paper

STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization

  • Hao Li
  • Qi Lv 0001
  • Rui Shao 0001
  • Xiang Deng 0002
  • Yinchuan Li
  • Jianye Hao
  • Liqiang Nie

Transforming complex actions into discrete skill abstractions has demonstrated strong potential for robotic manipulation. Existing approaches mainly leverage latent variable models, e.g., VQ-VAE, to learn skill abstractions through learned vectors (codebooks), but they suffer from codebook collapse and struggle to model the causal relationships between learned skills. To address these limitations, we present Skill Training with Augmented Rotation (STAR), a framework that advances both skill learning and composition to complete complex behaviors. Specifically, to prevent codebook collapse, we devise rotation-augmented residual skill quantization (RaRSQ). It encodes relative angles between encoder outputs into the gradient flow via a rotation-based gradient mechanism: points within the same skill code are pushed apart or pulled closer together depending on gradient directions. Further, to capture the causal relationships between skills, we present the causal skill transformer (CST), which explicitly models dependencies between skill representations through an autoregressive mechanism for coherent action generation. Extensive experiments demonstrate the superiority of STAR on both the LIBERO benchmark and real-world tasks, with around 12% improvement over the baselines.

NeurIPS Conference 2025 Conference Paper

STRIDER: Navigation via Instruction-Aligned Structural Decision Space Optimization

  • Diqi He
  • Xuehao Gao
  • Hao Li
  • Junwei Han
  • Dingwen Zhang

The Zero-shot Vision-and-Language Navigation in Continuous Environments (VLN-CE) task requires agents to navigate previously unseen 3D environments using natural language instructions, without any scene-specific training. A critical challenge in this setting lies in ensuring that agents' actions align with both spatial structure and task intent over long-horizon execution. Existing methods often fail to achieve robust navigation due to a lack of structured decision-making and insufficient integration of feedback from previous actions. To address these challenges, we propose STRIDER (Instruction-Aligned Structural Decision Space Optimization), a novel framework that systematically optimizes the agent's decision space by integrating spatial layout priors and dynamic task feedback. Our approach introduces two key innovations: 1) a Structured Waypoint Generator that constrains the action space through spatial structure, and 2) a Task-Alignment Regulator that adjusts behavior based on task progress, ensuring semantic alignment throughout navigation. Extensive experiments on the R2R-CE and RxR-CE benchmarks demonstrate that STRIDER significantly outperforms strong state-of-the-art methods across key metrics; in particular, it improves Success Rate (SR) from 29% to 35%, a relative gain of 20.7%. Such results highlight the importance of spatially constrained decision-making and feedback-guided execution in improving navigation fidelity for zero-shot VLN-CE.
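
The relative gain follows directly from the absolute success rates:

\[
\frac{35\% - 29\%}{29\%} = \frac{6}{29} \approx 20.7\%.
\]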

NeurIPS Conference 2025 Conference Paper

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

  • Dongzhi JIANG
  • Ziyu Guo
  • Renrui Zhang
  • Zhuofan Zong
  • Hao Li
  • Le Zhuo
  • Shilin Yan
  • Pheng-Ann Heng

Recent advancements in large language models have demonstrated how chain-of-thought (CoT) and reinforcement learning (RL) can improve performance. However, applying such reasoning strategies to the visual generation domain remains largely unexplored. In this paper, we present T2I-R1, a novel reasoning-enhanced text-to-image generation model, powered by RL with a bi-level CoT reasoning process. Specifically, we identify two levels of CoT that can be utilized to enhance different stages of generation: (1) the semantic-level CoT for high-level planning of the prompt and (2) the token-level CoT for low-level pixel processing during patch-by-patch generation. To better coordinate these two levels of CoT, we introduce BiCoT-GRPO with an ensemble of generation rewards, which seamlessly optimizes both generated CoTs within the same training step. By applying our reasoning strategies to the baseline model, Janus-Pro, we achieve superior performance with 13% improvement on T2I-CompBench and 19% improvement on the WISE benchmark, even surpassing the state-of-the-art model FLUX.1. All training code is included in the supplementary material and will be made public.

JBHI Journal 2025 Journal Article

TKR-FSOD: Fetal Anatomical Structure Few-Shot Detection Utilizing Topological Knowledge Reasoning

  • Xi Li
  • Ying Tan
  • Bocheng Liang
  • Bin Pu
  • Jiewen Yang
  • Lei Zhao
  • Yanqing Kong
  • Lixian Yang

Fetal multi-anatomical structure detection in ultrasound (US) images can clearly present the relationships and influences between anatomical structures, providing more comprehensive information about fetal organs and assisting sonographers in making more accurate diagnoses; it is widely used in structure evaluation. Recently, deep learning methods have shown superior performance in detecting various anatomical structures in ultrasound images, but they still leave room for improvement in categories where samples are difficult to obtain, such as rare diseases. Few-shot learning has attracted considerable attention in medical image analysis due to its ability to address data scarcity. However, existing few-shot learning research in medical image analysis focuses on classification and segmentation, while object detection has been neglected. In this paper, we propose TKR-FSOD, a novel few-shot detection method for fetal anatomical structures in ultrasound images, which learns topological knowledge through a Topological Knowledge Reasoning Module to help the model reason about and detect anatomical structures. Furthermore, we propose a Discriminate Ability Enhanced Feature Learning Module that extracts abundant discriminative features to enhance the model's discriminative ability. Experimental results demonstrate that our method outperforms state-of-the-art baseline methods, exceeding the second-best method by a maximum margin of 4.8% in the 5-shot setting of split 1 under the four-chamber cardiac view.

NeurIPS Conference 2025 Conference Paper

Understanding and Mitigating Numerical Sources of Nondeterminism in LLM Inference

  • Jiayi Yuan
  • Hao Li
  • Xinheng Ding
  • Wenya Xie
  • Yu-Jhe Li
  • Wentian Zhao
  • Kun Wan
  • Jing Shi

Large Language Models (LLMs) are now integral across various domains and have demonstrated impressive performance. Progress, however, rests on the premise that benchmark scores are both accurate and reproducible. We demonstrate that the reproducibility of LLM performance is fragile: changing the system configuration, such as evaluation batch size, GPU count, and GPU version, can introduce significant differences in the generated responses. This issue is especially pronounced in reasoning models, where minor rounding differences in early tokens can cascade into divergent chains of thought, ultimately affecting accuracy. For instance, under bfloat16 precision with greedy decoding, a reasoning model like DeepSeek-R1-Distill-Qwen-7B can exhibit up to 9% variation in accuracy and a 9,000-token difference in response length due to differences in GPU count, type, and evaluation batch size. We trace the root cause of this variability to the non-associative nature of floating-point arithmetic under limited numerical precision. This work presents the first systematic investigation into how numerical precision affects reproducibility in LLM inference. Through carefully controlled experiments across various hardware, software, and precision settings, we quantify when and how model outputs diverge. Our analysis reveals that floating-point precision, while critical for reproducibility, is often neglected in evaluation practices. Inspired by this, we develop a lightweight inference pipeline, dubbed LayerCast, that stores weights in 16-bit precision but performs all computations in FP32, balancing memory efficiency with numerical stability. Code is available at https://github.com/nanomaoli/llm_reproducibility.
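
A self-contained demonstration of the root cause described above; the values are chosen to make the rounding visible in float32 (illustrative, not the paper's experiment):

```python
import numpy as np

# Floating-point addition is not associative: with limited precision, the
# summation order (which varies with batch size or GPU count when reductions
# are parallelized differently) changes the result.
a, b, c = np.float32(1e8), np.float32(-1e8), np.float32(1.0)
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0 -- the 1.0 is absorbed: the ULP near 1e8 is 8 in float32

# LayerCast-style idea (sketch): keep weights in 16 bits for memory,
# but upcast to FP32 before the actual computation.
w16 = np.random.randn(256, 256).astype(np.float16)
x16 = np.random.randn(1, 256).astype(np.float16)
y = x16.astype(np.float32) @ w16.astype(np.float32)  # accumulate in FP32
```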

AAAI Conference 2025 Conference Paper

VEGAS: Towards Visually Explainable and Grounded Artificial Social Intelligence

  • Hao Li
  • Hao Fei
  • Zechao Hu
  • Zhengwei Yang
  • Zheng Wang

Social Intelligence Queries (Social-IQ) serve as the primary multimodal benchmark for evaluating a model's social intelligence. While current solutions achieve impressive multiple-choice question (MCQ) accuracy, increasing evidence shows that they are largely, and in some cases entirely, dependent on the language modality, overlooking visual context. Additionally, the closed-set nature of the benchmark prevents exploration of whether, and to what extent, the reasoning path behind a selection is correct. To address these limitations, we propose the Visually Explainable and Grounded Artificial Social Intelligence (VEGAS) model. As a generative multimodal model, VEGAS leverages open-ended answering to provide explainable responses, which enhances the clarity and evaluation of reasoning paths. To enable visually grounded answering, we propose a novel sampling strategy that provides the model with more relevant visual frames. We then enhance the model's interpretation of these frames through Generalist Instruction Fine-Tuning (GIFT), which aims to: i) learn multimodal language transformations for fundamental emotional social traits, and ii) establish multimodal joint reasoning capabilities. Extensive experiments, comprising modality ablations, open-ended assessments, and supervised MCQ evaluations, consistently show that VEGAS effectively utilizes visual information in reasoning to produce correct and credible answers. We expect this work to offer a new perspective on Social-IQ and advance the development of human-like social AI.

IJCAI Conference 2025 Conference Paper

Wave-wise Discriminative Tracking by Phase-Amplitude Separation, Augmentation and Mixture

  • Huibin Tan
  • Mingyu Cao
  • Kun Hu
  • Xihuai He
  • Zhe Wang
  • Hao Li
  • Long Lan
  • Mengzhu Wang

Distinguishing key features in complex visual tasks is challenging. A novel approach treats image patches (tokens) as waves. By using both phase and amplitude, it captures richer semantics and specific invariances compared to pixel-based methods, and allows feature fusion across regions for a holistic image representation. Based on this, we propose the Wave-wise Discriminative Transformer Tracker (WDT). During tracking, WDT represents features via phase-amplitude separation, augmentation, and mixture. First, we design a Mutually Exclusive Phase-Amplitude Extractor (MEPAE) to separate phase and amplitude features with distinct semantics, representing spatial target information and background brightness, respectively. Then, wave-wise feature augmentation is carried out with two submodules: Phase-Amplitude Feature Augmentation and Mixture. The augmentation module perturbs the separated features within the same batch, and the mixture module recombines them to generate positive and negative waves; the original features are aggregated into the original wave. Positive waves share the same phase but differ in amplitude, while negative waves have different phase components. Finally, self-supervised and tracking-supervised losses guide global and local representation learning for the original, positive, and negative waves, enhancing wave-level discrimination. Experiments on five benchmarks prove the effectiveness of our method.
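
A minimal sketch of phase-amplitude separation and recombination over token features, using torch.fft (the shapes and the recombination policy are illustrative assumptions, not the paper's exact modules):

```python
import torch

feats = torch.randn(8, 196, 384)             # hypothetical batch of patch-token features
spec = torch.fft.fft(feats, dim=1)           # treat the token sequence as a wave
amplitude, phase = spec.abs(), spec.angle()  # separate the two components

# Recombining amplitude and phase -- e.g. pairing one sample's phase with
# another's amplitude -- yields positive/negative-wave style mixtures.
shuffled_amp = amplitude[torch.randperm(feats.size(0))]
mixed = torch.fft.ifft(shuffled_amp * torch.exp(1j * phase), dim=1).real
```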

TMLR Journal 2024 Journal Article

Adaptively Robust and Sparse $K$-means Clustering

  • Hao Li
  • Shonosuke Sugasawa
  • Shota Katayama

While $K$-means is a standard clustering algorithm, its performance may be compromised by the presence of outliers and high-dimensional noisy variables. This paper proposes adaptively robust and sparse $K$-means clustering (ARSK) to address these practical limitations of the standard $K$-means algorithm. For robustness, we introduce a redundant error component for each observation, and this additional parameter is penalized using a group-sparse penalty. To accommodate the impact of high-dimensional noisy variables, the objective function is modified by incorporating weights and a penalty that controls the sparsity of the weight vector. The tuning parameters controlling robustness and sparsity are selected by the Gap statistic. Through simulation experiments and real data analysis, we demonstrate the proposed method's superiority over existing algorithms in simultaneously identifying clusters, excluding outliers, and selecting informative variables.
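
A schematic objective consistent with this description (the exact penalties and constraints in ARSK are assumptions here): each observation $x_i$ gets an error term $e_i$ with a group-sparse penalty, and each variable $j$ a weight $w_j$ with a sparsity constraint,

\[
\min_{\{\mu_k\},\, E,\, w}\; \sum_{i=1}^{n} \sum_{j=1}^{p} w_j \left( x_{ij} - \mu_{c_i, j} - e_{ij} \right)^2 + \lambda \sum_{i=1}^{n} \lVert e_i \rVert_2
\quad \text{s.t.}\;\; \lVert w \rVert_1 \le s,\; w_j \ge 0,
\]

where $c_i$ is the cluster assignment of $x_i$; observations with $e_i \neq 0$ are flagged as outliers, and variables with $w_j = 0$ are discarded as uninformative.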

NeurIPS Conference 2024 Conference Paper

Clustering then Propagation: Select Better Anchors for Knowledge Graph Embedding

  • Ke Liang
  • Yue Liu
  • Hao Li
  • Lingyuan Meng
  • Suyuan Liu
  • Siwei Wang
  • Sihang Zhou
  • Xinwang Liu

Traditional knowledge graph embedding (KGE) models map entities and relations to unique embedding vectors in a shallow lookup manner. As the scale of data becomes larger, this manner incurs unaffordable computational costs. Anchor-based strategies have been treated as effective ways to alleviate such efficiency problems by propagating over representative entities instead of the whole graph. However, most existing anchor-based KGE models select anchors in a primitive manner, which limits their performance. To this end, we propose a novel anchor-based strategy for KGE, i.e., a relational clustering-based anchor selection strategy (RecPiece), which leverages two characteristics: (1) the representative ability of cluster centroids and (2) the descriptive ability of relation types in KGs. Specifically, we first perform clustering over features of factual triplets instead of entities, where the cluster number is naturally set to the number of relation types, since each fact can be characterized by its relation. Then, representative triplets are selected around the cluster centroids and further mapped to corresponding anchor entities. Extensive experiments on six datasets show that RecPiece achieves higher performance with comparable or even fewer parameters compared to previous anchor-based KGE models, indicating that our model selects better anchors in a more scalable way.
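
A minimal sketch of the selection step under stated assumptions (the feature dimensions, dataset, and triplet-to-anchor mapping below are illustrative, not RecPiece's exact pipeline):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
triplet_feats = rng.standard_normal((10000, 128))  # one feature per (h, r, t) fact
num_relations = 237                                # e.g. FB15k-237 has 237 relations

km = KMeans(n_clusters=num_relations, n_init=10, random_state=0).fit(triplet_feats)
dists = np.linalg.norm(triplet_feats - km.cluster_centers_[km.labels_], axis=1)

# For each cluster, take the triplet nearest its centroid as the representative;
# its entities then serve as anchors for propagation.
reps = [int(np.where(km.labels_ == c)[0][np.argmin(dists[km.labels_ == c])])
        for c in range(num_relations)]
```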

IJCAI Conference 2024 Conference Paper

Expressiveness is Effectiveness: Self-supervised Fashion-aware CLIP for Video-to-Shop Retrieval

  • Likai Tian
  • Zhengwei Yang
  • Zechao Hu
  • Hao Li
  • Yifang Yin
  • Zheng Wang

The rise of online shopping and social media has spurred the Video-to-Shop Retrieval (VSR) task, which involves identifying fashion items (e.g., clothing) in videos and matching them with identical products provided by stores. In real-world scenarios, human movement in dynamic video scenes can cause substantial morphological alterations of fashion items through occlusion, shifting viewpoints (parallax), and partial visibility (truncation). As a result, the few high-quality frames are overwhelmed by a vast number of redundant ones, which makes retrieval less effective. To this end, this paper introduces a framework, named Self-supervised Fashion-aware CLIP (SF-CLIP), for effective VSR. SF-CLIP discovers salient frames with high fashion expressiveness by generating pseudo-labels that assess three key aspects of fashion expressiveness: occlusion, parallax, and truncation. With such pseudo-labels, the ability of CLIP is extended to facilitate the discovery of salient frames. Furthermore, to build comprehensive representations across salient frames, a dual-branch graph-based fusion module is proposed to extract and integrate inter-frame features. Extensive experiments demonstrate the superiority of SF-CLIP over state-of-the-art methods.

JBHI Journal 2024 Journal Article

GKE-TUNet: Geometry-Knowledge Embedded TransUNet Model for Retinal Vessel Segmentation Considering Anatomical Topology

  • Yunlong Qiu
  • Haifeng Zhang
  • Chonghui Song
  • Xiaolong Zhao
  • Hao Li
  • Xianbo Wang

Automated retinal vessel segmentation is crucial for computer-aided clinical diagnosis and retinopathy screening. However, deep learning faces challenges in extracting complex intertwined structures and subtle small vessels from densely vascularized regions. To address these issues, we propose a novel segmentation model, called Geometry-Knowledge Embedded TransUNet (GKE-TUNet), which explicitly embeds topological features of retinal vessel anatomy. In the proposed GKE-TUNet model, a skeleton extraction network is pre-trained to extract the anatomical topology of retinal vessels from refined segmentation labels. During vessel segmentation, the dense skeleton graph is sampled as a graph of keypoints and connections and is incorporated into the skip-connection layer of TransUNet. The graph vertices serve as node features and correspond to positions in the low-level feature maps. A graph attention network (GAT) is used as the graph convolution backbone to capture the shape semantics of vessels and the interaction of key locations along the topological direction. Finally, the node features obtained by graph convolution are read out as a sparse feature map based on their spatial coordinates. To address the sparsity of this feature map, we employ convolution operators to fuse it with the low-level dense feature maps; the fusion is weighted and connected to the deep feature maps. Experimental results on the DRIVE, CHASE-DB1, and STARE datasets demonstrate the competitiveness of our method compared to existing ones.

AAAI Conference 2024 Conference Paper

Gradual Residuals Alignment: A Dual-Stream Framework for GAN Inversion and Image Attribute Editing

  • Hao Li
  • Mengqi Huang
  • Lei Zhang
  • Bo Hu
  • Yi Liu
  • Zhendong Mao

GAN-based image attribute editing first leverages GAN inversion to project real images into the latent space of a GAN and then manipulates the corresponding latent codes. Recent inversion methods mainly utilize additional high-bit features to improve the preservation of image details, as low-bit codes cannot faithfully reconstruct source images, leading to the loss of details. However, during editing, existing works fail to accurately complement the lost details and suffer from poor editability. The main reason is that they inject all the lost details indiscriminately at one time, which inherently causes the position and quantity of details to overfit the source images, resulting in inconsistent content and artifacts in edited images. This work argues that details should be gradually injected into both the reconstruction and editing processes in a multi-stage, coarse-to-fine manner for better detail preservation and high editability. Therefore, a novel dual-stream framework is proposed to accurately complement details at each stage. The Reconstruction Stream embeds coarse-to-fine lost details into residual features and then adaptively adds them to the GAN generator. In the Editing Stream, residual features are accurately aligned by our Selective Attention mechanism and then injected into the editing process in a multi-stage manner. Extensive experiments show the superiority of our framework in both reconstruction accuracy and editing quality compared with existing methods.

NeurIPS Conference 2024 Conference Paper

Grasp as You Say: Language-guided Dexterous Grasp Generation

  • Yi-Lin Wei
  • Jian-Jian Jiang
  • Chengyi Xing
  • Xian-Tuo Tan
  • Xiao-ming Wu
  • Hao Li
  • Mark Cutkosky
  • Wei-Shi Zheng

This paper explores a novel task, "Dexterous Grasp as You Say" (DexGYS), enabling robots to perform dexterous grasping based on human commands expressed in natural language. However, the development of this field is hindered by the lack of datasets with natural human guidance; thus, we propose a language-guided dexterous grasp dataset, named DexGYSNet, offering high-quality dexterous grasp annotations along with flexible and fine-grained human language guidance. Our dataset construction is cost-efficient, thanks to a carefully designed hand-object interaction retargeting strategy and an LLM-assisted language guidance annotation system. Equipped with this dataset, we introduce the DexGYSGrasp framework for generating dexterous grasps from human language instructions, capable of producing grasps that are intent-aligned, high-quality, and diverse. To achieve this, our framework decomposes the complex learning process into two manageable, progressive objectives and introduces two components to realize them. The first component learns the grasp distribution with a focus on intention alignment and generation diversity, and the second refines grasp quality while maintaining intention consistency. Extensive experiments are conducted on DexGYSNet and in real-world environments for validation.

IJCAI Conference 2024 Conference Paper

MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator

  • Xiao-Yin Liu
  • Xiao-Hu Zhou
  • Guotao Li
  • Hao Li
  • Mei-Jiang Gui
  • Tian-Yu Xiang
  • De-Xing Huang
  • Zeng-Guang Hou

Offline reinforcement learning (RL) faces the significant challenge of distribution shift. Model-free offline RL penalizes the Q value for out-of-distribution (OOD) data or constrains the policy to stay close to the behavior policy, but this inhibits exploration of the OOD region. Model-based offline RL, which uses a trained environment model to generate more OOD data and performs conservative policy optimization within that model, has become an effective approach to this problem. However, current model-based algorithms rarely consider agent robustness when incorporating conservatism into the policy. Therefore, we propose MICRO, a new model-based offline algorithm with a conservative Bellman operator. This method trades off performance and robustness by introducing the robust Bellman operator into the algorithm. Compared with previous model-based algorithms with robust adversarial models, MICRO significantly reduces computation cost by choosing only the minimal Q value within the state uncertainty set. Extensive experiments demonstrate that MICRO outperforms prior RL algorithms on offline RL benchmarks and is considerably robust to adversarial perturbations.
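
One plausible form of such an operator, written here only to fix ideas (the paper's exact definition may differ): given a state uncertainty set $\mathcal{U}(s,a)$ produced by the learned model ensemble,

\[
(\mathcal{T}Q)(s, a) = r(s, a) + \gamma \min_{s' \in \mathcal{U}(s, a)} \mathbb{E}_{a' \sim \pi(\cdot \mid s')}\!\left[ Q(s', a') \right],
\]

so conservatism enters through a single $\min$ over candidate next states rather than through a fully adversarial model, which is what keeps the computation cheap.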

AAAI Conference 2024 Conference Paper

Negative Pre-aware for Noisy Cross-Modal Matching

  • Xu Zhang
  • Hao Li
  • Mang Ye

Cross-modal noise-robust learning is a challenging task, since noisy correspondence is hard to recognize and rectify. Due to the cumulative and unavoidable negative impact of unresolved noise, existing methods cannot maintain stable performance as the noise increases. In this paper, we present a novel Negative Pre-aware Cross-modal (NPC) matching solution for fine-tuning large visual-language models on noisy downstream tasks. It is featured in two aspects: (1) For noise recognition and resistance, whereas previous methods usually filter out a noisy subset directly, we propose to estimate the negative impact of each sample. This requires no additional correction mechanism that may yield unreliable corrections and thus self-reinforcing errors. We assign a confidence weight to each sample according to its negative impact during training, adaptively adjusting each sample's contribution to avoid noise accumulation. (2) To maintain stable performance with increasing noise, we exploit the memorization effect of DNNs by maintaining a memory bank. Specifically, we apply a GMM to select high-confidence clean samples as memory entries, which are then used to estimate the negative impact of each sample. Since clean samples are more easily distinguished by the GMM as noise increases, the memory bank retains high quality even at high noise ratios. Compared to correction mechanisms that focus on noisy samples, memory-bank-based estimation is more robust, which keeps model performance stable on noisy datasets. Extensive experiments demonstrate that our method significantly improves matching accuracy and performance stability at increasing noise ratios, and surpasses state-of-the-art methods by a large margin. The code is available at: https://github.com/ZhangXu0963/NPC.
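
A common recipe for the GMM-based selection step, sketched under stated assumptions (fitting a two-component GMM on per-sample losses is standard practice in noisy-label learning; the inputs and threshold here are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical per-sample matching losses; clean pairs tend to have low loss.
losses = np.abs(np.random.randn(50000, 1))

gmm = GaussianMixture(n_components=2, random_state=0).fit(losses)
clean_comp = int(np.argmin(gmm.means_.ravel()))     # low-mean mode = "clean"
p_clean = gmm.predict_proba(losses)[:, clean_comp]  # per-sample clean probability

memory_bank = np.where(p_clean > 0.9)[0]            # high-confidence clean entries
```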

NeurIPS Conference 2024 Conference Paper

Parameter-Inverted Image Pyramid Networks

  • Xizhou Zhu
  • Xue Yang
  • Zhaokai Wang
  • Hao Li
  • Wenhan Dou
  • Junqi Ge
  • Lewei Lu
  • Yu Qiao

Image pyramids are commonly used in modern computer vision tasks to obtain multi-scale features for precise understanding of images. However, image pyramids process multiple resolutions of images using the same large-scale model, which requires significant computational cost. To overcome this issue, we propose a novel network architecture known as the Parameter-Inverted Image Pyramid Networks (PIIP). Our core idea is to use models with different parameter sizes to process different resolution levels of the image pyramid, thereby balancing computational efficiency and performance. Specifically, the input to PIIP is a set of multi-scale images, where higher resolution images are processed by smaller networks. We further propose a feature interaction mechanism to allow features of different resolutions to complement each other and effectively integrate information from different spatial scales. Extensive experiments demonstrate that the PIIP achieves superior performance in tasks such as object detection, segmentation, and image classification, compared to traditional image pyramid methods and single-branch networks, while reducing computational cost. Notably, when applying our method on a large-scale vision foundation model InternViT-6B, we improve its performance by 1%-2% on detection and segmentation with only 40%-60% of the original computation. These results validate the effectiveness of the PIIP approach and provide a new technical direction for future vision computing tasks.

ICML Conference 2024 Conference Paper

RoboMP2: A Robotic Multimodal Perception-Planning Framework with Multimodal Large Language Models

  • Qi Lv 0001
  • Hao Li
  • Xiang Deng 0002
  • Rui Shao 0001
  • Michael Yu Wang
  • Liqiang Nie

Multimodal Large Language Models (MLLMs) have shown impressive reasoning abilities and general intelligence in various domains. This inspires researchers to train end-to-end MLLMs or to use large models to generate policies with human-selected prompts for embodied agents. However, these methods exhibit limited generalization on unseen tasks or scenarios and overlook the multimodal environment information that is critical for robots' decision-making. In this paper, we introduce RoboMP$^2$, a novel Robotic Multimodal Perception-Planning framework for robotic manipulation, which consists of a Goal-Conditioned Multimodal Preceptor (GCMP) and a Retrieval-Augmented Multimodal Planner (RAMP). Specifically, GCMP captures environment states by employing a tailored MLLM for embodied agents with semantic reasoning and localization abilities. RAMP uses a coarse-to-fine retrieval method to find the $k$ most relevant policies as in-context demonstrations to enhance the planner. Extensive experiments demonstrate the superiority of RoboMP$^2$ on both the VIMA benchmark and real-world tasks, with around 10% improvement over the baselines.
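
A single-stage simplification of the retrieval step (RAMP is coarse-to-fine; this sketch shows only cosine-similarity top-$k$, with all names and shapes assumed):

```python
import numpy as np

def topk_demos(task_emb: np.ndarray, demo_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k demonstrations most similar to the task embedding."""
    demo_embs = demo_embs / np.linalg.norm(demo_embs, axis=1, keepdims=True)
    task_emb = task_emb / np.linalg.norm(task_emb)
    return np.argsort(demo_embs @ task_emb)[::-1][:k]

# Usage with random stand-in embeddings:
demos = np.random.randn(500, 768)
task = np.random.randn(768)
print(topk_demos(task, demos))  # indices of the 3 most relevant policy demos
```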

AAAI Conference 2024 Conference Paper

Robustly Train Normalizing Flows via KL Divergence Regularization

  • Kun Song
  • Ruben Solozabal
  • Hao Li
  • Martin Takáč
  • Lu Ren
  • Fakhri Karray

In this paper, we find that the training of Normalizing Flows (NFs) is easily affected by outliers and by a small number (or high dimensionality) of training samples. To solve this problem, we propose a Kullback-Leibler (KL) divergence regularization on the Jacobian matrix of NFs. We prove that this regularization is equivalent to adding a set of samples whose covariance matrix is the identity matrix to the training set. Thus, it simultaneously reduces the negative influence of outliers and of the small sample size on the estimation of the covariance matrix. Our regularization therefore makes the training of NFs robust. Finally, we evaluate the performance of NFs on out-of-distribution (OoD) detection tasks. The excellent results demonstrate the effectiveness of the proposed regularization term. For example, with the proposed regularization, the OoD detection score increases by up to 30% compared with the unregularized counterpart.

AAAI Conference 2024 Conference Paper

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

  • Qianrui Zhou
  • Hua Xu
  • Hao Li
  • Hanlei Zhang
  • Xiaohan Zhang
  • Yifan Wang
  • Kai Gao

Multimodal intent recognition aims to leverage diverse modalities, such as expressions, body movements, and tone of speech, to comprehend a user's intent, constituting a critical task for understanding human language and behavior in real-world multimodal scenarios. Nevertheless, the majority of existing methods ignore potential correlations among different modalities and have limitations in effectively learning semantic features from nonverbal modalities. In this paper, we introduce a token-level contrastive learning method with modality-aware prompting (TCL-MAP) to address these challenges. To establish an optimal multimodal semantic environment for the text modality, we develop a modality-aware prompting module (MAP), which effectively aligns and fuses features from the text, video, and audio modalities with similarity-based modality alignment and a cross-modality attention mechanism. Based on the modality-aware prompt and ground-truth labels, the proposed token-level contrastive learning framework (TCL) constructs augmented samples and employs the NT-Xent loss on the label token. Specifically, TCL capitalizes on the optimal textual semantic insights derived from intent labels to guide the learning of the other modalities in return. Extensive experiments show that our method achieves remarkable improvements over state-of-the-art methods. Additionally, ablation analyses demonstrate the superiority of the modality-aware prompt over handcrafted prompts, which holds substantial significance for multimodal prompt learning. The code is released at https://github.com/thuiar/TCL-MAP.
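
For reference, a standard NT-Xent implementation in the SimCLR style (how TCL builds the two views of the label token is not shown here; the pairing below is the generic form):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """z1[i], z2[i]: embeddings of two views of sample i. Returns the NT-Xent loss."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2]), dim=1)  # (2n, d), unit norm
    sim = z @ z.t() / tau                        # scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * n, dtype=torch.bool), float("-inf"))
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)         # positive = the other view
```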

AAAI Conference 2023 Conference Paper

Hybrid CNN-Transformer Feature Fusion for Single Image Deraining

  • Xiang Chen
  • Jinshan Pan
  • Jiyang Lu
  • Zhentao Fan
  • Hao Li

Since rain streaks exhibit diverse geometric appearances and irregular overlap, these complex characteristics challenge the design of an effective single-image deraining model. Rich local-global representations are therefore increasingly indispensable for satisfactory rain removal. In this paper, we propose a lightweight Hybrid CNN-Transformer Feature Fusion Network (dubbed HCT-FFN) built in a stage-by-stage progressive manner, which harmonizes the two architectures to aid image restoration by leveraging their individual learning strengths. Specifically, we stack a sequence of degradation-aware mixture-of-experts (DaMoE) modules in the CNN-based stage, where appropriate local experts adaptively enable the model to emphasize spatially varying rain distribution features. In the Transformer-based stage, a background-aware vision Transformer (BaViT) module is employed to complement spatially long feature dependencies, achieving global texture recovery while preserving the required structure. Considering the indeterminate knowledge discrepancy between CNN and Transformer features, we introduce an interactive fusion branch at adjacent stages to further facilitate the reconstruction of high-quality deraining results. Extensive evaluations show the effectiveness and extensibility of our HCT-FFN. The source code is available at https://github.com/cschenxiang/HCT-FFN.

JAAMAS Journal 2023 Journal Article

IOB: integrating optimization transfer and behavior transfer for multi-policy reuse

  • Siyuan Li
  • Hao Li
  • Chongjie Zhang

Humans have the ability to reuse previously learned policies to solve new tasks quickly, and reinforcement learning (RL) agents can do the same by transferring knowledge from source policies to a related target task. Transfer RL methods can reshape the policy optimization objective (optimization transfer) or influence the behavior policy (behavior transfer) using source policies. However, selecting the appropriate source policy with limited samples to guide target policy learning has been a challenge. Previous methods introduce additional components, such as hierarchical policies or estimates of source policies' value functions, which can lead to non-stationary policy optimization or heavy sampling costs, diminishing transfer effectiveness. To address this challenge, we propose a novel transfer RL method that selects the source policy without training extra components. Our method utilizes the Q function in the actor-critic framework to guide policy selection, choosing the source policy with the largest one-step improvement over the current target policy. We integrate optimization transfer and behavior transfer (IOB) by regularizing the learned policy to mimic the guidance policy and combining them as the behavior policy. This integration significantly enhances transfer effectiveness, surpassing state-of-the-art transfer RL baselines on benchmark tasks and improving final performance and knowledge transferability in continual learning scenarios. Additionally, we show that our optimization transfer technique is guaranteed to improve target policy learning.
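
The selection rule described above can be written compactly (notation assumed): with source policies $\pi_1, \dots, \pi_n$, current target policy $\pi_{\text{tgt}}$, and critic $Q$,

\[
g(s) = \pi_{i^*}(s), \qquad i^* = \arg\max_{i \in \{1, \dots, n, \text{tgt}\}} Q\big(s, \pi_i(s)\big),
\]

i.e., at each state the guidance policy $g$ follows whichever policy's action the critic currently scores highest, which is exactly a one-step improvement over $\pi_{\text{tgt}}$.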

NeurIPS Conference 2023 Conference Paper

JourneyDB: A Benchmark for Generative Image Understanding

  • Keqiang Sun
  • Junting Pan
  • Yuying Ge
  • Hao Li
  • Haodong Duan
  • Xiaoshi Wu
  • Renrui Zhang
  • Aojun Zhou

While recent advancements in vision-language models have had a transformative impact on multi-modal comprehension, the extent to which these models can comprehend generated images remains uncertain. Synthetic images, in comparison to real data, encompass a higher level of diversity in both content and style, presenting significant challenges for the models to fully grasp. In light of this challenge, we introduce a comprehensive dataset, referred to as JourneyDB, that caters to the domain of generative images within the context of multi-modal visual understanding. Our meticulously curated dataset comprises 4 million distinct and high-quality generated images, each paired with the text prompts that were employed in their creation. Furthermore, we introduce an external subset with results from another 22 text-to-image generative models, which makes JourneyDB a comprehensive benchmark for evaluating the comprehension of generated images. On our dataset, we have devised four benchmarks to assess generated-image comprehension in relation to both content and style interpretation: prompt inversion, style retrieval, image captioning, and visual question answering. Lastly, we evaluate the performance of state-of-the-art multi-modal models on the JourneyDB dataset, providing a comprehensive analysis of their strengths and limitations in comprehending generated content. We anticipate that the proposed dataset and benchmarks will facilitate further research in the field of generative content understanding. The dataset is publicly available at https://journeydb.github.io.

AAAI Conference 2023 Conference Paper

Point-Teaching: Weakly Semi-supervised Object Detection with Point Annotations

  • Yongtao Ge
  • Qiang Zhou
  • Xinlong Wang
  • Chunhua Shen
  • Zhibin Wang
  • Hao Li

Point annotations are considerably more time-efficient than bounding-box annotations. However, how to use cheap point annotations to boost the performance of semi-supervised object detection remains an open question. In this work, we present Point-Teaching, a weakly and semi-supervised object detection framework that fully utilizes point annotations. Specifically, we propose a Hungarian-based point-matching method to generate pseudo labels for point-annotated images. We further propose multiple-instance learning (MIL) approaches at the image and point levels to supervise the object detector with point annotations. Finally, we propose a simple data augmentation, named Point-Guided Copy-Paste, to reduce the impact of unmatched points. Experiments demonstrate the effectiveness of our method on several datasets and across various data regimes. In particular, Point-Teaching outperforms the previous best method, Group R-CNN, by 3.1 AP with 5% fully labeled data and by 2.3 AP with 30% fully labeled data on the MS COCO dataset. We believe the proposed framework can largely lower the bar for learning accurate object detectors and pave the way for broader applications. The code is available at https://github.com/YongtaoGe/Point-Teaching.
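
A minimal sketch of Hungarian point-to-proposal matching (the cost here is plain Euclidean distance between annotated points and predicted box centers; the paper's cost presumably also includes classification terms):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

points = np.random.rand(5, 2)        # annotated (x, y) points, one per object
box_centers = np.random.rand(20, 2)  # centers of predicted proposals

# Cost matrix: distance from each annotated point to each proposal center.
cost = np.linalg.norm(points[:, None, :] - box_centers[None, :, :], axis=-1)

rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
# The matched proposals (cols) become pseudo boxes for the point-annotated image.
```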

NeurIPS Conference 2023 Conference Paper

Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval

  • Hao Li
  • Jingkuan Song
  • Lianli Gao
  • Xiaosu Zhu
  • Hengtao Shen

Cross-modal retrieval methods build similarity relations between the vision and language modalities by jointly learning a common representation space. However, the predictions are often unreliable due to aleatoric uncertainty, which is induced by low-quality data, e.g., corrupt images, fast-paced videos, and non-detailed texts. In this paper, we propose a novel Prototype-based Aleatoric Uncertainty Quantification (PAU) framework to provide trustworthy predictions by quantifying the uncertainty arising from inherent data ambiguity. Concretely, we first construct a set of learnable prototypes for each modality to represent the entire semantic subspace. Then, Dempster-Shafer Theory and Subjective Logic Theory are utilized to build an evidential theoretical framework by associating evidence with Dirichlet distribution parameters. The PAU model yields accurate uncertainty estimates and reliable predictions for cross-modal retrieval. Extensive experiments are performed on four major benchmark datasets, MSR-VTT, MSVD, DiDeMo, and MS-COCO, demonstrating the effectiveness of our method. The code is accessible at https://github.com/leolee99/PAU.
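
In the standard evidential formulation that this framework builds on (Subjective Logic with a Dirichlet prior), evidence $e_k \ge 0$ for each of $K$ classes is mapped to Dirichlet parameters, beliefs, and an uncertainty mass:

\[
\alpha_k = e_k + 1, \qquad S = \sum_{k=1}^{K} \alpha_k, \qquad b_k = \frac{e_k}{S}, \qquad u = \frac{K}{S},
\]

so little total evidence (small $S$) yields high uncertainty $u$; how PAU derives the evidence from prototypes is not reproduced here.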

IJCAI Conference 2023 Conference Paper

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment

  • Peng Jin
  • Hao Li
  • Zesen Cheng
  • Jinfa Huang
  • Zhennan Wang
  • Li Yuan
  • Chang Liu
  • Jie Chen

Text-video retrieval is a challenging cross-modal task that aims to align visual entities with natural language descriptions. Current methods either fail to leverage local details or are computationally expensive; worse, they fail to leverage the heterogeneous concepts in the data. In this paper, we propose Disentangled Conceptualization and Set-to-set Alignment (DiCoSA) to simulate the conceptualizing and reasoning process of human beings. For disentangled conceptualization, we divide coarse features into multiple latent factors related to semantic concepts. For set-to-set alignment, where a set of visual concepts corresponds to a set of textual concepts, we propose an adaptive pooling method that aggregates semantic concepts to address partial matching. In particular, since we encode concepts independently in only a few dimensions, DiCoSA is superior in efficiency and granularity, ensuring fine-grained interactions at a computational complexity similar to that of coarse-grained alignment. Extensive experiments on five datasets, including MSR-VTT, LSMDC, MSVD, ActivityNet, and DiDeMo, demonstrate that our method outperforms existing state-of-the-art methods.

IJCAI Conference 2023 Conference Paper

TG-VQA: Ternary Game of Video Question Answering

  • Hao Li
  • Peng Jin
  • Zesen Cheng
  • Songyang Zhang
  • Kai Chen
  • Zhennan Wang
  • Chang Liu
  • Jie Chen

Video question answering aims to answer questions about video content by reasoning over the alignment semantics between them. However, because they rely heavily on human instructions, i.e., annotations or priors, current contrastive-learning-based VideoQA methods still struggle to perform fine-grained visual-linguistic alignment. In this work, we innovatively resort to game theory, which can simulate complicated relationships among multiple players with specific interaction strategies, e.g., video, question, and answer as ternary players, to achieve fine-grained alignment for the VideoQA task. Specifically, we carefully design a VideoQA-specific interaction strategy tailored to the characteristics of VideoQA, which can mathematically generate fine-grained visual-linguistic alignment labels without label-intensive effort. Our TG-VQA outperforms the existing state-of-the-art by a large margin (more than 5%) on long-term and short-term VideoQA datasets, verifying its effectiveness and generalization ability. Thanks to the guidance of game-theoretic interaction, our model converges impressively well on limited data (10^4 videos), surpassing most models pre-trained on large-scale data (10^7 videos).

IJCAI Conference 2023 Conference Paper

WiCo: Win-win Cooperation of Bottom-up and Top-down Referring Image Segmentation

  • Zesen Cheng
  • Peng Jin
  • Hao Li
  • Kehan Li
  • Siheng Li
  • Xiangyang Ji
  • Chang Liu
  • Jie Chen

Top-down and bottom-up methods are the two mainstream approaches to referring segmentation, and each has its own intrinsic weaknesses. Top-down methods are chiefly disturbed by Polar Negative (PN) errors owing to the lack of fine-grained cross-modal alignment. Bottom-up methods are mainly perturbed by Inferior Positive (IP) errors due to the lack of prior object information. Nevertheless, we discover that the two types of methods are highly complementary, each restraining the other's weaknesses, while a direct average combination leads to harmful interference. In this context, we build Win-win Cooperation (WiCo) to exploit the complementary nature of the two types of methods at both the interaction and integration levels to achieve a win-win improvement. At the interaction level, Complementary Feature Interaction (CFI) introduces prior object information to the bottom-up branch and provides fine-grained information to the top-down branch for complementary feature enhancement. At the integration level, Gaussian Scoring Integration (GSI) models the Gaussian performance distributions of the two branches and integrates their results by weighting with confidence scores sampled from these distributions. With our WiCo, several prominent bottom-up and top-down combinations achieve remarkable improvements on three common datasets at reasonable extra cost, which justifies the effectiveness and generality of our method.

NeurIPS Conference 2022 Conference Paper

A Differentiable Semantic Metric Approximation in Probabilistic Embedding for Cross-Modal Retrieval

  • Hao Li
  • Jingkuan Song
  • Lianli Gao
  • Pengpeng Zeng
  • Haonan Zhang
  • Gongfu Li

Cross-modal retrieval aims to build correspondence between multiple modalities by learning a common representation space. Typically, an image can match multiple texts semantically, and vice versa, which significantly increases the difficulty of this task. To address this problem, probabilistic embedding is proposed to quantify these many-to-many relationships. However, existing datasets (e.g., MS-COCO) and metrics (e.g., Recall@K) cannot fully represent these diverse correspondences due to non-exhaustive annotations. Based on this observation, we utilize semantic correlations computed by CIDEr to find the potential correspondences. We then present an effective metric, named Average Semantic Precision (ASP), which measures the ranking precision of semantic correlation for retrieval sets. Additionally, we introduce a novel and concise objective, coined Differentiable ASP Approximation (DAA). Concretely, DAA optimizes ASP directly by making the ranking function of ASP differentiable through a sigmoid function. To verify the effectiveness of our approach, extensive experiments are conducted on MS-COCO, CUB Captions, and Flickr30K, which are commonly used in cross-modal retrieval. The results show that our approach obtains superior performance over state-of-the-art approaches on all metrics. The code and trained models are released at https://github.com/leolee99/2022-NeurIPS-DAA.
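
The sigmoid relaxation of a ranking function is a standard construction; a minimal version is sketched below (how DAA plugs this into ASP is not shown, and the temperature value is an assumption):

```python
import torch

def soft_rank(scores: torch.Tensor, tau: float = 0.01) -> torch.Tensor:
    # Hard rank (descending): rank_i = 1 + sum_{j != i} 1[s_j > s_i].
    # Replacing the indicator with a sigmoid makes the rank -- and any
    # precision-style metric built on it -- differentiable in the scores.
    diff = scores.unsqueeze(0) - scores.unsqueeze(1)   # diff[i, j] = s_j - s_i
    return 0.5 + torch.sigmoid(diff / tau).sum(dim=1)  # the j == i term adds 0.5

scores = torch.tensor([0.9, 0.1, 0.5], requires_grad=True)
print(soft_rank(scores))  # approximately [1., 3., 2.]
```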

NeurIPS Conference 2022 Conference Paper

Entropy-Driven Mixed-Precision Quantization for Deep Network Design

  • Zhenhong Sun
  • Ce Ge
  • Junyan Wang
  • Ming Lin
  • Hesen Chen
  • Hao Li
  • Xiuyu Sun

Deploying deep convolutional neural networks on Internet-of-Things (IoT) devices is challenging due to limited computational resources, such as limited SRAM and Flash storage. Previous works redesign a small network for IoT devices and then compress the network size via mixed-precision quantization. This two-stage procedure cannot jointly optimize the architecture and the corresponding quantization, leading to sub-optimal tiny deep models. In this work, we propose a one-stage solution that optimizes both jointly and automatically. The key idea of our approach is to cast the joint architecture design and quantization as an entropy maximization process. In particular, our algorithm automatically designs a tiny deep model such that: 1) its representation capacity, measured by entropy, is maximized under the given computational budget; 2) each layer is assigned a proper quantization precision; 3) the overall design loop runs on CPU, with no GPU required. More impressively, our method can directly search high-expressiveness architectures for IoT devices in less than half a CPU hour. Extensive experiments on three widely adopted benchmarks, ImageNet, VWW, and WIDER FACE, demonstrate that our method achieves state-of-the-art performance in the tiny-deep-model regime. Code and pre-trained models are available at https://github.com/alibaba/lightweight-neural-architecture-search.

IJCAI Conference 2022 Conference Paper

ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning

  • Jingyu Li
  • Zhendong Mao
  • Shancheng Fang
  • Hao Li

Image captioning (IC), bringing vision to language, has drawn extensive attention. Precisely describing the visual relations between image objects is a key challenge in IC. We argue that visual relations, that is, geometric positions (i.e., distance and size) and semantic interactions (i.e., actions and possessives), indicate the mutual correlations between objects. Existing Transformer-based methods typically resort to geometric positions to enhance the representation of visual relations, yet shallow geometric information alone cannot precisely cover the complex, action-related correlations. In this paper, we propose to enhance the correlations between objects from a comprehensive view that jointly considers explicit semantic and geometric relations, generating plausible captions with accurate relationship predictions. Specifically, we propose a novel Enhanced-Adaptive Relation Self-Attention Network (ER-SAN). We design direction-sensitive semantic-enhanced attention, which considers attention from content objects to semantic relations and from semantic relations to content objects, to learn explicit semantics-aware relations. Further, we devise an adaptive re-weighting relation module that determines how much semantic and geometric attention should be allocated to each relation feature. Extensive experiments on the MS-COCO dataset demonstrate the effectiveness of our ER-SAN, improving CIDEr from 128.6% to 135.3% and achieving state-of-the-art performance. The code will be released at \url{https://github.com/CrossmodalGroup/ER-SAN}.

NeurIPS Conference 2022 Conference Paper

Improved Fine-Tuning by Better Leveraging Pre-Training Data

  • Ziquan Liu
  • Yi Xu
  • Yuanhong Xu
  • Qi Qian
  • Hao Li
  • Xiangyang Ji
  • Antoni Chan
  • Rong Jin

As a dominant paradigm, fine-tuning a pre-trained model on the target data is widely used in many deep learning applications, especially with small data sets. However, recent studies have empirically shown that, in some vision tasks, training from scratch achieves final performance no worse than this pre-training strategy once the number of training samples is increased. In this work, we revisit this phenomenon from the perspective of generalization analysis, using the excess risk bound, which is popular in learning theory. The result reveals that the excess risk bound may have a weak dependency on the pre-trained model. This observation inspires us to leverage pre-training data for fine-tuning, since this data is also available at fine-tuning time. The generalization result shows that the excess risk bound on a target task can be improved when appropriate pre-training data is included in fine-tuning. With this theoretical motivation, we propose a novel selection strategy that selects a subset of the pre-training data to help improve generalization on the target task. Extensive experimental results for image classification on 8 benchmark data sets verify the effectiveness of the proposed data-selection-based fine-tuning pipeline. Our code is available at https://github.com/ziquanliu/NeurIPS2022_UOT_fine_tuning.

IROS Conference 2022 Conference Paper

LiDAR-Aided Visual-Inertial Localization with Semantic Maps

  • Hao Li
  • Liangliang Pan
  • Ji Zhao 0001

Accurate and robust localization is an essential task for autonomous driving systems. In this paper, we propose a novel 3D LiDAR-aided visual-inertial localization method. Our method fully exploits the complementarity of visual and LiDAR observations. On the one hand, the association between semantic features in images and a given semantic map constrains the absolute pose. On the other hand, LiDAR odometry (LO) provides an accurate and robust 6DOF relative pose. The Error State Kalman Filter (ESKF) framework is exploited to estimate the vehicle pose relative to the semantic map, fusing the global constraints between the image and the semantic map, the relative pose from the LO, and the raw IMU data. The method achieves centimeter-level localization accuracy in a variety of challenging scenarios. We validate the robustness and accuracy of our method in real-world scenes over 50 km. The experimental results show that the proposed method achieves an average lateral accuracy of 0.059 m and a longitudinal accuracy of 0.158 m, demonstrating the practicality of the proposed system in autonomous driving applications.

JBHI Journal 2022 Journal Article

MDADP: A Webserver Integrating Database and Prediction Tools for Microbe-Disease Associations

  • Lei Wang
  • Hao Li
  • Yuqi Wang
  • Yihong Tan
  • Zhiping Chen
  • Tingrui Pei
  • Quan Zou

More and more evidence has demonstrated that microbiota play important roles in the life processes of the human body. In recent years, various computational methods have been proposed to identify potentially disease-associated microbes and save the cost of traditional biological experiments. However, the prediction performance of these methods is generally limited by outdated and incomplete datasets. Moreover, few existing studies provide visual, interactive tools for inferring possible microbe-disease associations (MDAs). Hence, in this manuscript, we propose a novel webserver called MDADP to identify latent MDAs, which combines a new MDA database with interactive MDA prediction tools. For the newly constructed database, 2,019 known MDAs between 58 diseases and 703 microbes were first manually collected. Then, by adopting the average ranking method and the co-confidence method, eight representative computational models were integrated to identify potential disease-related microbes. As a result, MDADP provides not only interactive features for users to access and explore MDA entries, but also effective tools to identify candidate microbes for different diseases. To our knowledge, MDADP is the first online platform that incorporates a new MDA database with comprehensive MDA prediction tools. We therefore believe it will be a valuable source of information for research in microbiology and disease-related fields. MDADP can be accessed at http://mdadp.leelab2997.cn.

AAAI Conference 2022 Conference Paper

PMAL: Open Set Recognition via Robust Prototype Mining

  • Jing Lu
  • Yunlu Xu
  • Hao Li
  • Zhanzhan Cheng
  • Yi Niu

Open Set Recognition (OSR) is an emerging topic: besides recognizing predefined classes, the system must also reject unknowns. Prototype learning is a promising way to handle the problem, as its ability to improve the intra-class compactness of representations is much needed for discriminating between knowns and unknowns. In this work, we propose a novel Prototype Mining And Learning (PMAL) framework. It adds a prototype mining mechanism before the phase of optimizing the embedding space, explicitly considering two crucial properties of the prototype set: high quality and diversity. Concretely, a set of high-quality candidates is first extracted from the training samples based on data uncertainty learning, avoiding interference from unexpected noise. Considering the multifarious appearance of objects even within a single category, a diversity-based strategy for prototype set filtering is proposed. Accordingly, the embedding space can be better optimized to discriminate among the predefined classes and between knowns and unknowns. Extensive experiments verify the two desired characteristics (i.e., high quality and diversity) embraced by prototype mining and show the remarkable performance of the proposed framework compared to the state of the art.

AAAI Conference 2022 Conference Paper

Scaled ReLU Matters for Training Vision Transformers

  • Pichao Wang
  • Xue Wang
  • Hao Luo
  • Jingkai Zhou
  • Zhipeng Zhou
  • Fan Wang
  • Hao Li
  • Rong Jin

Vision transformers (ViTs) have become an alternative design paradigm to convolutional neural networks (CNNs). However, training ViTs is much harder than training CNNs, as it is sensitive to training parameters such as learning rate, optimizer, and warmup epochs. The reasons for this training difficulty were empirically analysed in the paper Early Convolutions Help Transformers See Better, whose authors conjecture that the issue lies with the patchify stem of ViT models. In this paper, we further investigate this problem and extend the above conclusion: early convolutions alone do not ensure stable training; rather, the scaled ReLU operation in the convolutional stem (conv-stem) is what matters. We verify, both theoretically and empirically, that scaled ReLU in the conv-stem not only improves training stabilization but also increases the diversity of patch tokens, boosting peak performance by a large margin while adding few parameters and FLOPs. In addition, extensive experiments demonstrate that previous ViTs are far from well trained, further showing that ViTs have great potential to be a better substitute for CNNs.
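
One plausible reading of such a conv-stem in code, where each ReLU is "scaled" by the affine BatchNorm preceding it (the channel widths and depth below are illustrative, not the paper's exact stem):

```python
import torch.nn as nn

def conv_bn_relu(c_in: int, c_out: int) -> nn.Sequential:
    # BatchNorm's learnable affine scale provides the scaling around ReLU.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

# A 4-layer conv-stem replacing the ViT patchify stem (16x total downsampling).
stem = nn.Sequential(
    conv_bn_relu(3, 48), conv_bn_relu(48, 96),
    conv_bn_relu(96, 192), conv_bn_relu(192, 384),
)
```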

AAAI Conference 2022 Conference Paper

TransZero: Attribute-Guided Transformer for Zero-Shot Learning

  • Shiming Chen
  • Ziming Hong
  • Yang Liu
  • Guo-Sen Xie
  • Baigui Sun
  • Hao Li
  • Qinmu Peng
  • Ke Lu

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero uses a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometric relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder that localizes the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. The locality-augmented visual features and semantic vectors are then used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves a new state of the art on three ZSL benchmarks. The code is available at https://github.com/shiming-chen/TransZero.

NeurIPS Conference 2022 Conference Paper

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

  • Zhenyu Wang
  • Hao Luo
  • Pichao Wang
  • Feng Ding
  • Fan Wang
  • Hao Li

Although Vision transformers (ViTs) have recently come to dominate many vision tasks, deploying ViT models on resource-limited devices remains challenging. To address this challenge, several methods have been proposed to compress ViTs. Most of them borrow experience from convolutional neural networks (CNNs) and mainly focus on the spatial domain. However, compression in the spatial domain alone suffers from a dramatic performance drop without fine-tuning and is not robust to noise, since noise in the spatial domain can easily confuse the pruning criteria and lead to some parameters/channels being pruned incorrectly. Inspired by recent findings that self-attention is a low-pass filter and that low-frequency signals/components are more informative to ViTs, this paper proposes compressing ViTs with low-frequency components. Two metrics, low-frequency sensitivity (LFS) and low-frequency energy (LFE), are proposed for better channel pruning and token pruning. Additionally, a bottom-up cascade pruning scheme is applied to compress different dimensions jointly. Extensive experiments demonstrate that the proposed method can save 40%-60% of the FLOPs in ViTs, significantly increasing throughput on practical devices with less than a 1% performance drop on ImageNet-1K.
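The precise definitions of LFS and LFE are given in the paper; the sketch below only conveys the underlying idea of scoring channels by how much of their spectral energy sits in low frequencies, with a hypothetical frequency cutoff.

import torch

def low_frequency_energy(feat, cutoff=0.25):
    """Illustrative low-frequency energy ratio per channel.

    feat: (B, C, H, W) feature maps. Returns the per-channel fraction
    of spectral energy inside a centered low-frequency square."""
    spec = torch.fft.fftshift(torch.fft.fft2(feat), dim=(-2, -1))
    energy = spec.abs() ** 2
    B, C, H, W = feat.shape
    h, w = int(H * cutoff), int(W * cutoff)
    ch, cw = H // 2, W // 2
    low = energy[..., ch - h:ch + h, cw - w:cw + w].sum(dim=(-2, -1))
    total = energy.sum(dim=(-2, -1))
    return (low / total).mean(dim=0)  # (C,) scores, e.g. for pruning

scores = low_frequency_energy(torch.randn(2, 8, 14, 14))
print(scores.shape)  # torch.Size([8])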

IROS Conference 2021 Conference Paper

DT-Loc: Monocular Visual Localization on HD Vector Map Using Distance Transforms of 2D Semantic Detections

  • Chi Zhang 0069
  • Hao Liu 0007
  • Hao Li
  • Kun Guo
  • Kuiyuan Yang
  • Rui Cai 0002
  • Zhiwei Li 0006

Localizing a vehicle on a prebuilt HD vector map is a prerequisite for many autonomous driving applications. Existing visual localization approaches usually require a separate local-feature layer to function. Such a separate localization layer inherits the robustness issues of the local features, and it can be difficult to create a feature layer that aligns perfectly with an existing vector map. In this paper, we propose a monocular visual localization method that exploits the vector map directly as the localization layer. The method detects semantic traffic elements in the images and matches them with the vectors in the map. To mitigate the problem of false matches, we propose to align the vector map to the distance transforms of the semantic detections, which enables a non-explicit and differentiable data association process. The system achieves centimeter and sub-meter accuracies in the lateral and longitudinal directions, respectively.
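A toy version of the matching idea, assuming scipy is available: the distance transform turns a binary detection mask into a smooth cost surface, so projected map points can be scored by sampling it. The real system optimizes a full vehicle pose; this sketch only scores fixed 2D projections.

import numpy as np
from scipy.ndimage import distance_transform_edt

def alignment_cost(detection_mask, projected_pts):
    """Score how well projected map points align with detections.

    detection_mask: binary (H, W) semantic detection image.
    projected_pts: (N, 2) integer (x, y) pixel coordinates of map
    vectors under a candidate pose. Lower cost = better alignment."""
    # Distance from every pixel to the nearest detected pixel.
    dt = distance_transform_edt(~detection_mask.astype(bool))
    rows, cols = projected_pts[:, 1], projected_pts[:, 0]
    return dt[rows, cols].mean()

# Toy example: a vertical lane marking detected at column 50.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[:, 50] = 1
pts_good = np.stack([np.full(100, 51), np.arange(100)], axis=1)
pts_bad = np.stack([np.full(100, 70), np.arange(100)], axis=1)
print(alignment_cost(mask, pts_good), alignment_cost(mask, pts_bad))  # 1.0 20.0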

NeurIPS Conference 2021 Conference Paper

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

  • Shiming Chen
  • Guosen Xie
  • Yang Liu
  • Qinmu Peng
  • Baigui Sun
  • Hao Li
  • Xinge You
  • Ling Shao

Zero-shot learning (ZSL) tackles the unseen-class recognition problem by transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee desirable knowledge transfer, a common (latent) space is adopted for associating the visual and semantic domains in ZSL. However, existing common space learning methods align the semantic and visual domains by merely mitigating distribution disagreement through one-step adaptation. This strategy is usually ineffective due to the heterogeneous nature of the feature representations in the two domains, which intrinsically contain both distribution and structure variations. To address this and advance ZSL, we propose a novel hierarchical semantic-visual adaptation (HSVA) framework. Specifically, HSVA aligns the semantic and visual domains through a hierarchical two-step adaptation, i.e., structure adaptation followed by distribution adaptation. In the structure adaptation step, we use two task-specific encoders to encode the source data (visual domain) and the target data (semantic domain) into a structure-aligned common space. To this end, a supervised adversarial discrepancy (SAD) module is proposed to adversarially minimize the discrepancy between the predictions of two task-specific classifiers, making the visual and semantic feature manifolds more closely aligned. In the distribution adaptation step, we directly minimize the Wasserstein distance between the latent multivariate Gaussian distributions to align the visual and semantic distributions using a common encoder. Finally, the structure and distribution adaptation are derived in a unified framework under two partially aligned variational autoencoders. Extensive experiments on four benchmark datasets demonstrate that HSVA achieves superior performance on both conventional and generalized ZSL. The code is available at https://github.com/shiming-chen/HSVA.
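For diagonal Gaussian posteriors, the Wasserstein distance mentioned above has a simple closed form; the sketch below shows that special case and is not the authors' code.

import torch

def w2_diagonal_gaussians(mu1, logvar1, mu2, logvar2):
    """Closed-form squared 2-Wasserstein distance between diagonal
    Gaussians N(mu1, diag(var1)) and N(mu2, diag(var2)):
        W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
    the standard VAE-friendly special case of the general formula."""
    sigma1 = torch.exp(0.5 * logvar1)
    sigma2 = torch.exp(0.5 * logvar2)
    return ((mu1 - mu2) ** 2).sum(-1) + ((sigma1 - sigma2) ** 2).sum(-1)

# Toy usage with batched latent parameters from two encoders.
mu_v, lv_v = torch.zeros(4, 64), torch.zeros(4, 64)
mu_s, lv_s = torch.ones(4, 64), torch.zeros(4, 64)
print(w2_diagonal_gaussians(mu_v, lv_v, mu_s, lv_s))  # four values of 64.0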

NeurIPS Conference 2020 Conference Paper

Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels

  • Yi Zhou
  • Chenglei Wu
  • Zimo Li
  • Chen Cao
  • Yuting Ye
  • Jason Saragih
  • Hao Li
  • Yaser Sheikh

Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh and are not applicable to more general meshes, such as tetrahedral and non-manifold meshes. While more general graph convolution methods can be employed, they fall short in reconstruction precision and require more memory. In this paper, we propose a non-template-specific fully convolutional mesh autoencoder for arbitrary registered mesh data. It is enabled by our novel convolution and (un)pooling operators, learned with globally shared weights and locally varying coefficients, which can efficiently capture the spatially varying content presented by irregular mesh connections. Our model outperforms state-of-the-art methods on reconstruction accuracy. In addition, the latent codes of our network are fully localized thanks to the fully convolutional structure, and thus have much higher interpolation capability than many traditional 3D mesh generation models.

ICLR Conference 2020 Conference Paper

Rethinking the Hyperparameters for Fine-tuning

  • Hao Li
  • Pratik Chaudhari
  • Hao Yang 0043
  • Michael Lam
  • Avinash Ravichandran
  • Rahul Bhotika
  • Stefano Soatto

Fine-tuning from pre-trained ImageNet models has become the de-facto standard for various computer vision tasks. Current practices for fine-tuning typically involve selecting an ad-hoc choice of hyperparameters and keeping them fixed at values normally used for training from scratch. This paper re-examines several common practices of setting hyperparameters for fine-tuning. Our findings are based on extensive empirical evaluation of fine-tuning on various transfer learning benchmarks. (1) While prior works have thoroughly investigated learning rate and batch size, momentum for fine-tuning is a relatively unexplored parameter. We find that the value of momentum also affects fine-tuning performance and connect it with previous theoretical findings. (2) Optimal hyperparameters for fine-tuning, in particular the effective learning rate, are not only dataset dependent but also sensitive to the similarity between the source and target domains. This is in contrast to hyperparameters for training from scratch. (3) Reference-based regularization that keeps models close to the initial model does not necessarily apply to "dissimilar" datasets. Our findings challenge common practices of fine-tuning and encourage deep learning practitioners to rethink the hyperparameters for fine-tuning.
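On point (2), the effective learning rate for SGD with momentum m is commonly approximated as lr / (1 - m); the snippet below only illustrates how two different (lr, momentum) pairs can be equivalent under this view.

def effective_lr(lr, momentum):
    """Effective learning rate of SGD with heavy-ball momentum:
    in steady state, momentum rescales the step size by 1/(1 - m)."""
    return lr / (1.0 - momentum)

# Two settings with the same effective learning rate of 0.1:
print(effective_lr(0.01, 0.9))   # 0.1
print(effective_lr(0.05, 0.5))   # 0.1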

NeurIPS Conference 2019 Conference Paper

Learning to Infer Implicit Surfaces without 3D Supervision

  • Shichen Liu
  • Shunsuke Saito
  • Weikai Chen
  • Hao Li

Recent advances in 3D deep learning have shown that it is possible to train highly effective deep models for 3D shape generation directly from 2D images. This is particularly interesting since the availability of 3D models is still limited compared to the massive amount of accessible 2D images, which is invaluable for training. The representation of 3D surfaces itself is a key factor in the quality and resolution of the 3D output. While explicit representations, such as point clouds and voxels, can span a wide range of shape variations, their resolutions are often limited. Mesh-based representations are more efficient but are limited in their ability to handle varying topologies. Implicit surfaces, however, can robustly handle complex shapes and topologies, and also provide flexible resolution control. We address the fundamental problem of learning implicit surfaces for shape inference without the need for 3D supervision. Despite their advantages, it remains nontrivial to (1) formulate a differentiable connection between implicit surfaces and their 2D renderings, which is needed for image-based supervision; and (2) ensure precise geometric properties and control, such as local smoothness. In particular, densely sampling implicit surfaces is known to be a computationally demanding and very slow operation. To this end, we propose a novel ray-based field probing technique for efficient image-to-field supervision, as well as a general geometric regularizer for implicit surfaces, which provides natural shape priors in unconstrained regions. We demonstrate the effectiveness of our framework on the task of single-view image-based 3D shape digitization and show that we outperform state-of-the-art techniques both quantitatively and qualitatively.

AAAI Conference 2019 Conference Paper

Robust Optimization over Multiple Domains

  • Qi Qian
  • Shenghuo Zhu
  • Jiasheng Tang
  • Rong Jin
  • Baigui Sun
  • Hao Li

In this work, we study the problem of learning a single model for multiple domains. Unlike the conventional machine learning scenario, in which each domain has its own model, multiple domains (i.e., applications/users) may share the same machine learning model due to maintenance loads in cloud computing services. For example, a digit-recognition model should be applicable to handwritten digits, house numbers, car plates, etc. An ideal model for cloud computing therefore has to perform well on every applicable domain. To address this new challenge from cloud computing, we develop a framework for robust optimization over multiple domains. In lieu of minimizing the empirical risk, we aim to learn a model optimized against the adversarial distribution over the multiple domains. Hence, we propose to learn the model and the adversarial distribution simultaneously with a stochastic algorithm for efficiency. Theoretically, we analyze the convergence rate for convex and non-convex models. To the best of our knowledge, this is the first study of the convergence rate of learning a robust non-convex model with a practical algorithm. Furthermore, we demonstrate that the robustness of the framework and the convergence rate can be further enhanced by appropriate regularizers over the adversarial distribution. An empirical study on real-world fine-grained visual categorization and digit recognition tasks verifies the effectiveness and efficiency of the proposed framework.
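One standard way to realize an adversarial-distribution update is a multiplicative-weights (mirror ascent) step, sketched below on fixed toy losses; the actual algorithm updates the model and the distribution jointly and stochastically.

import numpy as np

def adversarial_distribution_step(p, domain_losses, eta=0.1):
    """One multiplicative-weights update of the adversarial distribution
    over domains: domains with higher loss receive more probability
    mass, pushing the model to improve on its worst domains."""
    p = p * np.exp(eta * domain_losses)
    return p / p.sum()

p = np.ones(3) / 3                  # uniform over 3 domains
losses = np.array([0.2, 0.9, 0.4])  # per-domain empirical losses
for _ in range(50):
    p = adversarial_distribution_step(p, losses)
print(p.round(3))  # ~[0.027, 0.899, 0.074]: mass concentrates on the hardest domain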

AAAI Conference 2018 Conference Paper

Extremely Low Bit Neural Network: Squeeze the Last Bit Out With ADMM

  • Cong Leng
  • Zesheng Dou
  • Hao Li
  • Shenghuo Zhu
  • Rong Jin

Although deep learning models are highly effective for various learning tasks, their high computational cost prohibits deployment in scenarios where either memory or computational resources are limited. In this paper, we focus on compressing and accelerating deep models whose network weights are represented with very small numbers of bits, referred to as extremely low-bit neural networks. We model this problem as a discretely constrained optimization problem. Borrowing the idea of the Alternating Direction Method of Multipliers (ADMM), we decouple the continuous parameters from the discrete constraints of the network and cast the original hard problem into several subproblems. We propose to solve these subproblems using extragradient and iterative quantization algorithms, which lead to considerably faster convergence than conventional optimization methods. Extensive experiments on image recognition and object detection verify that the proposed algorithm is more effective than state-of-the-art approaches when it comes to extremely low-bit neural networks.
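ADMM-style low-bit training alternates a continuous update with a projection onto the discrete weight set; for the scaled-binary case that projection has the closed form below (a sketch of one subproblem only, not the full algorithm).

import numpy as np

def project_binary(w):
    """Euclidean projection of weights onto the scaled binary set
    {-a, +a}^n used in extremely low-bit networks: the optimal code is
    q = sign(w) with scale a = mean(|w|)."""
    a = np.abs(w).mean()
    return a * np.sign(w), a

w = np.array([0.3, -1.2, 0.8, -0.1])
wq, scale = project_binary(w)
print(wq, scale)  # [ 0.6 -0.6  0.6 -0.6] 0.6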

NeurIPS Conference 2018 Conference Paper

MacNet: Transferring Knowledge from Machine Comprehension to Sequence-to-Sequence Models

  • Boyuan Pan
  • Yazheng Yang
  • Hao Li
  • Zhou Zhao
  • Yueting Zhuang
  • Deng Cai
  • Xiaofei He

Machine Comprehension (MC) is one of the core problems in natural language processing, requiring both understanding of natural language and knowledge about the world. Rapid progress has been made since the release of several benchmark datasets, and recently state-of-the-art models have even surpassed human performance on the well-known SQuAD evaluation. In this paper, we transfer knowledge learned from machine comprehension to sequence-to-sequence tasks to deepen the understanding of the text. We propose MacNet, a novel supplementary encoder-decoder architecture for the widely used attention-based sequence-to-sequence models. Experiments on neural machine translation (NMT) and abstractive text summarization show that our proposed framework can significantly improve the performance of the baseline models, and our method for abstractive text summarization achieves state-of-the-art results on the Gigaword dataset.

ICRA Conference 2018 Conference Paper

Robust and Precise Vehicle Localization Based on Multi-Sensor Fusion in Diverse City Scenes

  • Guowei Wan
  • Xiaolong Yang
  • Renlan Cai
  • Hao Li
  • Yao Zhou
  • Hao Wang
  • Shiyu Song

We present a robust and precise localization system that achieves centimeter-level accuracy in disparate city scenes. Our system adaptively uses information from complementary sensors such as GNSS, LiDAR, and IMU to achieve high localization accuracy and resilience in challenging scenes, such as urban downtowns, highways, and tunnels. Rather than relying only on LiDAR intensity or 3D geometry, we make innovative use of LiDAR intensity and altitude cues to significantly improve the accuracy and robustness of the localization system. Our GNSS RTK module utilizes the multi-sensor fusion framework and achieves a better ambiguity resolution success rate. An error-state Kalman filter is applied to fuse the localization measurements from different sources with novel uncertainty estimation. We validate the effectiveness of our approaches in detail, achieving 5-10 cm RMS accuracy and outperforming previous state-of-the-art systems. Importantly, our system, while deployed in a large autonomous driving fleet, made our vehicles fully autonomous on crowded city streets despite road construction occurring from time to time. A dataset including more than 60 km of real traffic driving on various urban roads is used to comprehensively test our system.

NeurIPS Conference 2018 Conference Paper

Visualizing the Loss Landscape of Neural Nets

  • Hao Li
  • Zheng Xu
  • Gavin Taylor
  • Christoph Studer
  • Tom Goldstein

Neural network training relies on our ability to find "good" minimizers of highly non-convex loss functions. It is well known that certain network architecture designs (e.g., skip connections) produce loss functions that are easier to train, and that well-chosen training parameters (batch size, learning rate, optimizer) produce minimizers that generalize better. However, the reasons for these differences, and their effect on the underlying loss landscape, are not well understood. In this paper, we explore the structure of neural loss functions, and the effect of loss landscapes on generalization, using a range of visualization methods. First, we introduce a simple "filter normalization" method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. Then, using a variety of visualizations, we explore how network architecture affects the loss landscape, and how training parameters affect the shape of minimizers.
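A minimal sketch of the filter normalization step, assuming a PyTorch model: each filter of a random direction is rescaled to match the norm of the corresponding model filter. The plotting machinery is omitted.

import torch

def filter_normalized_direction(model):
    """Draw a random direction in parameter space and rescale it
    filter-wise so each filter of the direction has the same norm as
    the corresponding filter of the model ('filter normalization')."""
    direction = []
    for p in model.parameters():
        d = torch.randn_like(p)
        if p.dim() > 1:  # conv/linear weights: normalize per output filter
            d_flat = d.view(d.size(0), -1)
            p_flat = p.view(p.size(0), -1)
            d_flat.mul_(p_flat.norm(dim=1, keepdim=True)
                        / (d_flat.norm(dim=1, keepdim=True) + 1e-10))
        direction.append(d)
    return direction

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
                            torch.nn.Conv2d(8, 4, 3))
d = filter_normalized_direction(model)
# The 1-D loss curve is then traced as loss(theta + t * d) over t.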

AAAI Conference 2017 Conference Paper

Learning Latent Sentiment Scopes for Entity-Level Sentiment Analysis

  • Hao Li
  • Wei Lu

In this paper, we focus on the task of extracting named entities together with their associated sentiment information in a joint manner. Our key observation in this entity-level sentiment analysis (a.k.a. targeted sentiment analysis) task is that each named entity is embedded within a sentiment scope that largely decides the sentiment information associated with the entity. However, such sentiment scopes are typically not explicitly annotated in the data, and their lengths can be unbounded. Motivated by this, and unlike traditional approaches that cast this problem as a simple sequence labeling task, we propose a novel approach that explicitly models the latent sentiment scopes. Our experiments on the standard datasets demonstrate that our approach achieves better results than existing approaches based on conventional conditional random fields (CRFs) and a more recent approach based on neural networks.

IJCAI Conference 2017 Conference Paper

Self-paced Convolutional Neural Networks

  • Hao Li
  • Maoguo Gong

Convolutional neural networks (CNNs) have achieved breakthrough performance in many pattern recognition tasks. In order to distinguish reliable data from noisy and confusing data, we augment CNNs with self-paced learning (SPL) to enhance their learning robustness. In the proposed self-paced convolutional network (SPCN), each sample is assigned a weight that reflects its easiness. A dynamic self-paced function is then incorporated into the learning objective of the CNN to jointly learn the CNN parameters and the latent weight variable. SPCN learns samples from easy to complex, and the sample weights dynamically control the learning rates so that training converges to better values. To gain more insight into SPCN, theoretical studies are conducted to show that SPCN converges to a stationary solution and is robust to noisy and confusing data. Experimental results on the MNIST and rectangles datasets demonstrate that the proposed method outperforms baseline methods.
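For reference, the classic hard self-paced weighting that SPCN's dynamic self-paced function generalizes looks like this (an illustrative special case, not the paper's exact function):

import numpy as np

def self_paced_weights(losses, lam):
    """Hard self-paced weighting: samples with loss below the pace
    parameter lam are treated as 'easy' (weight 1), others are held
    out (weight 0). Training alternates between fitting the weighted
    samples and recomputing the weights while lam grows."""
    return (losses < lam).astype(float)

losses = np.array([0.1, 0.5, 1.2, 2.0])
for lam in (0.3, 1.0, 3.0):  # pace parameter increases over epochs
    print(lam, self_paced_weights(losses, lam))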

NeurIPS Conference 2017 Conference Paper

Training Quantized Nets: A Deeper Understanding

  • Hao Li
  • Soham De
  • Zheng Xu
  • Christoph Studer
  • Hanan Samet
  • Tom Goldstein

Currently, deep neural networks are deployed on low-power portable devices by first training a full-precision model using powerful hardware, and then deriving a corresponding low-precision model for efficient inference on such systems. However, training models directly with coarsely quantized weights is a key step towards learning on embedded platforms that have limited computing resources, memory capacity, and power consumption. Numerous recent publications have studied methods for training quantized networks, but these studies have mostly been empirical. In this work, we investigate training methods for quantized neural networks from a theoretical viewpoint. We first explore accuracy guarantees for training methods under convexity assumptions. We then look at the behavior of these algorithms for non-convex problems, and show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training using low-precision arithmetic.
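The role of high-precision representations can be seen in a BinaryConnect-style step, sketched below with a toy quadratic objective: small gradients accumulate in the full-precision copy until a quantized weight flips sign, an effect purely quantized training lacks. Names and the objective are illustrative.

import torch

def quantized_sgd_step(w_full, grad_fn, lr=0.1):
    """One BinaryConnect-style step: quantize a high-precision copy for
    the forward/backward pass, but accumulate the update in the
    high-precision weights. Purely quantized training would discard
    w_full and lose the small gradient signals."""
    w_q = torch.sign(w_full)  # quantize for this iteration
    g = grad_fn(w_q)          # gradient evaluated at the quantized point
    return w_full - lr * g    # update kept in full precision

w = torch.tensor([0.05, -0.3, 0.7])
target = torch.tensor([1.0, -1.0, -1.0])
grad = lambda wq: 2 * (wq - target)  # toy objective ||wq - target||^2
for _ in range(3):
    w = quantized_sgd_step(w, grad)
print(w, torch.sign(w))  # the third weight's sign has flipped to match the target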

IJCAI Conference 2017 Conference Paper

What to Do Next: Modeling User Behaviors by Time-LSTM

  • Yu Zhu
  • Hao Li
  • Yikang Liao
  • Beidou Wang
  • Ziyu Guan
  • Haifeng Liu
  • Deng Cai

Recently, Recurrent Neural Network (RNN) solutions for recommender systems (RS) have become increasingly popular. The insight is that there exist intrinsic patterns in the sequence of users' actions, and RNNs have proven to perform excellently when modeling sequential data. In traditional tasks such as language modeling, RNN solutions usually only consider the sequential order of objects, without the notion of intervals. In RS, however, time intervals between users' actions are of significant importance in capturing the relations between users' actions, and traditional RNN architectures are not good at modeling them. In this paper, we propose a new LSTM variant, Time-LSTM, to model users' sequential actions. Time-LSTM equips LSTM with time gates to model time intervals. These time gates are specifically designed so that, compared to traditional RNN solutions, Time-LSTM better captures both users' short-term and long-term interests, improving the recommendation performance. Experimental results on two real-world datasets show the superiority of the recommendation method using Time-LSTM over traditional methods.
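A simplified, hypothetical time-gated cell in PyTorch; the paper proposes several Time-LSTM variants with differently coupled gates, and this sketch shows only the core idea of letting the elapsed interval modulate the cell update.

import torch
import torch.nn as nn

class TimeLSTMCell(nn.Module):
    """Illustrative time-gated LSTM cell (not the paper's exact variants)."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        self.time_gate = nn.Linear(input_size + 1, hidden_size)

    def forward(self, x, dt, h, c):
        i, f, o, g = self.gates(torch.cat([x, h], -1)).chunk(4, -1)
        i, f, o, g = i.sigmoid(), f.sigmoid(), o.sigmoid(), g.tanh()
        # Time gate: conditioned on the input and the elapsed interval dt,
        # so recent actions can influence the cell more than stale ones.
        T = torch.sigmoid(self.time_gate(torch.cat([x, dt], -1)))
        c = f * c + i * T * g
        h = o * torch.tanh(c)
        return h, c

cell = TimeLSTMCell(8, 16)
h = c = torch.zeros(4, 16)
h, c = cell(torch.randn(4, 8), torch.rand(4, 1), h, c)
print(h.shape)  # torch.Size([4, 16])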

IJCAI Conference 2016 Conference Paper

Content-Driven Detection of Cyberbullying on the Instagram Social Network

  • Haoti Zhong
  • Hao Li
  • Anna Squicciarini
  • Sarah Rajtmajer
  • Christopher Griffin
  • David Miller
  • Cornelia Caragea

We study detection of cyberbullying in photo-sharing networks, with an eye on developing early warning mechanisms for the prediction of posted images vulnerable to attacks. Given the overwhelming increase in media accompanying text in online social networks, we investigate use of posted images and captions for improved detection of bullying in response to shared content. We validate our approaches on a dataset of over 3000 images along with peer-generated comments posted on the Instagram photo-sharing network, running comprehensive experiments using a variety of classifiers and feature sets. In addition to standard image and text features, we leverage several novel features including topics determined from image captions and a pretrained convolutional neural network on image pixels. We identify the importance of these advanced features in assisting detection of cyberbullying in posted comments. We also provide results on classification of images and captions themselves as potential targets for cyberbullies.

AAAI Conference 2016 Conference Paper

Multi-Objective Self-Paced Learning

  • Hao Li
  • Maoguo Gong
  • Deyu Meng
  • Qiguang Miao

Current self-paced learning (SPL) regimes adopt a greedy strategy, obtaining the solution with a gradually increasing pace parameter, yet it is difficult to determine where to optimally terminate this increasing process. Moreover, most SPL implementations are very sensitive to initialization and lack a theoretical result clarifying where SPL converges as the pace parameter increases. In this paper, we propose a novel multi-objective self-paced learning (MOSPL) method to address these issues. Specifically, we decompose the objective function into two terms, the loss and the self-paced regularizer, and treat the problem as a compromise between these two objectives. This naturally reformulates SPL as a standard multi-objective optimization problem. A multi-objective evolutionary algorithm is used to optimize the two objectives simultaneously, facilitating the rational selection of a proper pace parameter. The proposed technique is able to improve a set of solutions across a range of pace parameters by finely trading these solutions off against one another, making them perform robustly even under bad initialization. A good solution can then be naturally obtained from this set using off-the-shelf tools from multi-objective optimization. Experimental results on matrix factorization and action recognition demonstrate the superiority of the proposed method in addressing the existing issues in current SPL research.

IROS Conference 2014 Conference Paper

A robot system design for low-cost multi-robot manipulation

  • James McLurkin
  • Adam McMullen
  • Nick Robbins
  • Golnaz Habibi
  • Aaron T. Becker
  • Alvin Chou
  • Hao Li
  • Meagan John

Multi-robot manipulation allows for scalable environmental interaction, which is critical for multi-robot systems to have an impact on our world. A successful manipulation model requires cost-effective robots, robust hardware, and proper system feedback and control. This paper details the key sensing and manipulator capabilities of the r-one robot, an advanced, open-source, low-cost platform for multi-robot manipulation and sensing that meets all of these requirements. The parts cost is around $250 per robot. The r-one has a rich sensor suite, including a flexible IR communication/localization/obstacle-detection system, high-precision quadrature encoders, a gyroscope, an accelerometer, an integrated bump sensor, and light sensors. Two years of working with these robots inspired the development of an external manipulator that gives the robots the ability to interact with their environment. This paper presents an overview of the r-one, the r-one manipulator, and basic manipulation experiments that illustrate the efficacy of our design. The advanced design, low cost, and small size can support university research with large populations of robots, as well as multi-robot curricula in computer science, electrical engineering, and mechanical engineering. We conclude with remarks on the future implementation of the manipulators and the expected work to follow.

ICRA Conference 2011 Conference Paper

Design optimization of parallel manipulators with required pose resolution

  • Hao Li
  • Yuru Zhang
  • Jian S. Dai 0001

The performance of a parallel manipulator relies heavily on its position/orientation resolution; without good resolution, it is difficult for the manipulator to achieve high stiffness. How to obtain the required resolution is therefore a basic issue in designing a parallel manipulator, and this paper presents a method to address it. First, a mathematical definition of position/orientation resolution is given. We then show how to compute these resolutions using the Rayleigh quotient, and formulate the design optimization problem. The fundamental concepts of the method are illustrated on a 3-RRR planar parallel manipulator, and the design process for achieving an optimized task space with the required pose resolution is demonstrated in the same example.
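One way to make the Rayleigh-quotient connection concrete, under simplifying assumptions (a square Jacobian and homogeneous units, both my assumptions rather than the paper's setup): the extreme eigenvalues of J^T J bound the pose increment produced by a given joint increment.

import numpy as np

def pose_resolution_bounds(J, dq):
    """Illustrative Rayleigh-quotient bounds: for dx = J dq, the ratio
    ||dx||^2 / ||dq||^2 = (dq^T J^T J dq) / (dq^T dq) lies between the
    extreme eigenvalues of J^T J, so the smallest and largest pose steps
    caused by a joint step of size dq follow from those eigenvalues."""
    eigvals = np.linalg.eigvalsh(J.T @ J)  # ascending order
    return np.sqrt(eigvals[0]) * dq, np.sqrt(eigvals[-1]) * dq

J = np.array([[1.0, 0.2, 0.0],
              [0.1, 0.8, 0.3],
              [0.0, 0.4, 1.1]])  # toy 3-DoF Jacobian
print(pose_resolution_bounds(J, dq=1e-3))  # (lower, upper) pose step bounds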

ICRA Conference 2011 Conference Paper

Virtual prototyping for drive chain optimization in an industrial robot

  • Bojun Ma
  • Hao Li
  • Said Zahrai
  • Hui Zhang

The cost, performance, and energy efficiency of a robot system depend strongly on its drive chain, i.e., the combination of the drive, the motors, and the gears. A model is presented that allows accurate simulation of the drive chain in an industrial robot, offering a high degree of optimization in the design process. Simulation results are compared with final data from the developed units, and excellent agreement is found.

IROS Conference 2003 Conference Paper

Real world implementation of fuzzy anti-swing control for behavior-based intelligent crane system

  • Jiaming Wang
  • Hao Li
  • Fakhri Karray
  • Otman A. Basir

There exist several industrial applications for large crane systems, and most of them experience serious problems with load swing. This paper presents a fuzzy-logic-based control scheme that minimizes load swing for crane systems while maintaining continuous payload transportation. The control system of the crane is built using behavior-based approaches: each module generates behaviors, and the performance of the system is improved by adding new modules. To develop the anti-swing module, a fuzzy logic controller is applied using information extracted from potentiometers; the fuzzy controller provides a mechanism for dealing with imprecise sensor data. The anti-swing behaviors are successfully implemented by formulating a set of fuzzy rules. The performance of the developed system is illustrated by both simulations and experiments, which show that the system remains stable under several operating situations.

IROS Conference 2001 Conference Paper

Real-time planning and control of robots using shunting neural networks

  • Simon X. Yang
  • Xiaobu Yuan
  • Max Q.-H. Meng
  • Guangfeng Yuan
  • Hao Li

In this paper, shunting neural networks are proposed for the dynamic planning and control of robots. The dynamic environment is represented by the neural activity landscape of a topologically organized neural network, where each neuron is characterized by a shunting equation derived from Hodgkin and Huxley's (1952) biological membrane equation. A collision-free path is generated in real time from the activity landscape, without any explicit search procedure and without any prior knowledge of the dynamic environment. The real-time tracking control that lets robots follow the planned dynamic path is designed using the shunting equation as well. The effectiveness and efficiency of the proposed approach are demonstrated through simulation and comparison studies. Simulations in several computer-synthesized virtual environments further demonstrate the advantages of the proposed approach, with encouraging experimental results.
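A toy one-dimensional sketch of the shunting dynamics (parameters and the neighbor-coupling scheme are chosen only for illustration): activity injected at a target spreads and decays across the grid while an obstacle stays locally negative, yielding a landscape a robot can ascend toward the target.

import numpy as np

def shunting_step(x, excite, inhibit, A=10.0, B=1.0, D=1.0, dt=0.01):
    """One Euler step of the shunting equation on a neuron grid:
        dx/dt = -A*x + (B - x)*S_e - (D + x)*S_i
    Activity is automatically bounded within [-D, B]."""
    return x + dt * (-A * x + (B - x) * excite - (D + x) * inhibit)

x = np.zeros(10)
target = np.zeros(10); target[9] = 100.0    # excitation at the goal cell
obstacle = np.zeros(10); obstacle[4] = 100.0  # inhibition at the obstacle
for _ in range(500):
    padded = np.pad(np.maximum(x, 0), 1)   # only positive activity spreads
    neighbor = padded[:-2] + padded[2:]    # excitation from both neighbors
    x = shunting_step(x, target + neighbor, obstacle)
print(x.round(3))  # activity decays away from the target; the obstacle cell is negative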