Arrow Research search

Author name cluster

Yufei Han

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
1 author row

Possible papers (8)

AAAI Conference 2026 Conference Paper

Persistent Backdoor Attacks Under Continual Fine-Tuning of LLMs

  • Jing Cui
  • Yufei Han
  • Jianbin Jiao
  • Junge Zhang

Backdoor attacks embed malicious behaviors into Large Language Models (LLMs), enabling adversaries to trigger harmful outputs or bypass safety controls. However, the persistence of implanted backdoors under user-driven post-deployment continual fine-tuning has rarely been examined. Most prior works evaluate the effectiveness and generalization of implanted backdoors only at release time, and empirical evidence shows that the persistence of naively injected backdoors degrades after updates. In this work, we study whether and how implanted backdoors persist through multi-stage post-deployment fine-tuning. We propose P-Trojan, a trigger-based attack algorithm that explicitly optimizes for backdoor persistence across repeated updates. By aligning poisoned gradients with those of clean tasks on token embeddings, the implanted backdoor mapping is less likely to be suppressed or forgotten during subsequent updates. Theoretical analysis shows the feasibility of such persistent backdoor attacks after continual fine-tuning, and experiments on the Qwen2.5 and LLaMA3 families of LLMs, as well as diverse task sequences, demonstrate that P-Trojan achieves over 99% persistence while preserving clean-task accuracy. Our findings highlight the need for persistence-aware evaluation and stronger defenses in realistic model adaptation pipelines.
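The gradient-alignment idea mentioned in the abstract can be pictured with a short PyTorch-style sketch. Everything below, including the name alignment_penalty, the loss terms, and the weighting, is an illustrative assumption about how such a term could be wired up, not the paper's actual objective.

```python
# Illustrative sketch (not the paper's code): penalize misalignment between the
# gradient of the clean-task loss and the gradient of the poisoned loss on the
# token-embedding weights, so that later clean fine-tuning is less likely to
# overwrite the backdoor mapping.
import torch
import torch.nn.functional as F

def alignment_penalty(clean_loss, poison_loss, embedding_weight):
    """1 - cosine similarity between the two gradients on the embedding table."""
    g_clean = torch.autograd.grad(clean_loss, embedding_weight,
                                  retain_graph=True, create_graph=True)[0]
    g_poison = torch.autograd.grad(poison_loss, embedding_weight,
                                   retain_graph=True, create_graph=True)[0]
    cos = F.cosine_similarity(g_clean.flatten(), g_poison.flatten(), dim=0)
    return 1.0 - cos

# Hypothetical combined objective during poisoning:
# total_loss = clean_loss + poison_loss + lam * alignment_penalty(clean_loss, poison_loss, emb.weight)
```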

AAAI Conference 2024 Conference Paper

BadRL: Sparse Targeted Backdoor Attack against Reinforcement Learning

  • Jing Cui
  • Yufei Han
  • Yuzhe Ma
  • Jianbin Jiao
  • Junge Zhang

Backdoor attacks in reinforcement learning (RL) have previously employed intense attack strategies to ensure attack success. However, these methods suffer from high attack costs and increased detectability. In this work, we propose a novel approach, BadRL, which focuses on conducting highly sparse backdoor poisoning efforts during training and testing while maintaining successful attacks. BadRL strategically chooses state observations with high attack values to inject triggers during training and testing, thereby reducing the chances of detection. In contrast to previous methods that use sample-agnostic trigger patterns, BadRL dynamically generates distinct trigger patterns based on targeted state observations, thereby enhancing its effectiveness. Theoretical analysis shows that the targeted backdoor attack is always viable and remains stealthy under specific assumptions. Empirical results on various classic RL tasks illustrate that BadRL can substantially degrade the performance of a victim agent with minimal poisoning efforts (0.003% of total training steps) during training and infrequent attacks during testing. Code is available at: https://github.com/7777777cc/code.
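As a rough picture of the sparse, value-guided poisoning described above, the snippet below scores candidate states with a stand-in attack-value function and injects a state-dependent trigger only into the few states whose score clears a high quantile threshold. The scoring rule, trigger generator, and threshold are toy placeholders, not the BadRL formulation.

```python
# Toy sketch of sparse, value-guided trigger injection (placeholders throughout).
import numpy as np

def should_poison(state, attack_value_fn, threshold):
    """Poison only high-value states so the overall poisoning budget stays tiny."""
    return attack_value_fn(state) >= threshold

def inject_trigger(state, trigger_fn):
    """Overlay a state-dependent trigger pattern on the observation."""
    return state + trigger_fn(state)

rng = np.random.default_rng(0)
states = rng.normal(size=(1000, 8))
attack_value_fn = lambda s: float(np.abs(s).mean())   # stand-in attack-value score
trigger_fn = lambda s: 0.05 * np.sign(s)              # stand-in trigger generator
threshold = np.quantile([attack_value_fn(s) for s in states], 0.997)
poisoned = [inject_trigger(s, trigger_fn)
            for s in states if should_poison(s, attack_value_fn, threshold)]
print(f"poisoned {len(poisoned)} of {len(states)} states")
```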

AAAI Conference 2023 Conference Paper

Poisoning with Cerberus: Stealthy and Colluded Backdoor Attack against Federated Learning

  • Xiaoting Lyu
  • Yufei Han
  • Wei Wang
  • Jingkai Liu
  • Bin Wang
  • Jiqiang Liu
  • Xiangliang Zhang

Are Federated Learning (FL) systems free from backdoor poisoning given the arsenal of defense strategies now deployed? This is an intriguing problem with significant practical implications for the utility of FL services. Despite the recent flourishing of poisoning-resilient FL methods, our study shows that carefully tuning the collusion between malicious participants can minimize the trigger-induced bias of the poisoned local model relative to the poison-free one, which plays the key role in delivering stealthy backdoor attacks and circumventing a wide spectrum of state-of-the-art defense methods in FL. We instantiate this attack strategy by proposing a distributed backdoor attack method, namely Cerberus Poisoning (CerP). It jointly tunes the backdoor trigger and controls the poisoned model changes on each malicious participant to achieve a stealthy yet successful backdoor attack against a wide spectrum of defensive mechanisms in federated learning. Our extensive study on 3 large-scale benchmark datasets and 13 mainstream defensive mechanisms confirms that Cerberus Poisoning poses a severe threat to the integrity and security of federated learning practices, despite the flourishing of robust federated learning methods.
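One way to read the "minimize the trigger-induced bias of the poisoned local model" idea is as a projection step: after poisoned local training, each malicious client pulls its update back into a small ball around the update a benign client would have produced, so defenses that flag outlying updates have less to latch onto. The sketch below is a hedged illustration of that reading; the function name and the radius are assumptions, not CerP's actual constraint.

```python
# Hypothetical sketch: keep the malicious update close to a benign reference update.
import torch

def constrain_update(poisoned_update, benign_update, radius):
    """Project the poisoned update into an L2 ball of the given radius
    around the benign update (illustrative, not the paper's formulation)."""
    delta = poisoned_update - benign_update
    norm = delta.norm()
    if norm > radius:
        delta = delta * (radius / norm)
    return benign_update + delta
```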

AAAI Conference 2023 Conference Paper

Towards Efficient and Domain-Agnostic Evasion Attack with High-Dimensional Categorical Inputs

  • Hongyan Bao
  • Yufei Han
  • Yujun Zhou
  • Xin Gao
  • Xiangliang Zhang

Our work targets the search for feasible adversarial perturbations to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically an NP-hard knapsack problem in which the exploration space grows explosively as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic methods such as Branch-and-Bound suffers from exponential complexity and can still yield arbitrarily bad attack results. We address the challenge through the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in a multi-armed bandit problem. Our objective is to achieve a highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding the regret gap of FEAT guarantees its practical attack performance. In empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods over various real-world categorical datasets from different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT in different application scenarios. Our work further hints at the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
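The bandit view in the abstract can be sketched concretely: each categorical feature is an arm, pulling an arm means perturbing that feature, and a UCB rule balances exploring untried features against exploiting those that have raised the attack loss most so far. The snippet below is a simplified illustration with made-up rewards; the OMP-based refinement that FEAT adds is omitted.

```python
# Simplified UCB-over-features sketch (toy rewards, OMP refinement omitted).
import math
import random

def ucb_pick(counts, rewards, t, c=1.0):
    """Pick the feature (arm) with the highest upper confidence bound."""
    scores = []
    for n, r in zip(counts, rewards):
        if n == 0:
            scores.append(float("inf"))  # try every feature at least once
        else:
            scores.append(r / n + c * math.sqrt(math.log(t + 1) / n))
    return scores.index(max(scores))

num_features = 10
counts, rewards = [0] * num_features, [0.0] * num_features
for t in range(200):
    arm = ucb_pick(counts, rewards, t)
    gain = random.random() * (arm / num_features)  # stand-in for attack-loss gain
    counts[arm] += 1
    rewards[arm] += gain
print("most promising feature:", rewards.index(max(rewards)))
```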

AAAI Conference 2021 Conference Paper

Characterizing the Evasion Attackability of Multi-label Classifiers

  • Zhuo Yang
  • Yufei Han
  • Xiangliang Zhang

Evasion attacks in multi-label learning systems are an interesting, widely witnessed, yet rarely explored research topic. Characterizing the crucial factors determining the attackability of the multi-label adversarial threat is the key to interpreting the origin of the adversarial vulnerability and to understanding how to mitigate it. Our study is inspired by the theory of adversarial risk bounds. We associate the attackability of a targeted multi-label classifier with the regularity of the classifier and the training data distribution. Beyond the theoretical attackability analysis, we further propose an efficient empirical attackability estimator based on greedy label space exploration. It provides provable computational efficiency and approximation accuracy. Substantial experimental results on real-world datasets validate the unveiled attackability factors and the effectiveness of the proposed empirical attackability indicator.
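A greedy label-space search of the kind the abstract mentions might look like the following: starting from an empty target set, repeatedly add the label whose inclusion most increases an attack score, stopping when the budget is spent or no label helps. The function name and the placeholder score_fn are illustrative assumptions, not the paper's estimator.

```python
# Hedged sketch of greedy label-space exploration (score_fn is a placeholder
# the caller must define, including for the empty set).
def greedy_attackable_labels(labels, score_fn, budget):
    chosen = []
    for _ in range(budget):
        remaining = [l for l in labels if l not in chosen]
        if not remaining:
            break
        best = max(remaining, key=lambda l: score_fn(chosen + [l]))
        if score_fn(chosen + [best]) <= score_fn(chosen):
            break  # no remaining label improves the score
        chosen.append(best)
    return chosen
```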

NeurIPS Conference 2021 Conference Paper

Learning to dehaze with polarization

  • Chu Zhou
  • Minggui Teng
  • Yufei Han
  • Chao Xu
  • Boxin Shi

Haze, a common kind of bad weather caused by atmospheric scattering, decreases the visibility of scenes and degrades the performance of computer vision algorithms. Single-image dehazing methods have shown their effectiveness in a large variety of scenes; however, they are based on handcrafted priors or learned features, which do not generalize well to real-world images. Polarization information can be used to relieve the ill-posedness of dehazing; however, real-world images remain challenging because existing polarization-based methods usually assume that the transmitted light is not significantly polarized, and they require specific clues to estimate the necessary physical parameters. In this paper, we propose a generalized physical formation model of hazy images and a robust polarization-based dehazing pipeline without the above assumption or requirement, along with a neural network tailored to the pipeline. Experimental results show that our approach achieves state-of-the-art performance on both synthetic data and real-world hazy images.
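For context, single-image dehazing typically starts from the classical atmospheric scattering model below, where I is the observed hazy image, J the scene radiance, t the transmission, and A the global atmospheric light; the paper's generalized formation model goes beyond this form, so the equation is background only, not the proposed model.

```latex
% Classical atmospheric scattering model (background context only):
% observed image = attenuated scene radiance + airlight.
I(x) = J(x)\,t(x) + A\bigl(1 - t(x)\bigr)
```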

AAAI Conference 2020 Conference Paper

Robust Federated Learning via Collaborative Machine Teaching

  • Yufei Han
  • Xiangliang Zhang

For federated learning systems deployed in the wild, data flaws hosted on local agents are widely witnessed. On one hand, when a large fraction (e.g., over 60%) of the training data is corrupted by systematic sensor noise and environmental perturbations, the performance of federated model training can degrade significantly. On the other hand, it is prohibitively expensive for either clients or service providers to set up manual sanitary checks to verify the quality of data instances. In our study, we address this challenge by proposing a collaborative and privacy-preserving machine teaching method. Specifically, we use a few trusted instances provided by teachers as benign examples in the teaching process. Our collaborative teaching approach jointly seeks the optimal tuning of the distributed training set, such that the model learned from the tuned training set predicts the labels of the trusted items correctly. The proposed method couples the processes of teaching and learning and thus directly produces a prediction model that is robust despite extremely pervasive systematic data corruption. An experimental study on real benchmark datasets demonstrates the validity of our method.
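A toy reading of "tune the training set so that the learned model predicts the trusted items correctly" is sketched below: alternately fit a model on the currently kept local data and drop the points the model trusts least, stopping once the trusted items are classified correctly. The function name, the drop rule, and the use of scikit-learn are assumptions for illustration, not the paper's algorithm.

```python
# Toy illustration (not the paper's algorithm) of teaching by pruning the training set.
import numpy as np
from sklearn.linear_model import LogisticRegression

def teach(X, y, X_trusted, y_trusted, rounds=5, drop_frac=0.1):
    keep = np.ones(len(y), dtype=bool)
    for _ in range(rounds):
        if len(np.unique(y[keep])) < 2:
            break                                   # too little data left to refit
        clf = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
        if clf.score(X_trusted, y_trusted) == 1.0:
            break                                   # trusted items already fit
        proba = clf.predict_proba(X)
        col = np.searchsorted(clf.classes_, y)      # column of each point's own label
        conf = proba[np.arange(len(y)), col]        # confidence in the given labels
        cutoff = np.quantile(conf[keep], drop_frac)
        keep &= conf > cutoff                       # drop the least trusted points
    return keep
```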

IJCAI Conference 2016 Conference Paper

Partially Supervised Graph Embedding for Positive Unlabelled Feature Selection

  • Yufei Han
  • Yun Shen

Selecting discriminative features in positive unlabelled (PU) learning tasks is a challenging problem due to the lack of negative class information. Traditional supervised and semi-supervised feature selection methods cannot be applied directly in this scenario, and unsupervised feature selection algorithms are designed to handle unlabelled data while neglecting the available information from the positive class. To leverage the partially observed positive class information, we propose to encode the weakly supervised information in PU learning tasks into pairwise constraints between training instances. Violations of the pairwise constraints are measured and incorporated into a partially supervised graph embedding model. Extensive experiments on different benchmark databases and a real-world cyber security application demonstrate the effectiveness of our algorithm.
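One way the positive labels could be turned into pairwise constraints is sketched below: positive-positive pairs form must-links, and the embedding pays a hinge penalty whenever a must-link pair ends up far apart. This is an illustrative reading of the abstract, not the paper's exact formulation.

```python
# Minimal sketch: hinge penalty on must-link (positive, positive) pairs that end up
# further apart than a margin in the learned embedding (names are illustrative).
import numpy as np

def must_link_violation(embedding, positive_idx, margin=1.0):
    total = 0.0
    for i in range(len(positive_idx)):
        for j in range(i + 1, len(positive_idx)):
            d = np.linalg.norm(embedding[positive_idx[i]] - embedding[positive_idx[j]])
            total += max(0.0, d - margin)
    return total
```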