Arrow Research search

Author name cluster

Haozhao Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers

35

AAAI Conference 2026 Conference Paper

Causality-inspired Federated Learning for Dynamic Spatio-Temporal Graphs

  • Yuxuan Liu
  • Wenchao Xu
  • Haozhao Wang
  • Zhiming He
  • Zhaofeng Shi
  • Chongyang Xu
  • Peichao Wang
  • Boyuan Zhang

Federated Graph Learning (FGL) has emerged as a powerful paradigm for decentralized training of graph neural networks while preserving data privacy. However, existing FGL methods are predominantly designed for static graphs and rely on parameter averaging or distribution alignment, which implicitly assume that all features are equally transferable across clients, overlooking both the spatial and temporal heterogeneity and the presence of client-specific knowledge in real-world graphs. In this work, we identify that such assumptions create a vicious cycle of spurious representation entanglement, client-specific interference, and negative transfer, degrading generalization performance in Federated Learning over Dynamic Spatio-Temporal Graphs (FSTG). To address this issue, we propose a novel causality-inspired framework named SC-FSGL, which explicitly decouples transferable causal knowledge from client-specific noise through representation-level interventions. Specifically, we introduce a Conditional Separation Module that simulates soft interventions through client conditioned masks, enabling the disentanglement of invariant spatio-temporal causal factors from spurious signals and mitigating representation entanglement caused by client heterogeneity. In addition, we propose a Causal Codebook that clusters causal prototypes and aligns local representations via contrastive learning, promoting cross-client consistency and facilitating knowledge sharing across diverse spatio-temporal patterns. Experiments on five diverse heterogeneity Spatio-Temporal Graph (STG) datasets show that SC-FSGL outperforms state-of-the-art methods.

AAAI Conference 2026 Conference Paper

Data-Centric Sequential Recommendation with Relation-Augmented Generation

  • Yichen Li
  • Yichen Tan
  • Yijing Shan
  • Haozhao Wang
  • Rui Zhang
  • Imran Razzak
  • Ruixuan Li

Data-Centric Sequential Recommendation (DaCSR) has emerged as a promising technique that enhances dataset quality to better capture user preferences without increasing training complexity. However, mining item relations to improve data quality remains challenging due to the intricate nature of interaction sequences. Existing methods predominantly either: 1) optimize models to learn such item relations from fixed datasets at significant training cost, or 2) employ generative models to adaptively learn only interaction patterns, which lack interpretability and cannot guarantee effective data quality enhancement. In this paper, we pioneer a relation-guided dataset augmentation and regeneration framework for sequential recommendation called \textbf{RaSR}. This framework can significantly improve model performance on original datasets while maintaining training efficiency without modifying the model architecture. Specifically, we first preprocess user interactions to construct standardized sequential data and extract semantic representations via a Large Language Model (LLM). We then build a multi-relation graph with manually predefined metrics and semantic representations to generate augmented datasets. Finally, a relation-aware generator can produce regenerated datasets with both the multi-relation graph and the augmented dataset. To verify the effectiveness of RaSR, we conduct experiments on various backbone models and datasets, and achieve significant performance improvement compared to training the model only on the original dataset.

AAAI Conference 2026 Conference Paper

FedCD: Towards Consolidated Distillation for Heterogeneous Federated Learning

  • Yichen Li
  • Hang Su
  • Huifa Li
  • Haolin Yang
  • Xinlin Zhuang
  • Haochen Xue
  • Haozhao Wang
  • Imran Razzak

Knowledge Distillation (KD) serves as an effective approach to addressing heterogeneity issues in Federated Learning (FL), leveraging additional datasets to align local and global models better. There are two primary distillation paradigms: feature-based distillation, which utilizes intermediate-layer features of the network, and logit-based distillation, which employs the final layer's logit outputs. However, existing studies often select distillation methods based on intuitive and empirical evidence when facing different heterogeneous settings, neglecting the intrinsic relationship between distillation paradigms and heterogeneity. This oversight may result in suboptimal federated knowledge distillation performance under heterogeneous conditions. In this paper, we propose the Consolidated Distillation for Heterogeneous Federated Learning - FedCD that balances knowledge representations from both feature-based and logit-based distillation to enhance performance. Specifically, to address the misalignment between knowledge conveyed by features and logits, we aggregate features from different layers via cross-layer attention to preserve semantic knowledge, followed by distribution modeling using Gaussian Mixture Models. This process strengthens knowledge distillation by constraining the transformation of different network layers' features under a consolidated distribution, thereby mitigating impacts from both data and model heterogeneity. Extensive experiments demonstrate that FedCD outperforms state-of-the-art methods by over 10.72% and validate the effectiveness of our approach.

AAAI Conference 2026 Conference Paper

Ripple Shapley: Data Influence Attribution in One Federated Training Run

  • Dewen Zeng
  • Wenlong Tian
  • Haozhao Wang
  • Jianfeng Lu
  • Weijun Xiao
  • Zhiyong Xu

Contribution evaluation is essential for incentivizing high-quality data sharing in federated learning (FL), yet existing Shapley-value-based methods are prohibitively expensive and overlook temporal influence propagation. In this paper, we propose Ripple Shapley, a novel attribution framework that enables accurate, real-time data valuation within a single federated training run. Our method decomposes each sample’s impact into an instantaneous drop term and a recursive ripple term, the latter capturing downstream influence via a Jacobian chain over global updates. To scale computation, we introduce a low-rank approximation of the Jacobian product and construct a shared subspace for efficient ripple accumulation. Extensive experiments on CIFAR-10 and MNIST show that Ripple Shapley achieves up to 62× speedup over existing Shapley-based FL methods while maintaining high attribution fidelity, significantly improving efficiency, robustness, and fairness in federated environments. We further demonstrate its effectiveness in dynamic federated learning scenarios and its potential for real-time data pricing.

TIST Journal 2026 Journal Article

SNEAK: Synonymous Sentences-Aware Adversarial Attack on Natural Language Video Localization

  • Wen Shi
  • Wenchao Xu
  • Haozhao Wang
  • Xingshuo Han
  • Pan Zhou
  • Ruixuan Li

Natural language video localization (NLVL) is an important task in the vision-language understanding area, which calls for an in-depth understanding of not only computer vision and natural language side alone, but more importantly the interplay between both sides. Adversarial attack has been well-recognized as a critical security issue of deep neural network models, which requires prudent investigation. Despite its extensive yet separated studies in video and language two kinds of unimodal tasks, current understanding of the adversarial attack in NLVL tasks including both the vision and language modality is less developed. This paper therefore aims to comprehensively investigate the adversarial attacks of NLVL models by examining three different facets of adversarial attack methods. To achieve the attack goal, we propose a new adversarial attack paradigm called synonymous sentences-aware adversarial attack on NLVL (SNEAK), which captures the cross-modality interplay between the vision and language sides. To further enhance the stealthiness of SNEAK, we propose a frame importance-guided pruning with SNEAK attack (PSA) mechanism to reduce the amount of perturbed frames. Extensive experiments on five NLVL models and two datasets demonstrate the effectiveness and stealthiness of the proposed attack methods.

AAAI Conference 2026 Conference Paper

Unbiased Rectification for Sequential Recommender Systems Under Fake Orders

  • Qiyu Qin
  • Yichen Li
  • Haozhao Wang
  • Cheng Wang
  • Rui Zhang
  • Ruixuan Li

Fake orders pose increasing threats to sequential recommender systems by misleading recommendation results through artificially manipulated interactions, including click farming, context-irrelevant substitutions, and sequential perturbations. Unlike injecting carefully designed fake users to influence recommendation performance, fake orders embedded within genuine user sequences aim to disrupt user preferences and mislead recommendation results, thereby manipulating exposure rates of specific items to gain competitive advantages. To protect users' authentic interest preferences and eliminate misleading information, this paper aims to perform precise and efficient rectification on compromised sequential recommender systems while avoiding the enormous computational and time costs of retraining existing models. Specifically, we identify that fake orders are not absolutely harmful—in certain cases, partial fake orders can even have a data augmentation effect. Based on this insight, we propose Dual-view Identification and Targeted Rectification (DITaR), which primarily identifies harmful samples to achieve unbiased rectification of the system. The core idea of this method is to obtain differentiated representations from collaborative and semantic views for precise detection, and then filters detected suspicious fake orders to select truly harmful ones for targeted rectification with gradient ascent. This ensures that useful information in fake orders is not removed while preventing bias residue. Moreover, it maintains the original data volume and sequence structure, thus protecting system performance and trustworthiness to achieve optimal unbiased rectification. Extensive experiments on three datasets demonstrate that DITaR achieves superior performance compared to state-of-the-art methods in terms of recommendation quality, computational efficiency, and system robustness.

ICML Conference 2025 Conference Paper

Adversarial Cooperative Rationalization: The Risk of Spurious Correlations in Even Clean Datasets

  • Wei Liu 0144
  • Zhongyu Niu
  • Lang Gao
  • Zhiying Deng
  • Jun Wang 0018
  • Haozhao Wang
  • Ruixuan Li 0001

This study investigates the self-rationalization framework constructed with a cooperative game, where a generator initially extracts the most informative segment from raw input, and a subsequent predictor utilizes the selected subset for its input. The generator and predictor are trained collaboratively to maximize prediction accuracy. In this paper, we first uncover a potential caveat: such a cooperative game could unintentionally introduce a sampling bias during rationale extraction. Specifically, the generator might inadvertently create an incorrect correlation between the selected rationale candidate and the label, even when they are semantically unrelated in the original dataset. Subsequently, we elucidate the origins of this bias using both detailed theoretical analysis and empirical evidence. Our findings suggest a direction for inspecting these correlations through attacks, based on which we further introduce an instruction to prevent the predictor from learning the correlations. Through experiments on six text classification datasets and two graph classification datasets using three network architectures (GRUs, BERT, and GCN), we show that our method significantly outperforms recent rationalization methods.

NeurIPS Conference 2025 Conference Paper

Beyond Higher Rank: Token-wise Input-Output Projections for Efficient Low-Rank Adaptation

  • Shiwei Li
  • Xiandi Luo
  • Haozhao Wang
  • Xing Tang
  • Ziqiang Cui
  • Dugang Liu
  • Yuhua Li
  • Xiuqiang He

Low-rank adaptation (LoRA) is a parameter-efficient fine-tuning (PEFT) method widely used in large language models (LLMs). LoRA essentially describes the projection of an input space into a low-dimensional output space, with the dimensionality determined by the LoRA rank. In standard LoRA, all input tokens share the same weights and undergo an identical input-output projection. This limits LoRA's ability to capture token-specific information due to the inherent semantic differences among tokens. To address this limitation, we propose **Token-wise Projected Low-Rank Adaptation (TopLoRA)**, which dynamically adjusts LoRA weights according to the input token, thereby learning token-wise input-output projections in an end-to-end manner. Formally, the weights of TopLoRA can be expressed as $B\Sigma_X A$, where $A$ and $B$ are low-rank matrices (as in standard LoRA), and $\Sigma_X$ is a diagonal matrix generated from each input token $X$. Notably, TopLoRA does not increase the rank of LoRA weights but achieves more granular adaptation by learning token-wise LoRA weights (i. e. , token-wise input-output projections). Extensive experiments across multiple models and datasets demonstrate that TopLoRA consistently outperforms LoRA and its variants. The code is available at https: //github. com/Leopold1423/toplora-neurips25.

ICML Conference 2025 Conference Paper

Beyond Zero Initialization: Investigating the Impact of Non-Zero Initialization on LoRA Fine-Tuning Dynamics

  • Shiwei Li 0002
  • Xiandi Luo
  • Xing Tang 0007
  • Haozhao Wang
  • Hao Chen
  • Weihong Luo
  • Yuhua Li 0003
  • Xiuqiang He 0001

Low-rank adaptation (LoRA) is a widely used parameter-efficient fine-tuning method. In standard LoRA layers, one of the matrices, $A$ or $B$, is initialized to zero, ensuring that fine-tuning starts from the pretrained model. However, there is no theoretical support for this practice. In this paper, we investigate the impact of non-zero initialization on LoRA’s fine-tuning dynamics from an infinite-width perspective. Our analysis reveals that, compared to zero initialization, simultaneously initializing $A$ and $B$ to non-zero values improves LoRA’s robustness to suboptimal learning rates, particularly smaller ones. Further analysis indicates that although the non-zero initialization of $AB$ introduces random noise into the pretrained weight, it generally does not affect fine-tuning performance. In other words, fine-tuning does not need to strictly start from the pretrained model. The validity of our findings is confirmed through extensive experiments across various models and datasets. The code is available at https: //github. com/Leopold1423/non_zero_lora-icml25.

ICLR Conference 2025 Conference Paper

Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization

  • Wei Liu 0144
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • Zhigang Zeng
  • Ruixuan Li 0001

Extracting a small subset of crucial rationales from the full input is a key problem in explainability research. The most widely used fundamental criterion for rationale extraction is the maximum mutual information (MMI) criterion. In this paper, we first demonstrate that MMI suffers from diminishing marginal returns. Once part of the rationale has been identified, finding the remaining portions contributes only marginally to increasing the mutual information, making it difficult to use MMI to locate the rest. In contrast to MMI that aims to reproduce the prediction, we seek to identify the parts of the input that the network can actually utilize. This is achieved by comparing how different rationale candidates match the capability space of the weight matrix. The weight matrix of a neural network is typically low-rank, meaning that the linear combinations of its column vectors can only cover part of the directions in a high-dimensional space (high-dimension: the dimensions of an input vector). If an input is fully utilized by the network, it generally matches these directions (e.g., a portion of a hypersphere), resulting in a representation with a high norm. Conversely, if an input primarily falls outside (orthogonal to) these directions, its representation norm will approach zero, behaving like noise that the network cannot effectively utilize. Building on this, we propose using the norms of rationale candidates as an alternative objective to MMI. Through experiments on four text classification datasets and one graph classification dataset using three network architectures (GRUs, BERT, and GCN), we show that our method outperforms MMI and its improved variants in identifying better rationales. We also compare our method with a representative LLM (llama-3.1-8b-instruct) and find that our simple method gets comparable results to it and can sometimes even outperform it.

ICML Conference 2025 Conference Paper

BSemiFL: Semi-supervised Federated Learning via a Bayesian Approach

  • Haozhao Wang
  • Shengyu Wang
  • Jiaming Li
  • Hao Ren 0001
  • Xingshuo Han
  • Wenchao Xu 0001
  • Shangwei Guo
  • Tianwei Zhang 0004

Semi-supervised Federated Learning (SSFL) is a promising approach that allows clients to collaboratively train a global model in the absence of their local data labels. The key step of SSFL is the re-labeling where each client adopts two types of available models, namely global and local models, to re-label the local data. While various technologies such as using the global model or the average of two models have been proposed to conduct the re-labeling step, little literature delves deeply into the performance dominance and limitations of the two models. In this paper, we first theoretically and empirically demonstrate that the local model achieves higher re-labeling accuracy over local data while the global model can progressively improve the re-labeling performance by introducing the extra data knowledge of other clients. Based on these findings, we propose BSemiFL which re-labels the local data through the collaboration between the local and global model in a Bayesian approach. Specifically, to re-label any given local sample, BSemiFL first uses Bayesian inference to assess the closeness of the local/global model to the sample. Then, it applies a weighted combination of their pseudo labels, using the closeness as the weights. Theoretical analysis shows that the labeling error of our method is smaller than that of simply using the global model, the local model, or their simple average. Experimental results show that BSemiFL improves the performance by up to $9. 8%$ as compared to state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

ChatbotID: Identifying Chatbots with Granger Causality Test

  • Xiaoquan Yi
  • Haozhao Wang
  • Yining Qi
  • Wenchao Xu
  • Rui Zhang
  • Yuhua Li
  • Ruixuan Li

With the increasing sophistication of Large Language Models (LLMs), it is crucial to develop reliable methods to accurately identify whether an interlocutor in real-time dialogue is human or chatbot. However, existing detection methods are primarily designed for analyzing full documents, not the unique dynamics and characteristics of dialogue. These approaches frequently overlook the nuances of interaction that are essential in conversational contexts. This work identifies two key patterns in dialogues: (1) Human-Human (H-H) interactions exhibit significant bidirectional sentiment influence, while (2) Human-Chatbot (H-C) interactions display a clear asymmetric pattern. We propose an innovative approach named ChatbotID, which applies the Granger Causality Test (GCT) to extract a novel set of interactional features that capture the evolving, predictive relationships between conversational attributes. By synergistically fusing these GCT-based interactional features with contextual embeddings, and optimizing the model through a meticulous loss function. Experimental results across multiple datasets and detection models demonstrate the effectiveness of our framework, with significant improvements in accuracy for distinguishing between H-H and H-C dialogues.

IJCAI Conference 2025 Conference Paper

DaringFed: A Dynamic Bayesian Persuasion Pricing for Online Federated Learning Under Two-sided Incomplete Information

  • Yun Xin
  • Jianfeng Lu
  • Shuqin Cao
  • Gang Li
  • Haozhao Wang
  • Guanghui Wen

Online Federated Learning (OFL) is a real-time learning paradigm that sequentially executes parameter aggregation immediately for each random arriving client. To motivate clients to participate in OFL, it is crucial to offer appropriate incentives to offset the training resource consumption. However, the design of incentive mechanisms in OFL is constrained by the dynamic variability of Two-sided Incomplete Information (TII) concerning resources, where the server is unaware of the clients’ dynamically changing computational resources, while clients lack knowledge of the real-time communication resources allocated by the server. To incentivize clients to participate in training by offering dynamic rewards to each arriving client, we design a novel Dynamic Bayesian persuasion pricing for online Federated learning (DaringFed) under TII. Specifically, we begin by formulating the interaction between the server and clients as a dynamic signaling and pricing allocation problem within a Bayesian persuasion game, and then demonstrate the existence of a unique Bayesian persuasion Nash equilibrium. By deriving the optimal design of DaringFed under one-sided incomplete information, we further analyze the approximate optimal design of DaringFed with a specific bound under TII. Finally, extensive evaluation conducted on real datasets demonstrate that DaringFed optimizes accuracy and converges speed by 16. 99%, while experiments with synthetic datasets validate the convergence of estimate unknown values and the effectiveness of DaringFed in improving the server’s utility by up to 12. 6%.

NeurIPS Conference 2025 Conference Paper

Efficient Knowledge Transfer in Federated Recommendation for Joint Venture Ecosystem

  • Yichen Li
  • Yijing Shan
  • Yi Liu
  • Haozhao Wang
  • Cheng Wang
  • Yi Wang
  • Ruixuan Li

The current Federated Recommendation System (FedRS) focuses on personalized recommendation services and assumes clients are personalized IoT devices (e. g. , Mobile phones). In this paper, we deeply dive into new but practical FedRS applications within the joint venture ecosystem. Subsidiaries engage as participants with their users and items. However, in such a situation, merely exchanging item embedding is insufficient, as user bases always exhibit both overlaps and exclusive segments, demonstrating the complexity of user information. Meanwhile, directly uploading user information is a violation of privacy and unacceptable. To tackle the above challenges, we propose an efficient and privacy-enhanced federated recommendation for the joint venture ecosystem (FR-JVE) that each client transfers more common knowledge from other clients with a distilled user's \textit{rating preference} from the local dataset. More specifically, we first transform the local data into a new format and apply model inversion techniques to distill the rating preference with frozen user gradients before the federated training. Then, a bridge function is employed on each client side to align the local rating preference and aggregated global preference in a privacy-friendly manner. Finally, each client matches similar users to make a better prediction for overlapped users. From a theoretical perspective, we analyze how effectively FR-JVE can guarantee user privacy. Empirically, we show that FR-JVE achieves superior performance compared to state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

Enhancing Privacy in Multimodal Federated Learning with Information Theory

  • Tianzhe Xiao
  • Yichen Li
  • Yining Qi
  • Yi Liu
  • Haozhao Wang
  • Yi Wang
  • Ruixuan Li

Multimodal federated learning (MMFL) has gained increasing popularity due to its ability to leverage the correlation between various modalities, meanwhile preserving data privacy for different clients. However, recent studies show that correlation between modalities increase the vulnerability of federated learning against Gradient Inversion Attack (GIA). The complicated situation of MMFL privacy preserving can be summarized as follows: 1) different modality transmits different amounts of information, thus requires various protection strength; 2) correlation between modalities should be taken into account. This paper introduces an information theory perspective to analyze the leaked privacy in process of MMFL, and tries to propose a more reasonable protection method \textbf{Sec-MMFL} based on assessing different information leakage possibilities of each modality by conditional mutual information and adjust the corresponding protection strength. Moreover, we use mutual information to reduce the cross-modality information leakage in MMFL. Experiments have proven that our method can bring more balanced and comprehensive protection at an acceptable cost.

NeurIPS Conference 2025 Conference Paper

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

  • Yichen Li
  • Xiuying Wang
  • Wenchao Xu
  • Haozhao Wang
  • Yining Qi
  • Jiahua Dong
  • Ruixuan Li

Model-Heterogeneous Federated Learning (Hetero-FL) has attracted growing attention for its ability to aggregate knowledge from heterogeneous models while keeping private data locally. To better aggregate knowledge from clients, ensemble distillation, as a widely used and effective technique, is often employed after global aggregation to enhance the performance of the global model. However, simply combining Hetero-FL and ensemble distillation does not always yield promising results and can make the training process unstable. The reason is that existing methods primarily focus on logit distillation, which, while being model-agnostic with softmax predictions, fails to compensate for the knowledge bias arising from heterogeneous models. To tackle this challenge, we propose a stable and efficient Feature Distillation for model-heterogeneous Federated learning, dubbed FedFD, that can incorporate aligned feature information via orthogonal projection to integrate knowledge from heterogeneous models better. Specifically, a new feature-based ensemble federated knowledge distillation paradigm is proposed. The global model on the server needs to maintain a projection layer for each client-side model architecture to align the features separately. Orthogonal techniques are employed to re-parameterize the projection layer to mitigate knowledge bias from heterogeneous models and thus maximize the distilled knowledge. Extensive experiments show that FedFD achieves superior performance compared to state-of-the-art methods.

ICML Conference 2025 Conference Paper

FedSSI: Rehearsal-Free Continual Federated Learning with Synergistic Synaptic Intelligence

  • Yichen Li 0006
  • Yuying Wang
  • Haozhao Wang
  • Yining Qi
  • Tianzhe Xiao
  • Ruixuan Li 0001

Continual Federated Learning (CFL) allows distributed devices to collaboratively learn novel concepts from continuously shifting training data while avoiding knowledge forgetting of previously seen tasks. To tackle this challenge, most current CFL approaches rely on extensive rehearsal of previous data. Despite effectiveness, rehearsal comes at a cost to memory, and it may also violate data privacy. Considering these, we seek to apply regularization techniques to CFL by considering their cost-efficient properties that do not require sample caching or rehearsal. Specifically, we first apply traditional regularization techniques to CFL and observe that existing regularization techniques, especially synaptic intelligence, can achieve promising results under homogeneous data distribution but fail when the data is heterogeneous. Based on this observation, we propose a simple yet effective regularization algorithm for CFL named FedSSI, which tailors the synaptic intelligence for the CFL with heterogeneous data settings. FedSSI can not only reduce computational overhead without rehearsal but also address the data heterogeneity issue. Extensive experiments show that FedSSI achieves superior performance compared to state-of-the-art methods.

NeurIPS Conference 2025 Conference Paper

LLM at Network Edge: A Layer-wise Efficient Federated Fine-tuning Approach

  • Jinglong Shen
  • Nan Cheng
  • Wenchao Xu
  • Haozhao Wang
  • Yifan guo
  • Jiajie Xu

Fine-tuning large language models (LLMs) poses significant computational burdens, especially in federated learning (FL) settings. We introduce Layer-wise Efficient Federated Fine-tuning (LEFF), a novel method designed to enhance the efficiency of FL fine-tuning while preserving model performance and minimizing client-side computational overhead. LEFF strategically selects layers for fine-tuning based on client computational capacity, thereby mitigating the straggler effect prevalent in heterogeneous environments. Furthermore, LEFF incorporates an importance-driven layer sampling mechanism, prioritizing layers with greater influence on model performance. Theoretical analysis demonstrates that LEFF achieves a convergence rate of $\mathcal{O}(1/\sqrt{T})$. Extensive experiments on diverse datasets demonstrate that LEFF attains superior computational efficiency and model performance compared to existing federated fine-tuning methods, particularly under heterogeneous conditions.

NeurIPS Conference 2025 Conference Paper

Resource-Constrained Federated Continual Learning: What Does Matter?

  • Yichen Li
  • Yuying Wang
  • Jiahua Dong
  • Haozhao Wang
  • Yining Qi
  • Rui Zhang
  • Ruixuan Li

Federated Continual Learning (FCL) aims to enable sequential privacy-preserving model training on streams of incoming data that vary in edge devices by preserving previous knowledge while adapting to new data. Current FCL literature focuses on restricted data privacy and access to previously seen data while imposing no constraints on the training overhead. This is unreasonable for FCL applications in real-world scenarios, where edge devices are primarily constrained by resources such as storage, computational budget, and label rate. We revisit this problem with a large-scale benchmark and analyze the performance of state-of-the-art FCL approaches under different resource-constrained settings. Various typical FCL techniques and six datasets in two incremental learning scenarios (Class-IL and Domain-IL) are involved in our experiments. Through extensive experiments amounting to a total of over 1, 000+ GPU hours, we find that, under limited resource-constrained settings, existing FCL approaches, with no exception, fail to achieve the expected performance. Our conclusions are consistent in the sensitivity analysis. This suggests that most existing FCL methods are particularly too resource-dependent for real-world deployment. Moreover, we study the performance of typical FCL techniques with resource constraints and shed light on future research directions in FCL.

ICLR Conference 2025 Conference Paper

Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

  • Fushuo Huo
  • Wenchao Xu 0001
  • Zhong Zhang 0014
  • Haozhao Wang
  • Zhicheng Chen
  • Peilin Zhao

Hallucination remains a significant challenge in Large Vision-Language Models (LVLMs). To alleviate this issue, some methods, known as contrastive decoding, induce hallucinations by manually disturbing the raw vision or instruction inputs and then mitigate them by contrasting the outputs of the original and disturbed LVLMs. However, these holistic input disturbances sometimes induce potential noise and also double the inference cost. To tackle these issues, we propose a simple yet effective method named $\textit{Self-Introspective Decoding}$ (SID). Our empirical investigations reveal that pre-trained LVLMs can introspectively assess the importance of vision tokens based on preceding vision and text (both instruction and generated) tokens. Leveraging this insight, we develop the Context and Text-aware Token Selection (CT$^2$S) strategy, which preserves only the least important vision tokens after the early decoder layers, thereby adaptively amplify vision-and-text association hallucinations during auto-regressive decoding. This strategy ensures that multimodal knowledge absorbed in the early decoder layers induces multimodal contextual rather than aimless hallucinations, and significantly reduces computation burdens. Subsequently, the original token logits subtract the amplified fine-grained hallucinations, effectively alleviating hallucinations without compromising the LVLMs' general ability. Extensive experiments illustrate SID generates less-hallucination and higher-quality texts across various metrics, without much additional computation cost.

ICML Conference 2025 Conference Paper

The Panaceas for Improving Low-Rank Decomposition in Communication-Efficient Federated Learning

  • Shiwei Li 0002
  • Xiandi Luo
  • Haozhao Wang
  • Xing Tang 0007
  • Shijie Xu
  • Weihong Luo
  • Yuhua Li 0003
  • Xiuqiang He 0001

To improve the training efficiency of federated learning (FL), previous research has employed low-rank decomposition techniques to reduce communication overhead. In this paper, we seek to enhance the performance of these low-rank decomposition methods. Specifically, we focus on three key issues related to decomposition in FL: what to decompose, how to decompose, and how to aggregate. Subsequently, we introduce three novel techniques: Model Update Decomposition (MUD), Block-wise Kronecker Decomposition (BKD), and Aggregation-Aware Decomposition (AAD), each targeting a specific issue. These techniques are complementary and can be applied simultaneously to achieve optimal performance. Additionally, we provide a rigorous theoretical analysis to ensure the convergence of the proposed MUD. Extensive experimental results show that our approach achieves faster convergence and superior accuracy compared to relevant baseline methods. The code is available at https: //github. com/Leopold1423/fedmud-icml25.

ECAI Conference 2024 Conference Paper

Adversarial Attack for Explanation Robustness of Rationalization Models

  • Yuankai Zhang 0002
  • Lingxiao Kong
  • Haozhao Wang
  • Ruixuan Li 0001
  • Jun Wang 0018
  • Yuhua Li 0003
  • Wei Liu 0144

Rationalization models, which select a subset of input text as rationale—crucial for humans to understand and trust predictions—have recently emerged as a prominent research area in eXplainable Artificial Intelligence (XAI). However, most of previous studies mainly focus on improving the quality of the rationale, ignoring its robustness to malicious attack. Specifically, whether the rationalization models can still generate high-quality rationale under the adversarial attack remains unknown. To explore this, this paper proposes UAT2E, which aims to undermine the explainability of rationalization models without altering their predictions, thereby eliciting distrust in these models from human users. UAT2E employs the gradient-based search on triggers and then inserts them into the original input to conduct both the non-target and target attack. Experimental results on five datasets reveal the vulnerability of rationalization models in terms of explanation, where they tend to select more meaningless tokens under attacks. Based on this, we make a series of recommendations for improving rationalization models in terms of explanation.

AAAI Conference 2024 Conference Paper

Decoupling Representation and Knowledge for Few-Shot Intent Classification and Slot Filling

  • Jie Han
  • Yixiong Zou
  • Haozhao Wang
  • Jun Wang
  • Wei Liu
  • Yao Wu
  • Tao Zhang
  • Ruixuan Li

Few-shot intent classification and slot filling are important but challenging tasks due to the scarcity of finely labeled data. Therefore, current works first train a model on source domains with sufficiently labeled data, and then transfer the model to target domains where only rarely labeled data is available. However, experience transferring as a whole usually suffers from gaps that exist among source domains and target domains. For instance, transferring domain-specific-knowledge-related experience is difficult. To tackle this problem, we propose a new method that explicitly decouples the transferring of general-semantic-representation-related experience and the domain-specific-knowledge-related experience. Specifically, for domain-specific-knowledge-related experience, we design two modules to capture intent-slot relation and slot-slot relation respectively. Extensive experiments on Snips and FewJoint datasets show that our method achieves state-of-the-art performance. The method improves the joint accuracy metric from 27.72% to 42.20% in the 1-shot setting, and from 46.54% to 60.79% in the 5-shot setting.

IJCAI Conference 2024 Conference Paper

Dual Expert Distillation Network for Generalized Zero-Shot Learning

  • Zhijie Rao
  • Jingcai Guo
  • Xiaocheng Lu
  • Jingming Liang
  • Jie Zhang
  • Haozhao Wang
  • Kang Wei
  • Xiaofeng Cao

Zero-shot learning has consistently yielded remarkable progress via modeling nuanced one-to-one visual-attribute correlation. Existing studies resort to refining a uniform mapping function to align and correlate the sample regions and subattributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) the unutilized channel information. This paper addresses these issues by introducing a simple yet effective approach, dubbed Dual Expert Distillation Network (DEDN), where two experts are dedicated to coarse- and fine-grained visual-attribute modeling, respectively. Concretely, one coarse expert, namely cExp, has a complete perceptual scope to coordinate visual-attribute similarity metrics across dimensions, and moreover, another fine expert, namely fExp, consists of multiple specialized subnetworks, each corresponds to an exclusive set of attributes. Two experts cooperatively distill from each other to reach a mutual agreement during training. Meanwhile, we further equip DEDN with a newly designed backbone network, i. e. , Dual Attention Network (DAN), which incorporates both region and channel attention information to fully exploit and leverage visual semantic knowledge. Extensive experiments on various benchmark datasets indicate a new state-of-the-art. The code is available at github. com/zjrao/DEDN.

ICML Conference 2024 Conference Paper

FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

  • Shiwei Li 0002
  • Wenchao Xu 0001
  • Haozhao Wang
  • Xing Tang 0007
  • Yining Qi
  • Shijie Xu
  • Weihong Luo
  • Yuhua Li 0003

Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users’ privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize model updates in a post-training manner, resulting in significant approximation errors and consequent degradation in model accuracy. To this end, we propose Federated Binarization-Aware Training (FedBAT), a novel framework that directly learns binary model updates during the local training process, thus inherently reducing the approximation errors. FedBAT incorporates an innovative binarization operator, along with meticulously designed derivatives to facilitate efficient learning. In addition, we establish theoretical guarantees regarding the convergence of FedBAT. Extensive experiments are conducted on four popular datasets. The results show that FedBAT significantly accelerates the convergence and exceeds the accuracy of baselines by up to 9%, even surpassing that of FedAvg in some cases.

ICLR Conference 2024 Conference Paper

FedCDA: Federated Learning with Cross-rounds Divergence-aware Aggregation

  • Haozhao Wang
  • Haoran Xu
  • Yichen Li 0006
  • Yuan Xu
  • Ruixuan Li 0001
  • Tianwei Zhang 0004

In Federated Learning (FL), model aggregation is pivotal. It involves a global server iteratively aggregating client local trained models in successive rounds without accessing private data. Traditional methods typically aggregate the local models from the current round alone. However, due to the statistical heterogeneity across clients, the local models from different clients may be greatly diverse, making the obtained global model incapable of maintaining the specific knowledge of each local model. In this paper, we introduce a novel method, FedCDA, which selectively aggregates cross-round local models, decreasing discrepancies between the global model and local models. The principle behind FedCDA is that due to the different global model parameters received in different rounds and the non-convexity of deep neural networks, the local models from each client may converge to different local optima across rounds. Therefore, for each client, we select a local model from its several recent local models obtained in multiple rounds, where the local model is selected by minimizing its divergence from the local models of other clients. This ensures the aggregated global model remains close to all selected local models to maintain their data knowledge. Extensive experiments conducted on various models and datasets reveal our approach outperforms state-of-the-art aggregation methods.

NeurIPS Conference 2024 Conference Paper

Is the MMI Criterion Necessary for Interpretability? Degenerating Non-causal Features to Plain Noise for Self-Rationalization

  • Wei Liu
  • Zhiying Deng
  • Zhongyu Niu
  • Jun Wang
  • Haozhao Wang
  • YuanKai Zhang
  • Ruixuan Li

An important line of research in the field of explainability is to extract a small subset of crucial rationales from the full input. The most widely used criterion for rationale extraction is the maximum mutual information (MMI) criterion. However, in certain datasets, there are spurious features non-causally correlated with the label and also get high mutual information, complicating the loss landscape of MMI. Although some penalty-based methods have been developed to penalize the spurious features (e. g. , invariance penalty, intervention penalty, etc) to help MMI work better, these are merely remedial measures. In the optimization objectives of these methods, spurious features are still distinguished from plain noise, which hinders the discovery of causal rationales. This paper aims to develop a new criterion that treats spurious features as plain noise, allowing the model to work on datasets rich in spurious features as if it were working on clean datasets, thereby making rationale extraction easier. We theoretically observe that removing either plain noise or spurious features from the input does not alter the conditional distribution of the remaining components relative to the task label. However, significant changes in the conditional distribution occur only when causal features are eliminated. Based on this discovery, the paper proposes a criterion for \textbf{M}aximizing the \textbf{R}emaining \textbf{D}iscrepancy (MRD). Experiments on six widely used datasets show that our MRD criterion improves rationale quality (measured by the overlap with human-annotated rationales) by up to $10. 4\%$ as compared to several recent competitive MMI variants. Code: \url{https: //github. com/jugechengzi/Rationalization-MRD}.

AAAI Conference 2024 Conference Paper

Knowledge-Aware Parameter Coaching for Personalized Federated Learning

  • Mingjian Zhi
  • Yuanguo Bi
  • Wenchao Xu
  • Haozhao Wang
  • Tianao Xiang

Personalized Federated Learning (pFL) can effectively exploit the non-IID data from distributed clients by customizing personalized models. Existing pFL methods either simply take the local model as a whole for aggregation or require significant training overhead to induce the inter-client personalized weights, and thus clients cannot efficiently exploit the mutually relevant knowledge from each other. In this paper, we propose a knowledge-aware parameter coaching scheme where each client can swiftly and granularly refer to parameters of other clients to guide the local training, whereby accurate personalized client models can be efficiently produced without contradictory knowledge. Specifically, a novel regularizer is designed to conduct layer-wise parameters coaching via a relation cube, which is constructed based on the knowledge represented by the layered parameters among all clients. Then, we develop an optimization method to update the relation cube and the parameters of each client. It is theoretically demonstrated that the convergence of the proposed method can be guaranteed under both convex and non-convex settings. Extensive experiments are conducted over various datasets, which show that the proposed method can achieve better performance compared with the state-of-the-art baselines in terms of accuracy and convergence speed.

AAAI Conference 2024 Conference Paper

Non-exemplar Online Class-Incremental Continual Learning via Dual-Prototype Self-Augment and Refinement

  • Fushuo Huo
  • Wenchao Xu
  • Jingcai Guo
  • Haozhao Wang
  • Yunfeng Fan

This paper investigates a new, practical, but challenging problem named Non-exemplar Online Class-incremental continual Learning (NO-CL), which aims to preserve the discernibility of base classes without buffering data examples and efficiently learn novel classes continuously in a single-pass (i.e., online) data stream. The challenges of this task are mainly two-fold: (1) Both base and novel classes suffer from severe catastrophic forgetting as no previous samples are available for replay. (2) As the online data can only be observed once, there is no way to fully re-train the whole model, e.g., re-calibrate the decision boundaries via prototype alignment or feature distillation. In this paper, we propose a novel Dual-prototype Self-augment and Refinement method (DSR) for NO-CL problem, which consists of two strategies: 1) Dual class prototypes: vanilla and high-dimensional prototypes are exploited to utilize the pre-trained information and obtain robust quasi-orthogonal representations rather than example buffers for both privacy preservation and memory reduction. 2) Self-augment and refinement: Instead of updating the whole network, we optimize high-dimensional prototypes alternatively with the extra projection module based on self-augment vanilla prototypes, through a bi-level optimization problem. Extensive experiments demonstrate the effectiveness and superiority of the proposed DSR in NO-CL.

AAAI Conference 2024 Conference Paper

ProCC: Progressive Cross-Primitive Compatibility for Open-World Compositional Zero-Shot Learning

  • Fushuo Huo
  • Wenchao Xu
  • Song Guo
  • Jingcai Guo
  • Haozhao Wang
  • Ziming Liu
  • Xiaocheng Lu

Open-World Compositional Zero-shot Learning (OW-CZSL) aims to recognize novel compositions of state and object primitives in images with no priors on the compositional space, which induces a tremendously large output space containing all possible state-object compositions. Existing works either learn the joint compositional state-object embedding or predict simple primitives with separate classifiers. However, the former method heavily relies on external word embedding methods, and the latter ignores the interactions of interdependent primitives, respectively. In this paper, we revisit the primitive prediction approach and propose a novel method, termed Progressive Cross-primitive Compatibility (ProCC), to mimic the human learning process for OW-CZSL tasks. Specifically, the cross-primitive compatibility module explicitly learns to model the interactions of state and object features with the trainable memory units, which efficiently acquires cross-primitive visual attention to reason high-feasibility compositions, without the aid of external knowledge. Moreover, to alleviate the invalid cross-primitive interactions, especially for partial-supervision conditions (pCZSL), we design a progressive training paradigm to optimize the primitive classifiers conditioned on pre-trained features in an easy-to-hard manner. Extensive experiments on three widely used benchmark datasets demonstrate that our method outperforms other representative methods on both OW-CZSL and pCZSL settings by large margins.

NeurIPS Conference 2023 Conference Paper

Aligning Language Models with Human Preferences via a Bayesian Approach

  • Jiashuo WANG
  • Haozhao Wang
  • Shichao Sun
  • Wenjie Li

In the quest to advance human-centric natural language generation (NLG) systems, ensuring alignment between NLG models and human preferences is crucial. For this alignment, current popular methods leverage a reinforcement learning (RL) approach with a reward model trained on feedback from humans. However, inherent disagreements due to the subjective nature of human preferences pose a significant challenge for training the reward model, resulting in a deterioration of the NLG performance. To tackle this issue, previous approaches typically rely on majority voting or averaging to consolidate multiple inconsistent preferences into a merged one. Although straightforward to understand and execute, such methods suffer from an inability to capture the nuanced degrees of disaggregation among humans and may only represent a specialized subset of individuals, thereby lacking the ability to quantitatively disclose the universality of human preferences. To address this challenge, this paper proposes a novel approach, which employs a Bayesian framework to account for the distribution of disagreements among human preferences as training a preference model, and names it as $\textbf{d-PM}$. Besides, considering the RL strategy's inefficient and complex training process over the training efficiency, we further propose utilizing the contrastive learning strategy to train the NLG model with the preference scores derived from the d-PM model. Extensive experiments on two human-centric NLG tasks, i. e. , emotional support conversation and integrity ``Rule-of-Thumb'' generation, show that our method consistently exceeds previous SOTA models in both automatic and human evaluations.

NeurIPS Conference 2023 Conference Paper

D-Separation for Causal Self-Explanation

  • Wei Liu
  • Jun Wang
  • Haozhao Wang
  • Ruixuan Li
  • Zhiying Deng
  • YuanKai Zhang
  • Yang Qiu

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their inputting texts. Conventional works generally employ the maximum mutual information (MMI) criterion to find the rationale that is most indicative of the target label. However, this criterion can be influenced by spurious features that correlate with the causal rationale or the target label. Instead of attempting to rectify the issues of the MMI criterion, we propose a novel criterion to uncover the causal rationale, termed the Minimum Conditional Dependence (MCD) criterion, which is grounded on our finding that the non-causal features and the target label are \emph{d-separated} by the causal rationale. By minimizing the dependence between the non-selected parts of the input and the target label conditioned on the selected rationale candidate, all the causes of the label are compelled to be selected. In this study, we employ a simple and practical measure for dependence, specifically the KL-divergence, to validate our proposed MCD criterion. Empirically, we demonstrate that MCD improves the F1 score by up to 13. 7% compared to previous state-of-the-art MMI-based methods. Our code is in an anonymous repository: https: //anonymous. 4open. science/r/MCD-CE88.

ECAI Conference 2023 Conference Paper

XFed: Improving Explainability in Federated Learning by Intersection Over Union Ratio Extended Client Selection

  • Juan Zhao 0010
  • Yuankai Zhang 0002
  • Ruixuan Li 0001
  • Yuhua Li 0003
  • Haozhao Wang
  • Xiaoquan Yi
  • Zhiying Deng

Federated Learning (FL) allows massive clients to collaboratively train a global model without revealing their private data. Because of the participants’ not independently and identically distributed (non-IID) statistical characteristics, it will cause divergence among the client’s Deep Neural Network model weights and require more communication rounds before training can be converged. Moreover, models trained from non-IID data may also extract biased features and the rationale behind the model is still not fully analyzed and exploited. In this paper, we propose eXplainable-Fed (XFed) which is a novel client selection mechanism that takes both accuracy and explainability into account. Specifically, XFed selects participants in each round based on a small test set’s accuracy via cross-entropy loss and interpretability via XAI-accuracy. XAI-accuracy is calculated by Intersection over Union Ratio between the heat map and the truth mask to evaluate the overall rationale of accuracy. The results of our experiments show that our method has comparable accuracy to state-of-the-art methods specially designed for accuracy while increasing explainability by 14%-35% in terms of rationality.

NeurIPS Conference 2022 Conference Paper

FR: Folded Rationalization with a Unified Encoder

  • Wei Liu
  • Haozhao Wang
  • Jun Wang
  • Ruixuan Li
  • Chao Yue
  • YuanKai Zhang

Rationalization aims to strengthen the interpretability of NLP models by extracting a subset of human-intelligible pieces of their inputting texts. Conventional works generally employ a two-phase model in which a generator selects the most important pieces, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem where the predictor overfits to the noise generated by a not yet well-trained generator and in turn, leads the generator to converge to a suboptimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR) that folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder between the generator and predictor, based on which FR can facilitate a better predictor by access to valuable information blocked by the generator in the traditional two-phase model and thus bring a better generator. Empirically, we show that FR improves the F1 score by up to 10. 3% as compared to state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Parameterized Knowledge Transfer for Personalized Federated Learning

  • Jie Zhang
  • Song Guo
  • XIAOSONG MA
  • Haozhao Wang
  • Wenchao Xu
  • Feijie Wu

In recent years, personalized federated learning (pFL) has attracted increasing attention for its potential in dealing with statistical heterogeneity among clients. However, the state-of-the-art pFL methods rely on model parameters aggregation at the server side, which require all models to have the same structure and size, and thus limits the application for more heterogeneous scenarios. To deal with such model constraints, we exploit the potentials of heterogeneous model settings and propose a novel training framework to employ personalized models for different clients. Specifically, we formulate the aggregation procedure in original pFL into a personalized group knowledge transfer training algorithm, namely, KT-pFL, which enables each client to maintain a personalized soft prediction at the server side to guide the others' local training. KT-pFL updates the personalized soft prediction of each client by a linear combination of all local soft predictions using a knowledge coefficient matrix, which can adaptively reinforce the collaboration among clients who own similar data distribution. Furthermore, to quantify the contributions of each client to others' personalized training, the knowledge coefficient matrix is parameterized so that it can be trained simultaneously with the models. The knowledge coefficient matrix and the model parameters are alternatively updated in each round following the gradient descent way. Extensive experiments on various datasets (EMNIST, Fashion_MNIST, CIFAR-10) are conducted under different settings (heterogeneous models and data distributions). It is demonstrated that the proposed framework is the first federated learning paradigm that realizes personalized model training via parameterized group knowledge transfer while achieving significant performance gain comparing with state-of-the-art algorithms.