Arrow Research search

Author name cluster

Xiaoxiao Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
2 author rows

Possible papers

41

NeurIPS Conference 2025 Conference Paper

A Reinforcement Learning-based Bidding Strategy for Data Consumers in Auction-based Federated Learning

  • Xiaoli Tang
  • Han Yu
  • Xiaoxiao Li

Auction-based Federated Learning (AFL) fosters collaboration among self-interested data consumers (DCs) and data owners (DOs). A major challenge in AFL pertains to how DCs select and bid for DOs. Existing methods are generally static, making them ill-suited for dynamic AFL markets. To address this issue, we propose the Reinforcement Learning-based Bidding Strategy for DCs in Auction-based Federated Learning (RLB-AFL). We incorporate historical states into a Deep Q-Network to capture sequential information critical for bidding decisions. To mitigate state space sparsity, where specific states rarely reoccur for each DC during auctions, we incorporate the Gaussian Mixture Model into RLB-AFL. This facilitates soft clustering on sequential states, reducing the state space dimensionality and easing exploration and action-value function approximation. In addition, we enhance the $\epsilon$-greedy policy to help the RLB-AFL agent balance exploitation and exploration, enabling it to be more adaptable in the AFL decision-making process. Extensive experiments on 6 widely used benchmark datasets demonstrate that RLB-AFL achieves superior performance compared to 8 state-of-the-art approaches. It outperforms the best baseline by 10.56% and 3.15% in terms of average total utility.
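The GMM-based state compression described in the abstract can be illustrated with a minimal sketch. Here a softmax over negative squared distances to fixed centroids stands in for a fitted mixture's posterior (function and variable names are ours, not from the paper's code); the responsibility vector is the compact state a Deep Q-Network would consume.

```python
import numpy as np

def soft_cluster_states(states, centroids, tau=1.0):
    """Compress raw sequential states into soft cluster responsibilities.

    A stand-in for RLB-AFL's GMM soft clustering: a softmax over negative
    squared distances to fixed centroids plays the role of the fitted
    mixture's posterior probabilities.
    """
    d2 = ((states[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

# 5 raw 8-dimensional bidding states, compressed to 3 soft assignments.
rng = np.random.default_rng(0)
states = rng.normal(size=(5, 8))
centroids = rng.normal(size=(3, 8))
compact = soft_cluster_states(states, centroids)
```

Each row of `compact` is a probability vector over clusters, so the Q-network sees a 3-dimensional input regardless of how sparse the raw 8-dimensional state space is.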

ICLR Conference 2025 Conference Paper

Can Textual Gradient Work in Federated Learning?

  • Minghui Chen
  • Ruinan Jin
  • Wenlong Deng
  • Yuanyuan Chen 0012
  • Zhi Huang
  • Han Yu 0001
  • Xiaoxiao Li

Recent studies highlight the promise of LLM-based prompt optimization, especially with TextGrad, which automates "differentiation" via texts and backpropagates textual feedback provided by LLMs. This approach facilitates training in various real-world applications that do not support numerical gradient propagation or loss calculation. It opens new avenues for optimization in decentralized, resource-constrained environments, suggesting that users of black-box LLMs (e.g., ChatGPT) could enhance components of LLM agentic systems (such as prompt optimization) through collaborative paradigms like federated learning (FL). In this paper, we systematically explore the potential and challenges of incorporating textual gradient into FL. Our contributions are fourfold. **Firstly**, we introduce a novel FL paradigm, Federated Textual Gradient (FedTextGrad), that allows FL clients to upload their locally optimized prompts derived from textual gradients, while the FL server aggregates the received prompts through text summarization. Unlike traditional FL frameworks, which are designed for numerical aggregation, FedTextGrad is specifically tailored for handling textual data, expanding the applicability of FL to a broader range of problems that lack well-defined numerical loss functions. **Secondly**, building on this design, we conduct extensive experiments to explore the feasibility of federated textual gradients. Our findings highlight the importance of properly tuning key factors (e.g., local steps) in FL training to effectively integrate textual gradients. **Thirdly**, we highlight a major challenge in federated textual gradient aggregation: retaining essential information from distributed prompt updates. Concatenation often produces prompts that exceed the LLM API’s context window, while summarization can degrade performance by generating overly condensed or complex text that lacks key context.
**Last but not least**, in response to this issue, we improve the vanilla variant of FedTextGrad by providing actionable guidance to the LLM when summarizing client prompts by leveraging the Uniform Information Density principle. Such a design reduces the complexity of the aggregated global prompt, thereby better incentivizing the LLM's reasoning ability. Through this principled study, we enable the adoption of textual gradients in FL for optimizing LLMs, identify important issues, and pinpoint future directions, thereby opening up a new research area that warrants further investigation.
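The server-side aggregation trade-off described above (concatenate when the budget allows, summarize otherwise) can be sketched in a few lines. `summarize` is a placeholder for an LLM call, not an API from the paper; the paper additionally guides summarization with the Uniform Information Density principle, which this sketch does not model.

```python
def aggregate_prompts(client_prompts, summarize, max_chars=2000):
    """Server-side FedTextGrad-style aggregation (sketch): concatenate the
    locally optimized prompts when they fit a context budget, otherwise
    fall back to LLM summarization of the merged text."""
    merged = "\n---\n".join(client_prompts)
    if len(merged) <= max_chars:
        return merged
    return summarize(merged)

# A tiny budget forces the summarization path (dummy summarizer here).
prompts = ["Answer step by step.", "Cite the given context only."]
short = aggregate_prompts(prompts, summarize=lambda t: t[:20], max_chars=10)
```

With a realistic budget, the concatenation branch preserves every client's prompt verbatim; the summarization branch is where the information-retention challenge discussed in the abstract arises.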

NeurIPS Conference 2025 Conference Paper

Class-wise Balancing Data Replay for Federated Class-Incremental Learning

  • Zhuang Qi
  • Ying-Peng Tang
  • Lei Meng
  • Han Yu
  • Xiaoxiao Li
  • Xiangxu Meng

Federated Class Incremental Learning (FCIL) aims to collaboratively process continuously increasing incoming tasks across multiple clients. Among various approaches, data replay has become a promising solution, which can alleviate forgetting by reintroducing representative samples from previous tasks. However, the performance of such methods is typically limited by class imbalance, both within the replay buffer due to limited global awareness and between replayed and newly arrived classes. To address this issue, we propose a class-wise balancing data replay method for FCIL (FedCBDR), which employs a global coordination mechanism for class-level memory construction and reweights the learning objective to alleviate the aforementioned imbalances. Specifically, FedCBDR has two key components: 1) the global-perspective data replay module reconstructs global representations of prior task knowledge in a privacy-preserving manner, which then guides a class-aware and importance-sensitive sampling strategy to achieve balanced replay; 2) to handle class imbalance across tasks, the task-aware temperature scaling module adaptively adjusts the temperature of logits at both class and instance levels based on task dynamics, which reduces the model’s overconfidence in majority classes while enhancing its sensitivity to minority classes. Experimental results verify that FedCBDR achieves balanced class-wise sampling under heterogeneous data distributions and improves generalization under task imbalance between earlier and recent tasks, yielding a 2%-15% Top-1 accuracy improvement over six state-of-the-art methods.
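The class-level temperature adjustment described above can be sketched as follows. This is an illustrative stand-in: how FedCBDR derives the per-class temperatures from task dynamics is not modeled here, and the names are ours.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def class_temperature_scale(logits, class_temps):
    """Class-wise temperature scaling (sketch): giving a majority class a
    temperature > 1 softens its overconfident logit, shifting probability
    mass toward minority classes."""
    return softmax(logits / class_temps)

logits = np.array([4.0, 1.0, 1.0])     # class 0: overconfident majority class
temps = np.array([2.0, 1.0, 1.0])      # soften class 0 only
p_plain = softmax(logits)
p_scaled = class_temperature_scale(logits, temps)
```

After scaling, the majority class keeps the highest probability but with reduced confidence, which is the intended effect when replayed minority classes must still contribute to the loss.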

ICML Conference 2025 Conference Paper

Commute Graph Neural Networks

  • Wei Zhuo 0006
  • Han Yu 0001
  • Guang Tan
  • Xiaoxiao Li

Graph Neural Networks (GNNs) have shown remarkable success in learning from graph-structured data. However, their application to directed graphs (digraphs) presents unique challenges, primarily due to the inherent asymmetry in node relationships. Traditional GNNs are adept at capturing unidirectional relations but fall short in encoding the mutual path dependencies between nodes, such as asymmetrical shortest paths typically found in digraphs. Recognizing this gap, we introduce Commute Graph Neural Networks (CGNN), an approach that seamlessly integrates node-wise commute time into the message passing scheme. The cornerstone of CGNN is an efficient method for computing commute time using a newly formulated digraph Laplacian. Commute time is then integrated into the neighborhood aggregation process, with neighbor contributions weighted according to their respective commute time to the central node in each layer. It enables CGNN to directly capture the mutual, asymmetric relationships in digraphs. Extensive experiments on 8 benchmarking datasets confirm the superiority of CGNN against 13 state-of-the-art methods.
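The quantity CGNN plugs into its aggregation weights, commute time, has a classical closed form in the undirected case via the pseudoinverse of the graph Laplacian. The sketch below shows that undirected baseline only; CGNN's contribution is a newly formulated digraph Laplacian, which is more involved and not reproduced here. The final weighting rule is our illustrative choice.

```python
import numpy as np

def commute_times(A):
    """Pairwise commute times C(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij)
    from the Moore-Penrose pseudoinverse of the graph Laplacian
    (undirected case, for illustration)."""
    d = A.sum(axis=1)
    L = np.diag(d) - A
    Lp = np.linalg.pinv(L)
    vol = d.sum()
    diag = np.diag(Lp)
    return vol * (diag[:, None] + diag[None, :] - 2 * Lp)

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # a single undirected edge
C = commute_times(A)
# Neighbor contributions could then decay with commute time, e.g.:
weights = 1.0 / (1.0 + C)
```

On the single-edge graph, the random walk takes one step each way, so the commute time between the two nodes is exactly 2, which the formula recovers.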

ICLR Conference 2025 Conference Paper

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models

  • Wenlong Deng
  • Yize Zhao
  • Vala Vakilian
  • Minghui Chen
  • Xiaoxiao Li
  • Christos Thrampoulidis

Storing open-source fine-tuned models separately introduces redundancy and increases response times in applications utilizing multiple models. Delta-parameter pruning (DPP), particularly the random drop and rescale (DARE) method proposed by Yu et al., addresses this by pruning the majority of delta parameters—the differences between fine-tuned and pre-trained model weights—while typically maintaining minimal performance loss. However, DARE fails when either the pruning rate or the magnitude of the delta parameters is large. We highlight two key reasons for this failure: (1) an excessively large rescaling factor as pruning rates increase, and (2) high mean and variance in the delta parameters. To push DARE’s limits, we introduce DAREx (DARE the eXtreme), which features two algorithmic improvements: (1) DAREx-q, a rescaling factor modification that significantly boosts performance at high pruning rates (e.g., > 30% on COLA and SST2 for encoder models, with even greater gains in decoder models), and (2) DAREx-L2, which combines DARE with AdamR, an in-training method that applies appropriate delta regularization before DPP. We also demonstrate that DAREx-q can be seamlessly combined with vanilla parameter-efficient fine-tuning techniques like LoRA and can facilitate structural DPP. Additionally, we revisit the application of importance-based pruning techniques within DPP, demonstrating that they outperform random-based methods when delta parameters are large. Through this comprehensive study, we develop a pipeline for selecting the most appropriate DPP method under various practical scenarios.
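The baseline that DAREx builds on, random drop and rescale, is simple enough to sketch directly: drop delta parameters at random and rescale the survivors so the pruned delta is unbiased in expectation. DAREx-q's modified rescaling factor and DAREx-L2's in-training regularization are not modeled here; the function name is ours.

```python
import numpy as np

def dare(delta, drop_rate, rng=None):
    """Random Drop And REscale (DARE): zero out a fraction `drop_rate` of
    the delta parameters and rescale survivors by 1 / (1 - drop_rate),
    so E[pruned delta] equals the original delta."""
    rng = np.random.default_rng(rng)
    mask = rng.random(delta.shape) >= drop_rate   # keep with prob 1 - p
    return delta * mask / (1.0 - drop_rate)

# At a 90% pruning rate the mean delta is preserved in expectation.
delta = np.full(100_000, 0.01)
pruned = dare(delta, drop_rate=0.9, rng=0)
```

The failure mode the abstract highlights is visible in the formula: as `drop_rate` approaches 1 the rescaling factor `1 / (1 - drop_rate)` blows up, amplifying the variance of the surviving deltas.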

JBHI Journal 2025 Journal Article

EMGANet: Edge-Aware Multi-Scale Group-Mix Attention Network for Breast Cancer Ultrasound Image Segmentation

  • Jin Huang
  • Yazhao Mao
  • Jingwen Deng
  • Zhaoyi Ye
  • Yimin Zhang
  • Jingwen Zhang
  • Lan Dong
  • Hui Shen

Breast cancer is one of the most prevalent diseases among women worldwide. Early and accurate ultrasound image segmentation plays a crucial role in reducing mortality. Although deep learning methods have demonstrated remarkable segmentation potential, they still struggle with challenges in ultrasound images, including blurred boundaries and speckle noise. To generate accurate ultrasound image segmentation, this paper proposes the Edge-Aware Multi-Scale Group-Mix Attention Network (EMGANet), which generates accurate segmentation by integrating deep and edge features. The Multi-Scale Group Mix Attention block effectively aggregates both sparse global and local features, ensuring the extraction of valuable information. The subsequent Edge Feature Enhancement block then focuses on cancer boundaries, enhancing the segmentation accuracy. Therefore, EMGANet effectively tackles unclear boundaries and noise in ultrasound images. We conduct experiments on two public datasets (Dataset-B, BUSI) and one private dataset which contains 927 samples from Renmin Hospital of Wuhan University (BUSI-WHU). EMGANet demonstrates superior segmentation performance, achieving an overall accuracy (OA) of 98.56%, a mean IoU (mIoU) of 90.32%, and an ASSD of 6.1 pixels on the BUSI-WHU dataset. Additionally, EMGANet performs well on the two public datasets, with a mIoU of 88.2% and an ASSD of 9.2 pixels on Dataset-B, and a mIoU of 81.37% and an ASSD of 18.27 pixels on the BUSI dataset. EMGANet improves on state-of-the-art segmentation performance by about 2% in mIoU across the three datasets. In summary, the proposed EMGANet significantly improves breast cancer segmentation through Edge-Aware and Group-Mix Attention mechanisms, showing great potential for clinical applications.

AAAI Conference 2025 Conference Paper

Federated Causally Invariant Feature Learning

  • Xianjie Guo
  • Kui Yu
  • Lizhen Cui
  • Han Yu
  • Xiaoxiao Li

Federated feature selection (FFS) is a promising field for selecting informative features while preserving data privacy in federated learning (FL) settings. Existing FFS methods focus on capturing the correlations between features and labels. They struggle to achieve satisfactory performance in the face of data distribution heterogeneity among FL clients, and cannot address the out-of-distribution (OOD) problem that arises when a significant portion of clients do not actively participate in FL training. To address these limitations, we propose Federated Causally Invariant Feature Learning (FedCIFL), a novel approach for learning causally invariant features in a privacy-preserving manner. We design a sample reweighting strategy to eliminate spurious correlations introduced by selection bias and iteratively estimate the federated causal effect between each feature and the labels (with the remaining features initially treated as confounders). By iteratively refining the confounding feature set to identify the true confounders, FedCIFL mitigates the impact of limited local data on the accuracy of federated causal effect estimation. Theoretical analysis proves the correctness of FedCIFL under reasonable assumptions. Extensive experiments on synthetic and real-world datasets demonstrate the superiority of FedCIFL against eight state-of-the-art baselines, beating the best-performing approach by 3.19%, 9.07% and 2.65% in terms of average test Accuracy, RMSE and F1 score, respectively. It is a first-of-its-kind FFS approach capable of handling Non-IID and OOD data simultaneously. The source code is available at https://github.com/Xianjie-Guo/FedCIFL.

TMLR Journal 2025 Journal Article

Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation

  • Chun-Yin Huang
  • Ruinan Jin
  • Can Zhao
  • Daguang Xu
  • Xiaoxiao Li

While Federated Learning (FL) is gaining popularity for training machine learning models in a decentralized fashion, numerous challenges persist, such as asynchronization, computational expenses, data heterogeneity, and gradient and membership privacy attacks. Lately, dataset distillation has emerged as a promising solution for addressing the aforementioned challenges by generating a compact synthetic dataset that preserves a model's training efficacy. However, we discover that using distilled local datasets can amplify the heterogeneity issue in FL. To address this, we propose Federated Learning on Virtual Heterogeneous Data with Local-Global Dataset Distillation (FedLGD), where we seamlessly integrate dataset distillation algorithms into the FL pipeline and train FL using a smaller synthetic dataset (referred to as virtual data). Specifically, to harmonize the domain shifts, we propose iterative distribution matching to inpaint global information to *local virtual data* and use federated gradient matching to distill *global virtual data* that serve as anchor points to rectify heterogeneous local training, without compromising data privacy. We experiment on both benchmark and real-world datasets that contain heterogeneous data from different sources, and further scale up to an FL scenario that contains a large number of clients with heterogeneous and class-imbalanced data. Our method outperforms *state-of-the-art* heterogeneous FL algorithms under various settings.

NeurIPS Conference 2025 Conference Paper

Global Prompt Refinement with Non-Interfering Attention Masking for One-Shot Federated Learning

  • Zhuang Qi
  • Yu Pan
  • Lei Meng
  • Sijin Zhou
  • Han Yu
  • Xiaoxiao Li
  • Xiangxu Meng

Federated Prompt Learning (FPL) enables communication-efficient adaptation by tuning lightweight prompts on top of frozen pre-trained models. Existing FPL methods typically rely on global information, which is only available after the second training round, to facilitate collaboration among client models. Therefore, they are inherently dependent on multi-round communication to fully exhibit their strengths. Moreover, existing one-shot federated learning methods typically focus on fitting seen tasks, but lack cross-task generalization. To bridge this gap, we propose the global prompt refinement with non-interfering attention masking (GPR-NIAM) method for one-shot FPL. The core idea is to design a masking mechanism that restricts excessive interaction between the original text embeddings and the learnable prompt embeddings. GPR-NIAM achieves this through the collaboration of two key modules. Firstly, the attention isolation module suppresses attention from the learnable prompt tokens to the original text tokens, and reweights the reverse attention which preserves generalization across tasks. Secondly, the cross-silo collaborative refinement module integrates decentralized visual knowledge into a unified base and calibrates the global prompt through multi-source cross-modal knowledge alignment, further mitigating the inconsistency caused by data heterogeneity. Extensive experiments conducted on ten benchmark datasets under two tasks show that GPR-NIAM outperforms eight state-of-the-art methods in both class-level and domain-level generalization.
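The attention-isolation idea above can be sketched as an additive attention bias: attention from learnable prompt tokens to original text tokens is blocked outright, while the reverse direction is only down-weighted. The exact reweighting rule is the paper's; `reverse_weight` here is our illustrative knob, and the names are ours.

```python
import numpy as np

def niam_attention_bias(n_text, n_prompt, reverse_weight=0.5):
    """Additive attention bias sketching the isolation mechanism: rows
    index queries, columns index keys. Prompt-token queries may not
    attend to text-token keys (-inf before softmax); text-token queries
    attend to prompt-token keys with a damping log-bias."""
    n = n_text + n_prompt
    bias = np.zeros((n, n))
    bias[n_text:, :n_text] = -np.inf                 # prompt -> text: blocked
    bias[:n_text, n_text:] = np.log(reverse_weight)  # text -> prompt: damped
    return bias

bias = niam_attention_bias(n_text=4, n_prompt=2)
```

Adding this bias to the raw attention scores before the softmax zeroes out the blocked entries and multiplicatively scales the damped ones by `reverse_weight`, which is the standard way masks are applied in transformer attention.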

ICLR Conference 2025 Conference Paper

GMValuator: Similarity-based Data Valuation for Generative Models

  • Jiaxi Yang 0003
  • Wenlong Deng
  • Benlin Liu
  • Yangsibo Huang
  • James Zou
  • Xiaoxiao Li

Data valuation plays a crucial role in machine learning. Existing data valuation methods, mainly focused on discriminative models, overlook generative models that have gained attention recently. In generative models, data valuation measures the impact of training data on generated datasets. The few existing data valuation methods designed for deep generative models either concentrate on specific models or lack robustness in their outcomes, and they also suffer from efficiency shortcomings. We formulate the data valuation problem in generative models from a similarity matching perspective to bridge these gaps. Specifically, we introduce Generative Model Valuator (GMValuator), the first training-free and model-agnostic approach to providing data valuation for image generation tasks. It empowers efficient data valuation through our innovative similarity matching module, calibrates biased contributions by incorporating image quality assessment, and attributes credits to all training samples based on their contributions to the generated samples. Additionally, we introduce four evaluation criteria for assessing data valuation methods in generative models. GMValuator is extensively evaluated on benchmark and high-resolution datasets and various mainstream generative architectures to demonstrate its effectiveness. Our code is available at: https://github.com/ubc-tea/GMValuator.
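The similarity-matching view of valuation can be sketched as nearest-neighbor credit attribution: each generated sample distributes one unit of credit over its closest training samples. This is an illustrative reading in the method's spirit, with names of our choosing; the image-quality calibration step of the full method is omitted.

```python
import numpy as np

def similarity_values(train_feats, gen_feats, k=2):
    """Training-free valuation sketch: each generated sample splits one
    unit of credit among its k nearest training samples, weighted by
    closeness; totals are normalized so all credits sum to 1."""
    values = np.zeros(len(train_feats))
    for g in gen_feats:
        dist = np.linalg.norm(train_feats - g, axis=1)
        nn = np.argsort(dist)[:k]
        w = 1.0 / (1.0 + dist[nn])      # closer match -> more credit
        values[nn] += w / w.sum()
    return values / len(gen_feats)

train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
gen = np.array([[0.1, 0.0]])            # near the first two training points
v = similarity_values(train, gen, k=2)
```

Because the scheme is training-free, valuation cost is dominated by the feature-space nearest-neighbor search, which is where the efficiency claim of such approaches comes from.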

JBHI Journal 2025 Journal Article

MRRM: Advanced Biomarker Alignment in Multi-Staining Pathology Images via Multi-Scale Ring Rotation-Invariant Matching

  • Xiaoxiao Li
  • Taobo Hu
  • Zhengxiong Li
  • Mengping Long
  • Zhaoyi Ye
  • Jin Huang
  • Yaxiaer Yalikun
  • Sheng Liu

Pathology image matching is crucial for assisting pathologists in the comprehensive diagnosis of cancerous areas. However, variations in image rotation and staining caused by inherent slide imaging techniques increase the burden on pathologists, complicating the examination of cancer across different pathology slides. To address this challenge, we introduce multi-scale ring rotation-invariant matching (MRRM), which improves image matching efficiency using ring topology, assisting pathologists in robustly aligning biomarker information across various pathology images. Specifically, by employing multi-scale rings as convolution kernels, we accurately locate keypoints from the differencing of the ring pyramid, which not only enhances the likelihood of successful pathology image matching but also supports our feature descriptor in achieving advantageous performance in rotation-invariance. Experiments on 81 cases, using manually annotated golden landmarks as the standard, show that MRRM achieves significantly superior matching accuracy (130.93 $\mu$m) and a success rate of 93.83% compared to other methods, particularly in cases with rotated pathology images. This meets pathologists' routine diagnostic requirements for cancer diagnosis.

ICML Conference 2025 Conference Paper

Multi-Session Budget Optimization for Forward Auction-based Federated Learning

  • Xiaoli Tang 0001
  • Han Yu 0001
  • Zengxiang Li
  • Xiaoxiao Li

Auction-based Federated Learning (AFL) has emerged as an important research field in recent years. The prevailing strategies for FL data consumers (DCs) assume that the entire team of the required data owners (DOs) for an FL task must be assembled before training can commence. In practice, a DC can trigger the FL training process multiple times. DOs can thus be gradually recruited over multiple FL model training sessions. Existing bidding strategies for AFL DCs are not designed to handle such scenarios. Therefore, the problem of multi-session AFL remains open. To address this problem, we propose the Multi-session Budget Optimization Strategy for forward Auction-based Federated Learning (MBOS-AFL). Based on hierarchical reinforcement learning, MBOS-AFL jointly optimizes inter-session budget pacing and intra-session bidding for AFL DCs, with the objective of maximizing the total utility. Extensive experiments on six benchmark datasets show that it significantly outperforms seven state-of-the-art approaches. On average, MBOS-AFL achieves 12.28% higher utility, 14.52% more data acquired through auctions for a given budget, and 1.23% higher test accuracy achieved by the resulting FL model compared to the best baseline. To the best of our knowledge, it is the first budget optimization decision support method with budget pacing capability designed for DCs in multi-session forward AFL.

NeurIPS Conference 2025 Conference Paper

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization

  • Wenlong Deng
  • Yi Ren
  • Muchen Li
  • Danica J. Sutherland
  • Xiaoxiao Li
  • Christos Thrampoulidis

Reinforcement learning (RL) has become popular in enhancing the reasoning capabilities of large language models (LLMs), with Group Relative Policy Optimization (GRPO) emerging as a widely used algorithm in recent systems. Despite GRPO's widespread adoption, we identify a previously unrecognized phenomenon we term Lazy Likelihood Displacement (LLD), wherein the likelihood of correct responses marginally increases or even decreases during training. This behavior mirrors a recently discovered misalignment issue in Direct Preference Optimization (DPO), attributed to the influence of negative gradients. We provide a theoretical analysis of GRPO’s learning dynamics, identifying the source of LLD as the naive penalization of all tokens in incorrect responses with the same strength. To address this, we develop a method called NTHR, which downweights penalties on tokens contributing to the LLD. Unlike prior DPO-based approaches, NTHR takes advantage of GRPO’s group-based structure, using correct responses as anchors to identify influential tokens. Experiments on math reasoning benchmarks demonstrate that NTHR effectively mitigates LLD, yielding consistent performance gains across models ranging from 0.5B to 3B parameters.

AAAI Conference 2025 Conference Paper

pFedES: Generalized Proxy Feature Extractor Sharing for Model Heterogeneous Personalized Federated Learning

  • Liping Yi
  • Han Yu
  • Chao Ren
  • Gang Wang
  • Xiaoguang Liu
  • Xiaoxiao Li

Federated learning (FL), as a privacy-preserving collaborative machine learning paradigm, has attracted significant interest from industry and academia. To allow each data owner (FL client) to train a heterogeneous and personalized local model based on its local data distribution, system resources and requirements on model structure, the field of model-heterogeneous personalized federated learning (MHPFL) has emerged. Existing MHPFL approaches either rely on the availability of a public dataset with special characteristics to facilitate knowledge transfer, incur high computational and communication costs, or face potential model leakage risks. To address these limitations, we propose a model-heterogeneous personalized Federated learning approach based on generalized proxy feature Extractor Sharing (pFedES) for supervised image classification tasks. (1) We devise a shared small proxy homogeneous feature extractor before each client's heterogeneous local model. (2) Clients train them via the proposed iterative learning method to enable the exchange of global generalized knowledge and local personalized knowledge. (3) The small proxy local homogeneous extractors produced after local training are uploaded to the server for aggregation to facilitate knowledge fusion across clients. We theoretically prove that pFedES converges at a non-convex convergence rate of O(1/T). Experiments on 3 benchmark datasets against 9 baselines demonstrate that pFedES achieves state-of-the-art model accuracy while maintaining efficient communication and computation.

ICLR Conference 2025 Conference Paper

S4M: S4 for multivariate time series forecasting with Missing values

  • Peng Jing
  • Meiqi Yang
  • Qiong Zhang
  • Xiaoxiao Li

Multivariate time series data play a pivotal role in a wide range of real-world applications, such as finance, healthcare, and meteorology, where accurate forecasting is critical for informed decision-making and proactive interventions. However, the presence of block missing data introduces significant challenges, often compromising the performance of predictive models. Traditional two-step approaches, which first impute missing values and then perform forecasting, are prone to error accumulation, particularly in complex multivariate settings characterized by high missing ratios and intricate dependency structures. In this work, we introduce S4M, an end-to-end time series forecasting framework that seamlessly integrates missing data handling into the Structured State Space Sequence (S4) model architecture. Unlike conventional methods that treat imputation as a separate preprocessing step, S4M leverages the latent space of S4 models to directly recognize and represent missing data patterns, thereby more effectively capturing the underlying temporal and multivariate dependencies. Our framework comprises two key components: the Adaptive Temporal Prototype Mapper (ATPM) and the Missing-Aware Dual Stream S4 (MDS-S4). The ATPM employs a prototype bank to derive robust and informative representations from historical data patterns, while the MDS-S4 processes these representations alongside missingness masks as dual input streams to enable accurate forecasting. Through extensive empirical evaluations on diverse real-world datasets, we demonstrate that S4M consistently achieves state-of-the-art performance. These results underscore the efficacy of our integrated approach in handling missing data, showcasing its robustness and superiority over traditional imputation-based methods. Our findings highlight the potential of S4M to advance reliable time series forecasting in practical applications, offering a promising direction for future research and deployment. 
Code is available at https://github.com/WINTERWEEL/S4M.git.
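The dual input streams that MDS-S4 consumes can be sketched directly: an observed-value stream with missing entries filled, plus the binary missingness mask. This is a minimal preprocessing sketch with names of our choosing; the prototype-mapping stage (ATPM) and the S4 backbone itself are not modeled.

```python
import numpy as np

def dual_stream_inputs(x, fill=0.0):
    """Build dual input streams for a missing-aware sequence model
    (sketch): the observed-value stream with NaNs replaced by `fill`,
    and the binary mask marking which entries were actually observed."""
    mask = ~np.isnan(x)
    values = np.where(mask, x, fill)
    return values, mask.astype(np.float64)

# A toy series with block-missing entries (NaN marks missing values).
x = np.array([[1.0, np.nan], [np.nan, 4.0]])
values, mask = dual_stream_inputs(x)
```

Feeding the mask as its own stream lets the model distinguish a genuinely observed zero from a filled-in missing value, which a plain impute-then-forecast pipeline cannot do.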

IJCAI Conference 2024 Conference Paper

A Bias-Free Revenue-Maximizing Bidding Strategy for Data Consumers in Auction-based Federated Learning

  • Xiaoli Tang
  • Han Yu
  • Zengxiang Li
  • Xiaoxiao Li

Auction-based Federated Learning (AFL) is a burgeoning research area. However, existing bidding strategies for AFL data consumers (DCs) primarily focus on maximizing expected accumulated utility, disregarding the more complex goal of revenue maximization. They also only consider winning bids, leading to biased estimates by overlooking information from losing bids. To address these issues, we propose a Bias-free Revenue-maximizing Federated bidding strategy for DCs in AFL (BR-FEDBIDDER). Our theoretical exploration of the relationships between Return on Investment (ROI), bid costs, and utility, and their impact on overall revenue underscores the complexity of maximizing revenue solely by prioritizing ROI enhancement. Leveraging these insights, BR-FEDBIDDER optimizes bid costs with any given ROI constraint. In addition, we incorporate an auxiliary task of winning probability estimation into the framework to achieve bias-free learning by leveraging bid records from historical bid requests, including both winning and losing ones. Extensive experiments on six widely used benchmark datasets show that BR-FEDBIDDER outperforms eight state-of-the-art methods, surpassing the best-performing baseline by 5.66%, 6.08% and 2.44% in terms of the total revenue, ROI, and test accuracy of the resulting FL models, respectively.

JAIR Journal 2024 Journal Article

Differentially Private Neural Tangent Kernels (DP-NTK) for Privacy-Preserving Data Generation

  • Yilin Yang
  • Kamil Adamczewski
  • Xiaoxiao Li
  • Danica J. Sutherland
  • Mijung Park

Maximum mean discrepancy (MMD) is a particularly useful distance metric for differentially private data generation: when used with finite-dimensional features, it allows us to summarize and privatize the data distribution once, which we can repeatedly use during generator training without further privacy loss. An important question in this framework is, then, what features are useful to distinguish between real and synthetic data distributions, and whether those enable us to generate quality synthetic data. This work considers using the features of neural tangent kernels (NTKs), more precisely empirical NTKs (e-NTKs). We find that, perhaps surprisingly, the expressiveness of the untrained e-NTK features is comparable to that of the features taken from pre-trained perceptual features using public data. As a result, our method improves the privacy-accuracy trade-off compared to other state-of-the-art methods, without relying on any public data, as demonstrated on several tabular and image benchmark datasets.
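The "privatize once, reuse forever" property described above can be sketched with the Gaussian mechanism applied to a clipped mean embedding. The e-NTK feature map itself is outside this sketch (rows are assumed to be per-example feature vectors), the sensitivity bound uses the replace-one convention, and the function name is ours.

```python
import numpy as np

def dp_mean_embedding(features, epsilon, delta, clip=1.0, rng=None):
    """Release a differentially private mean feature embedding once via
    the Gaussian mechanism; the noisy summary can then be reused for
    every generator update with no further privacy loss."""
    rng = np.random.default_rng(rng)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    clipped = features * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    mean = clipped.mean(axis=0)
    # Replace-one L2 sensitivity of the mean of n clipped rows: 2*clip/n.
    sens = 2.0 * clip / len(features)
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mean + rng.normal(0.0, sigma, size=mean.shape)

feats = np.random.default_rng(1).normal(size=(500, 16))
noisy = dp_mean_embedding(feats, epsilon=1.0, delta=1e-5, rng=0)
```

Because the noise scale shrinks as 1/n, larger datasets yield more accurate private summaries at the same privacy budget; the generator's MMD loss against this fixed noisy target consumes no additional privacy.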

IJCAI Conference 2024 Conference Paper

Dual Calibration-based Personalised Federated Learning

  • Xiaoli Tang
  • Han Yu
  • Run Tang
  • Chao Ren
  • Anran Li
  • Xiaoxiao Li

Personalized federated learning (PFL) is designed for scenarios with non-independent and identically distributed (non-IID) client data. Existing model mixup-based methods, one of the main approaches of PFL, can only extract either global or personalized features during training, thereby limiting effective knowledge sharing among clients. To address this limitation, we propose the Dual Calibration-based PFL (DC-PFL). It divides local models into a heterogeneous feature extractor and a homogeneous classifier. The FL server utilizes mean and covariance representations from clients' feature extractors to train a global generalized classifier, facilitating information exchange while preserving privacy. To enhance personalization and convergence, we design a feature extractor-level calibration method with an auxiliary loss for local models to refine feature extractors using global knowledge. Furthermore, DC-PFL refines the global classifier through the global classifier-level calibration, utilizing sample representations derived from an approximate Gaussian distribution model specific to each class. This method precludes the need to transmit original data representations, further enhancing privacy preservation. Extensive experiments on widely used benchmark datasets demonstrate that DC-PFL outperforms eight state-of-the-art methods, surpassing the best-performing baseline by 1.22% and 9.22% in terms of accuracy on datasets CIFAR-10 and CIFAR-100, respectively.

NeurIPS Conference 2024 Conference Paper

FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models

  • Ruinan Jin
  • Zikang Xu
  • Yuan Zhong
  • Qingsong Yao
  • Qi Dou
  • S. K. Zhou
  • Xiaoxiao Li

The advent of foundation models (FMs) in healthcare offers unprecedented opportunities to enhance medical diagnostics through automated classification and segmentation tasks. However, these models also raise significant concerns about their fairness, especially when applied to diverse and underrepresented populations in healthcare applications. Currently, there is a lack of comprehensive benchmarks, standardized pipelines, and easily adaptable libraries to evaluate and understand the fairness performance of FMs in medical imaging, leading to considerable challenges in formulating and implementing solutions that ensure equitable outcomes across diverse patient populations. To fill this gap, we introduce FairMedFM, a fairness benchmark for FM research in medical imaging. FairMedFM integrates with 17 popular medical imaging datasets, encompassing different modalities, dimensionalities, and sensitive attributes. It explores 20 widely used FMs, with various usages such as zero-shot learning, linear probing, parameter-efficient fine-tuning, and prompting in various downstream tasks -- classification and segmentation. Our exhaustive analysis evaluates the fairness performance over different evaluation metrics from multiple perspectives, revealing the existence of bias, varied utility-fairness trade-offs on different FMs, consistent disparities on the same datasets regardless of the FM used, and limited effectiveness of existing unfairness mitigation methods. Furthermore, FairMedFM provides an open-sourced codebase at https://github.com/FairMedFM/FairMedFM, supporting extensible functionalities and applications and enabling inclusive, long-term studies of FMs in medical imaging.

ICML Conference 2024 Conference Paper

FedCal: Achieving Local and Global Calibration in Federated Learning via Aggregated Parameterized Scaler

  • Hongyi Peng
  • Han Yu 0001
  • Xiaoli Tang 0001
  • Xiaoxiao Li

Federated learning (FL) enables collaborative machine learning across distributed data owners, but data heterogeneity poses a challenge for model calibration. While prior work focused on improving accuracy for non-iid data, calibration remains under-explored. This study reveals existing FL aggregation approaches lead to sub-optimal calibration, and theoretical analysis shows despite constraining variance in clients’ label distributions, global calibration error is still asymptotically lower bounded. To address this, we propose a novel Federated Calibration (FedCal) approach, emphasizing both local and global calibration. It leverages client-specific scalers for local calibration to effectively correct output misalignment without sacrificing prediction accuracy. These scalers are then aggregated via weight averaging to generate a global scaler, minimizing the global calibration error. Extensive experiments demonstrate that FedCal significantly outperforms the best-performing baseline, reducing global calibration error by 47.66% on average.
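The scaler-aggregation idea can be sketched with a single temperature parameter per client: each client fits a temperature on its own validation logits, and the server averages the temperatures weighted by client sample counts. The paper's scaler is a parameterized model, so a scalar temperature is a deliberate simplification, and the function names here are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def aggregate_temperatures(client_temps, client_sizes):
    """Global scaler = sample-count-weighted average of client temperatures."""
    total = sum(client_sizes)
    return sum(t * n for t, n in zip(client_temps, client_sizes)) / total
```

The resulting global temperature can then be applied to the global model's logits at inference time.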

NeurIPS Conference 2024 Conference Paper

Federated Model Heterogeneous Matryoshka Representation Learning

  • Liping Yi
  • Han Yu
  • Chao Ren
  • Gang Wang
  • Xiaoguang Liu
  • Xiaoxiao Li

Model heterogeneous federated learning (MHeteroFL) enables FL clients to collaboratively train models with heterogeneous structures in a distributed fashion. However, existing MHeteroFL methods rely on training loss to transfer knowledge between the client model and the server model, resulting in limited knowledge exchange. To address this limitation, we propose the **Fed**erated model heterogeneous **M**atryoshka **R**epresentation **L**earning (**FedMRL**) approach for supervised learning tasks. It adds an auxiliary small homogeneous model shared by clients with heterogeneous local models. (1) The generalized and personalized representations extracted by the two models' feature extractors are fused by a personalized lightweight representation projector. This step enables representation fusion to adapt to local data distribution. (2) The fused representation is then used to construct Matryoshka representations with multi-dimensional and multi-granular embedded representations learned by the global homogeneous model header and the local heterogeneous model header. This step facilitates multi-perspective representation learning and improves model learning capability. Theoretical analysis shows that FedMRL achieves a $O(1/T)$ non-convex convergence rate. Extensive experiments on benchmark datasets demonstrate its superior model accuracy with low communication and computational costs compared to seven state-of-the-art baselines. It achieves up to 8.48% and 24.94% accuracy improvement compared with the state-of-the-art and the best same-category baseline, respectively.
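The Matryoshka construction in step (2) amounts to taking nested prefixes of the fused representation and scoring each prefix with its own header. A minimal sketch, with hypothetical helper names and linear (dot-product) headers standing in for the paper's jointly trained headers:

```python
def matryoshka_views(rep, dims):
    """Return nested prefixes of the fused representation, one per granularity."""
    return [rep[:d] for d in dims]

def multi_granular_scores(rep, headers):
    """Score each nested view with its own linear header (dot product).
    Each header's length implicitly selects the granularity it consumes."""
    scores = []
    for weights in headers:
        view = rep[:len(weights)]
        scores.append(sum(w * x for w, x in zip(weights, view)))
    return scores
```

In training, a loss would be computed per granularity and summed, so coarse prefixes stay informative on their own.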

IJCAI Conference 2024 Conference Paper

FedSSA: Semantic Similarity-based Aggregation for Efficient Model-Heterogeneous Personalized Federated Learning

  • Liping Yi
  • Han Yu
  • Zhuan Shi
  • Gang Wang
  • Xiaoguang Liu
  • Lizhen Cui
  • Xiaoxiao Li

Federated learning (FL) is a privacy-preserving collaborative machine learning paradigm. Traditional FL requires all data owners (a.k.a. FL clients) to train the same local model. This design is not well-suited for scenarios involving data and/or system heterogeneity. Model-Heterogeneous Personalized FL (MHPFL) has emerged to address this challenge. Existing MHPFL approaches often rely on a public dataset with the same nature as the learning task, or incur high computation and communication costs. To address these limitations, we propose the Federated Semantic Similarity Aggregation (FedSSA) approach for supervised classification tasks, which splits each client's model into a heterogeneous (structure-different) feature extractor and a homogeneous (structure-same) classification header. It performs local-to-global knowledge transfer via semantic similarity-based header parameter aggregation. In addition, global-to-local knowledge transfer is achieved via an adaptive parameter stabilization strategy which fuses the seen-class parameters of historical local headers with those of the latest global header for each client. FedSSA does not rely on public datasets, while only requiring partial header parameter transmission to save costs. Theoretical analysis proves the convergence of FedSSA. Extensive experiments demonstrate that FedSSA achieves up to 3.62% higher accuracy, 15.54 times higher communication efficiency, and 15.52 times higher computational efficiency compared to 7 state-of-the-art MHPFL baselines.
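The two transfer directions described above can be sketched as follows: the server averages header rows per class, but only over clients that have actually seen that class, and each client then fuses its historical local header with the new global one. This is a simplified illustration (the real aggregation is similarity-weighted); the function names are hypothetical.

```python
def aggregate_headers(client_headers, seen_classes):
    """Average header rows per class, using only clients that have seen
    that class. Headers are dicts: class label -> list of weights."""
    collected = {}
    for cid, header in enumerate(client_headers):
        for cls, row in header.items():
            if cls in seen_classes[cid]:
                collected.setdefault(cls, []).append(row)
    return {cls: [sum(v) / len(rows) for v in zip(*rows)]
            for cls, rows in collected.items()}

def stabilize(local_row, global_row, mu):
    """Fuse historical local parameters with the latest global ones."""
    return [mu * l + (1 - mu) * g for l, g in zip(local_row, global_row)]
```

Only the header rows travel, which is what keeps the communication cost partial rather than full-model.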

ICLR Conference 2024 Conference Paper

Heterogeneous Personalized Federated Learning by Local-Global Updates Mixing via Convergence Rate

  • Meirui Jiang
  • Anjie Le
  • Xiaoxiao Li
  • Qi Dou 0001

Personalized federated learning (PFL) has emerged as a promising technique for addressing the challenge of data heterogeneity. While recent studies have made notable progress in mitigating heterogeneity associated with label distributions, the issue of effectively handling feature heterogeneity remains an open question. In this paper, we propose a personalization approach by Local-global updates Mixing (LG-Mix) via Neural Tangent Kernel (NTK)-based convergence. The core idea is to leverage the convergence rate induced by NTK to quantify the importance of local and global updates, and subsequently mix these updates based on their importance. Specifically, we find the trace of the NTK matrix can manifest the convergence rate, and propose an efficient and effective approximation that calculates the trace of a feature matrix instead of the NTK matrix. This approximation significantly reduces the cost of computing NTK, and the feature matrix explicitly considers the heterogeneous features among samples. We have theoretically analyzed the convergence of our method in the over-parameterized regime, and experimentally evaluated our method on five datasets. These datasets present heterogeneous data features in natural and medical images. With comprehensive comparison to existing state-of-the-art approaches, our LG-Mix has consistently outperformed them across all datasets (largest accuracy improvement of 5.01%), demonstrating the outstanding efficacy of our method for model personalization. Code is available at \url{https://github.com/med-air/HeteroPFL}.
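The mixing rule can be sketched concretely: since trace(F Fᵀ) is just the sum of squared entries of the feature matrix F, it serves as a cheap convergence-rate proxy, and the local and global updates are combined in proportion to their traces. The exact weighting below (a simple ratio) is an illustrative assumption, not the paper's formula.

```python
def feature_trace(features):
    """trace(F F^T) = sum of squared entries of the feature matrix F."""
    return sum(x * x for row in features for x in row)

def mix_updates(local_update, global_update, tr_local, tr_global):
    """Mix updates in proportion to their trace-based importance."""
    lam = tr_local / (tr_local + tr_global)   # importance of the local update
    return [lam * l + (1 - lam) * g
            for l, g in zip(local_update, global_update)]
```

Computing the trace of F Fᵀ this way avoids ever forming the NTK matrix, which is the source of the cost saving.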

IJCAI Conference 2024 Conference Paper

Intelligent Agents for Auction-based Federated Learning: A Survey

  • Xiaoli Tang
  • Han Yu
  • Xiaoxiao Li
  • Sarit Kraus

Auction-based federated learning (AFL) is an important emerging category of FL incentive mechanism design, due to its ability to fairly and efficiently motivate high-quality data owners to join data consumers' (i.e., servers') FL training tasks. To enhance the efficiency in AFL decision support for stakeholders (i.e., data consumers, data owners, and the auctioneer), intelligent agent-based techniques have emerged. However, due to the highly interdisciplinary nature of this field and the lack of a comprehensive survey providing an accessible perspective, it is a challenge for researchers to enter and contribute to this field. This paper bridges this important gap by providing a first-of-its-kind survey on the Intelligent Agents for AFL (IA-AFL) literature. We propose a unique multi-tiered taxonomy that organises existing IA-AFL works according to 1) the stakeholders served, 2) the auction mechanism adopted, and 3) the goals of the agents, to provide readers with a multi-perspective view into this field. In addition, we analyse the limitations of existing approaches, summarise the commonly adopted performance evaluation metrics, and discuss promising future directions leading towards effective and efficient stakeholder-oriented decision support in IA-AFL ecosystems.

ICML Conference 2024 Conference Paper

Learning High-Order Relationships of Brain Regions

  • Weikang Qiu
  • Huangrui Chu
  • Selena Wang
  • Haolan Zuo
  • Xiaoxiao Li
  • Yize Zhao
  • Rex Ying

Discovering reliable and informative relationships among brain regions from functional magnetic resonance imaging (fMRI) signals is essential in phenotypic predictions in neuroscience. Most of the current methods fail to accurately characterize those interactions because they only focus on pairwise connections and overlook the high-order relationships of brain regions. We propose that these high-order relationships should be maximally informative and minimally redundant (MIMR). However, identifying such high-order relationships is challenging and under-explored due to the exponential search space and the absence of a tractable objective. In response to this gap, we propose a novel method named HyBRiD, which aims to extract MIMR high-order relationships from fMRI data. HyBRiD employs a Constructor to identify hyperedge structures, and a Weighter to compute a weight for each hyperedge, which avoids searching in exponential space. HyBRiD achieves the MIMR objective through an innovative information bottleneck framework named multi-head drop-bottleneck with theoretical guarantees. Our comprehensive experiments demonstrate the effectiveness of our model. Our model outperforms the state-of-the-art predictive model by an average of 11.2%, regarding the quality of hyperedges measured by CPM, a standard protocol for studying brain connections.

NeurIPS Conference 2024 Conference Paper

Local Superior Soups: A Catalyst for Model Merging in Cross-Silo Federated Learning

  • Minghui Chen
  • Meirui Jiang
  • Xin Zhang
  • Qi Dou
  • Zehua Wang
  • Xiaoxiao Li

Federated learning (FL) is a learning paradigm that enables collaborative training of models using decentralized data. Recently, the utilization of pre-trained weight initialization in FL has been demonstrated to effectively improve model performance. However, the evolving complexity of current pre-trained models, characterized by a substantial increase in parameters, markedly intensifies the challenges associated with communication rounds required for their adaptation to FL. To address these communication cost issues and increase the performance of pre-trained model adaptation in FL, we propose an innovative model interpolation-based local training technique called ``Local Superior Soups''. Our method enhances local training across different clients, encouraging the exploration of a connected low-loss basin within a few communication rounds through regularized model interpolation. This approach acts as a catalyst for the seamless adaptation of pre-trained models in FL. We demonstrate its effectiveness and efficiency across diverse widely-used FL datasets.
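The interpolation idea borrows from "model soups": candidate weight vectors are folded into a running average only if the interpolated model's loss does not increase, which keeps the soup inside a connected low-loss region. A greedy, heavily simplified sketch under those assumptions (the paper regularizes interpolation during training rather than selecting post hoc):

```python
def interpolate(w_a, w_b, alpha):
    """Linear interpolation between two weight vectors."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(w_a, w_b)]

def greedy_soup(candidates, loss_fn, base):
    """Fold a candidate into the running average only if doing so
    does not increase the loss (greedy soup selection)."""
    soup, members = list(base), 1
    for cand in candidates:
        trial = [(m * members + c) / (members + 1)
                 for m, c in zip(soup, cand)]
        if loss_fn(trial) <= loss_fn(soup):
            soup, members = trial, members + 1
    return soup
```

Rejected candidates are simply skipped, so a single bad checkpoint cannot drag the merged model out of the low-loss basin.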

JBHI Journal 2024 Journal Article

MSGM: An Advanced Deep Multi-Size Guiding Matching Network for Whole Slide Histopathology Images Addressing Staining Variation and Low Visibility Challenges

  • Xiaoxiao Li
  • Zhengxiong Li
  • Taobo Hu
  • Mengping Long
  • Xiao Ma
  • Jin Huang
  • Yiqiang Liu
  • Yaxiaer Yalikun

Matching whole slide histopathology images to provide comprehensive information on homologous tissues is beneficial for cancer diagnosis. However, the challenge arises with the Giga-pixel whole slide images (WSIs) when aiming for high-accuracy matching. Learning-based methods are difficult to generalize well with large-size WSIs, necessitating the integration of traditional matching methods to enhance accuracy as the size increases. In this paper, we propose a multi-size guiding matching method applicable to high-accuracy requirements. Specifically, we design a multiscale-texture learning network, called TDescNet, that trains 64 × 64 × 256 and 256 × 256 × 128 size convolution layers as C64 and C256 descriptors to overcome staining variation and low visibility challenges. Furthermore, we develop the 3D-ring descriptor using sparse keypoints to support the description of large-size WSIs. Finally, we employ the C64, C256, and 3D-ring descriptors to progressively guide refined local matching, utilizing geometric consistency to identify correct matching results. Experiments show that when matching WSIs of size 4096 × 4096 pixels, our average matching error is 123.48 μm and the success rate is 93.02% in 43 cases. Notably, our method achieves an average improvement of 65.52 μm in matching accuracy compared to recent state-of-the-art methods, with enhancements ranging from 36.27 μm to 131.66 μm. Therefore, we achieve high-fidelity whole-slide image matching and overcome staining variation and low visibility challenges, enabling assistance in comprehensive cancer diagnosis through matched WSIs.

ICML Conference 2024 Conference Paper

Overcoming Data and Model heterogeneities in Decentralized Federated Learning via Synthetic Anchors

  • Chun-Yin Huang
  • Kartik Srinivas
  • Xin Zhang 0054
  • Xiaoxiao Li

Conventional Federated Learning (FL) involves collaborative training of a global model while maintaining user data privacy. One of its branches, decentralized FL, is a serverless network that allows clients to own and optimize different local models separately, which results in saving management and communication resources. Despite the promising advancements in decentralized FL, it may reduce model generalizability due to lacking a global model. In this scenario, managing data and model heterogeneity among clients becomes a crucial problem, which poses a unique challenge that must be overcome: How can every client’s local model learn generalizable representation in a decentralized manner? To address this challenge, we propose a novel Decentralized FL technique by introducing Synthetic Anchors, dubbed DeSA. Based on the theory of domain adaptation and Knowledge Distillation (KD), we theoretically and empirically show that synthesizing global anchors based on raw data distribution facilitates mutual knowledge transfer. We further design two effective regularization terms for local training: 1) REG loss that regularizes the distribution of the client’s latent embedding with the anchors and 2) KD loss that enables clients to learn from others. Through extensive experiments on diverse client data distributions, we showcase the effectiveness of DeSA in enhancing both inter- and intra-domain accuracy of each client. The implementation of DeSA can be found at https://github.com/ubc-tea/DESA.

IJCAI Conference 2024 Conference Paper

Sample Quality Heterogeneity-aware Federated Causal Discovery through Adaptive Variable Space Selection

  • Xianjie Guo
  • Kui Yu
  • Hao Wang
  • Lizhen Cui
  • Han Yu
  • Xiaoxiao Li

Federated causal discovery (FCD) aims to uncover causal relationships among variables from decentralized data across multiple clients, while preserving data privacy. In practice, the sample quality of each client's local data may vary across different variable spaces, referred to as sample quality heterogeneity. Thus, data from different clients might be suitable for learning different causal relationships among variables. Model aggregation under existing FCD methods requires the entire set of model parameters from each client, and is thereby unable to handle the sample quality heterogeneity issue. In this paper, we propose the Federated Adaptive Causal Discovery (FedACD) method to bridge this gap. During federated model aggregation, it adaptively selects the causal relationships learned under the "good" variable space (i.e., one with high-quality samples) from each client, while masking those learned under the "bad" variable space (i.e., one with low-quality samples). This way, each client only needs to send the optimal learning results to the server, achieving accurate FCD. Extensive experiments on various types of datasets demonstrate significant advantages of FedACD over existing methods. The source code is available at https://github.com/Xianjie-Guo/FedACD.
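The select-and-mask step can be sketched as follows: a client's learned edge survives only if both endpoints lie in that client's high-quality variable space, and the server then combines the surviving edges. The majority vote used here to combine survivors is an illustrative assumption, not the paper's aggregation rule.

```python
def masked_aggregate(client_edges, good_vars):
    """Keep a client's edge (i, j) only if both endpoints lie in its
    high-quality variable space, then majority-vote the survivors."""
    votes = {}
    for edges, good in zip(client_edges, good_vars):
        for edge in edges:
            if edge[0] in good and edge[1] in good:
                votes[edge] = votes.get(edge, 0) + 1
    quorum = len(client_edges) / 2
    return {e for e, v in votes.items() if v > quorum}
```

Edges learned only under low-quality variable spaces never reach the vote, which is the point of the masking.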

JBHI Journal 2023 Journal Article

Dynamic Corrected Split Federated Learning With Homomorphic Encryption for U-Shaped Medical Image Networks

  • Ziyuan Yang
  • Yingyu Chen
  • Huijie Huangfu
  • Maosong Ran
  • Hui Wang
  • Xiaoxiao Li
  • Yi Zhang

U-shaped networks have become prevalent in various medical image tasks such as segmentation and restoration. However, most existing U-shaped networks rely on centralized learning, which raises privacy concerns. To address these issues, federated learning (FL) and split learning (SL) have been proposed. However, achieving a balance between the local computational cost, model privacy, and parallel training remains a challenge. In this article, we propose a novel hybrid learning paradigm called Dynamic Corrected Split Federated Learning (DC-SFL) for U-shaped medical image networks. To preserve data privacy, including the input, model parameters, labels, and output simultaneously, we propose to split the network into three parts hosted by different parties. We propose a Dynamic Weight Correction Strategy (DWCS) to stabilize the training process and avoid the model drift problem due to data heterogeneity. To further enhance privacy protection and establish a trustworthy distributed learning paradigm, we propose to introduce additively homomorphic encryption into the aggregation process of the client-side models, which helps prevent potential collusion between parties and provides a better privacy guarantee for our proposed method. The proposed DC-SFL is evaluated on various medical image tasks, and the experimental results demonstrate its effectiveness. In comparison with state-of-the-art distributed learning methods, our method achieves competitive performance.

ICML Conference 2023 Conference Paper

Federated Adversarial Learning: A Framework with Convergence Analysis

  • Xiaoxiao Li
  • Zhao Song 0002
  • Jiaming Yang

Federated learning (FL) is a trending training paradigm to utilize decentralized training data. FL allows clients to update model parameters locally for several epochs, then share them with a global model for aggregation. This training paradigm with multi-local-step updating before aggregation exposes unique vulnerabilities to adversarial attacks. Adversarial training is a popular and effective method to improve the robustness of networks against adversaries. In this work, we formulate a general form of federated adversarial learning (FAL) that is adapted from adversarial learning in the centralized setting. On the client side of FL training, FAL has an inner loop to generate adversarial samples for adversarial training and an outer loop to update local model parameters. On the server side, FAL aggregates local model updates and broadcasts the aggregated model. We design a global robust training loss and formulate FAL training as a min-max optimization problem. Unlike the convergence analysis in classical centralized training that relies on the gradient direction, it is significantly harder to analyze the convergence in FAL for three reasons: 1) the complexity of min-max optimization, 2) the model not updating in the gradient direction due to the multi-local updates on the client side before aggregation, and 3) inter-client heterogeneity. We address these challenges by using appropriate gradient approximation and coupling techniques and present the convergence analysis in the over-parameterized regime. Our main result theoretically shows that the minimum loss under our algorithm can converge to an $\epsilon$-small value with an appropriately chosen learning rate and number of communication rounds. It is noteworthy that our analysis is feasible for non-IID clients.
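The client-side min-max loop can be illustrated on a toy linear model with squared loss: the inner loop perturbs the input to increase the loss (an FGSM-style sign step, used here as a simplifying assumption), and the outer loop descends on the weights using the perturbed input. Function names are hypothetical.

```python
def grad_x(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to the input x."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * wi for wi in w]

def grad_w(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to the weights w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def local_adversarial_step(w, x, y, eps, lr):
    """One inner (maximize over x) + one outer (minimize over w) step."""
    gx = grad_x(w, x, y)
    x_adv = [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
             for xi, g in zip(x, gx)]              # inner: ascend in x
    gw = grad_w(w, x_adv, y)
    return [wi - lr * g for wi, g in zip(w, gw)]   # outer: descend in w
```

Running several such local steps before aggregation is exactly what makes the update direction deviate from the global gradient, the second difficulty the analysis has to handle.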

ICLR Conference 2023 Conference Paper

PerFedMask: Personalized Federated Learning with Optimized Masking Vectors

  • Mehdi Setayesh
  • Xiaoxiao Li
  • Vincent W. S. Wong 0001

Recently, various personalized federated learning (FL) algorithms have been proposed to tackle data heterogeneity. To mitigate device heterogeneity, a common approach is to use masking. In this paper, we first show that using random masking can lead to a bias in the obtained solution of the learning model. To this end, we propose a personalized FL algorithm with optimized masking vectors called PerFedMask. In particular, PerFedMask facilitates each device to obtain its optimized masking vector based on its computational capability before training. Fine-tuning is performed after training. PerFedMask is a generalization of a recently proposed personalized FL algorithm, FedBABU (Oh et al., 2022). PerFedMask can be combined with other FL algorithms including HeteroFL (Diao et al., 2021) and Split-Mix FL (Hong et al., 2022). Results based on CIFAR-10 and CIFAR-100 datasets show that the proposed PerFedMask algorithm provides a higher test accuracy after fine-tuning and lower average number of trainable parameters when compared with six existing state-of-the-art FL algorithms in the literature. The codes are available at https://github.com/MehdiSet/PerFedMask.
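The contrast between random and optimized masking can be sketched as follows: the mask keeps only as many parameters as the device's capacity allows, and which parameters to keep is chosen by a per-parameter score rather than at random. The fixed score-ranking below is an illustrative stand-in for PerFedMask's optimization of the masking vectors; names are hypothetical.

```python
def capacity_mask(n_params, capacity, scores):
    """Binary mask keeping the `capacity` fraction of parameters
    with the highest scores."""
    k = max(1, int(n_params * capacity))
    keep = sorted(range(n_params), key=lambda i: -scores[i])[:k]
    mask = [0] * n_params
    for i in keep:
        mask[i] = 1
    return mask

def masked_update(w, grad, mask, lr):
    """Gradient step that only touches unmasked (trainable) parameters."""
    return [wi - lr * g * m for wi, g, m in zip(w, grad, mask)]
```

Masked-out parameters stay frozen during training, which is what reduces the per-device trainable parameter count.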

AAAI Conference 2022 Conference Paper

Evaluating Explainable AI on a Multi-Modal Medical Imaging Task: Can Existing Algorithms Fulfill Clinical Requirements?

  • Weina Jin
  • Xiaoxiao Li
  • Ghassan Hamarneh

Being able to explain the prediction to clinical end-users is a necessity to leverage the power of artificial intelligence (AI) models for clinical decision support. For medical images, a feature attribution map, or heatmap, is the most common form of explanation that highlights important features for AI models’ prediction. However, it is unknown how well heatmaps perform on explaining decisions on multi-modal medical images, where each image modality or channel visualizes distinct clinical information of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users’ interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the modality-specific feature importance (MSFI) metric. It encodes clinical image and explanation interpretation patterns of modality prioritization and modality-specific feature localization. We conduct a clinical requirement-grounded, systematic evaluation using computational methods and a clinician user study. Results show that the 16 examined heatmap algorithms failed to fulfill clinical requirements to correctly indicate the AI model's decision process or decision quality. The evaluation and the MSFI metric can guide the design and selection of explainable AI algorithms to meet clinical requirements on multi-modal explanation.

ICLR Conference 2022 Conference Paper

Federated Learning from Only Unlabeled Data with Class-conditional-sharing Clients

  • Nan Lu 0001
  • Zhao Wang 0006
  • Xiaoxiao Li
  • Gang Niu 0001
  • Qi Dou 0001
  • Masashi Sugiyama

Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.

ICLR Conference 2021 Conference Paper

FedBN: Federated Learning on Non-IID Features via Local Batch Normalization

  • Xiaoxiao Li
  • Meirui Jiang
  • Xiaofei Zhang
  • Michael Kamp
  • Qi Dou 0001

The emerging paradigm of federated learning (FL) strives to enable collaborative training of deep models on the network edge without centrally aggregating raw data and hence improving data privacy. In most cases, the assumption of independent and identically distributed samples across local clients does not hold for federated learning setups. Under this setting, neural network training performance may vary significantly according to the data distribution and even hurt training convergence. Most of the previous work has focused on a difference in the distribution of labels or client shifts. Unlike those settings, we address an important problem of FL, e.g., different scanners/sensors in medical imaging, different scenery distribution in autonomous driving (highway vs. city), where local clients store examples with different distributions compared to other clients, which we denote as feature shift non-iid. In this work, we propose an effective method that uses local batch normalization to alleviate the feature shift before averaging models. The resulting scheme, called FedBN, outperforms both classical FedAvg, as well as the state-of-the-art for non-iid data (FedProx) on our extensive experiments. These empirical results are supported by a convergence analysis that shows in a simplified setting that FedBN has a faster convergence rate than FedAvg. Code is available at https://github.com/med-air/FedBN.
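The FedBN aggregation rule is simple to state concretely: average every parameter across clients except the batch-normalization ones, which each client keeps local. A minimal sketch with models as dicts of flat weight lists; detecting BN layers by a "bn" substring in the parameter name is a naming convention assumed for this illustration.

```python
def fedbn_aggregate(client_models):
    """FedAvg over all parameters except batch-norm ones, which stay local.
    Models are dicts: parameter name -> list of floats."""
    names = client_models[0].keys()
    shared = {}
    for name in names:
        if "bn" in name:
            continue  # BN statistics/affine parameters are not averaged
        cols = zip(*(m[name] for m in client_models))
        shared[name] = [sum(c) / len(client_models) for c in cols]
    # each client keeps its own BN parameters and takes the averaged rest
    return [{**m, **shared} for m in client_models]
```

Because BN layers absorb client-specific feature statistics, leaving them out of the average is what mitigates the feature-shift non-iid problem.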

ICML Conference 2021 Conference Paper

FL-NTK: A Neural Tangent Kernel-based Framework for Federated Learning Analysis

  • Baihe Huang
  • Xiaoxiao Li
  • Zhao Song 0002
  • Xin Yang 0017

Federated Learning (FL) is an emerging learning scheme that allows different distributed clients to train deep neural networks together without data sharing. Neural networks have become popular due to their unprecedented success. To the best of our knowledge, the theoretical guarantees of FL concerning neural networks with explicit forms and multi-step updates are unexplored. Nevertheless, training analysis of neural networks in FL is non-trivial for two reasons: first, the objective loss function we are optimizing is non-smooth and non-convex, and second, we are even not updating in the gradient direction. Existing convergence results for gradient descent-based methods heavily rely on the fact that the gradient direction is used for updating. The current paper presents a new class of convergence analysis for FL, Federated Neural Tangent Kernel (FL-NTK), which corresponds to overparameterized ReLU neural networks trained by gradient descent in FL and is inspired by the analysis in Neural Tangent Kernel (NTK). Theoretically, FL-NTK converges to a global-optimal solution at a linear rate with properly tuned learning parameters. Furthermore, with proper distributional assumptions, FL-NTK can also achieve good generalization. The proposed theoretical analysis scheme can be generalized to more complex neural networks.

ICLR Conference 2021 Conference Paper

On InstaHide, Phase Retrieval, and Sparse Matrix Factorization

  • Sitan Chen
  • Xiaoxiao Li
  • Zhao Song 0002
  • Danyang Zhuo

In this work, we examine the security of InstaHide, a scheme recently proposed by \cite{hsla20} for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible complexity-theoretic assumption. The answer to this turns out to be quite subtle and closely related to the average-case complexity of a multi-task, missing-data version of the classic problem of phase retrieval that is interesting in its own right. Motivated by this connection, under the standard distributional assumption that the public/private feature vectors are isotropic Gaussian, we design an algorithm that can actually recover a private vector using only the public vectors and a sequence of synthetic vectors generated by InstaHide.
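The InstaHide encoding described above is mechanically simple: take a convex combination of feature vectors, then flip the sign of each entry independently with probability 1/2. A minimal sketch (the function name is illustrative):

```python
import random

def instahide_encode(vectors, coeffs, rng):
    """Convex combination of feature vectors, then each entry's sign is
    flipped independently with probability 1/2."""
    assert abs(sum(coeffs) - 1.0) < 1e-9, "coefficients must be convex"
    dim = len(vectors[0])
    mixed = [sum(c * v[i] for c, v in zip(coeffs, vectors))
             for i in range(dim)]
    return [x if rng.random() < 0.5 else -x for x in mixed]
```

Note that the sign flips preserve the magnitude of every entry, which is precisely the structure the paper's phase-retrieval-style attack exploits.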

NeurIPS Conference 2021 Conference Paper

Subgraph Federated Learning with Missing Neighbor Generation

  • Ke Zhang
  • Carl Yang
  • Xiaoxiao Li
  • Lichao Sun
  • Siu Ming Yiu

Graphs have been widely used in data mining and machine learning due to their unique representation of real-world objects and their interactions. As graphs are getting bigger and bigger nowadays, it is common to see their subgraphs separately collected and stored in multiple local systems. Therefore, it is natural to consider the subgraph federated learning setting, where each local system holds a small subgraph that may be biased from the distribution of the whole graph. Hence, the subgraph federated learning aims to collaboratively train a powerful and generalizable graph mining model without directly sharing their graph data. In this work, towards the novel yet realistic setting of subgraph federated learning, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; (2) FedSage+, which trains a missing neighbor generator along FedSage to deal with missing links across local subgraphs. Empirical results on four real-world graph datasets with synthesized subgraph federated learning settings demonstrate the effectiveness and efficiency of our proposed techniques. At the same time, consistent theoretical implications are made towards their generalization ability on the global graphs.

ICML Conference 2020 Conference Paper

Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE

  • Juntang Zhuang
  • Nicha C. Dvornek
  • Xiaoxiao Li
  • Sekhar Tatikonda
  • Xenophon Papademetris
  • James S. Duncan

The empirical performance of neural ordinary differential equations (NODEs) is significantly inferior to discrete-layer models on benchmark tasks (e.g., image classification). We demonstrate that one explanation is the inaccuracy of existing gradient estimation methods: the adjoint method has numerical errors in reverse-mode integration; the naive method suffers from a redundantly deep computation graph. We propose the Adaptive Checkpoint Adjoint (ACA) method: ACA applies a trajectory checkpoint strategy which records the forward-mode trajectory as the reverse-mode trajectory to guarantee accuracy; ACA deletes redundant components for shallow computation graphs; and ACA supports adaptive solvers. On image classification tasks, compared with the adjoint and naive methods, ACA achieves half the error rate in half the training time; NODE trained with ACA outperforms ResNet in both accuracy and test-retest reliability. On time-series modeling, ACA outperforms competing methods. Furthermore, NODE with ACA can incorporate physical knowledge to achieve better accuracy.