Arrow Research search

Author name cluster

Nirupam Gupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

8 papers
2 author rows

Possible papers (8)

ICLR Conference 2025 Conference Paper

Adaptive Gradient Clipping for Robust Federated Learning

  • Youssef Allouah
  • Rachid Guerraoui
  • Nirupam Gupta
  • Ahmed Jellouli
  • Geovani Rizk
  • John Stephan

Robust federated learning aims to maintain reliable performance despite the presence of adversarial or misbehaving workers. While state-of-the-art (SOTA) robust distributed gradient descent (Robust-DGD) methods were proven theoretically optimal, their empirical success has often relied on pre-aggregation gradient clipping. However, existing static clipping strategies yield inconsistent results: enhancing robustness against some attacks while being ineffective or even detrimental against others. To address this limitation, we propose a principled adaptive clipping strategy, Adaptive Robust Clipping (ARC), which dynamically adjusts clipping thresholds based on the input gradients. We prove that ARC not only preserves the theoretical robustness guarantees of SOTA Robust-DGD methods but also provably improves asymptotic convergence when the model is well-initialized. Extensive experiments on benchmark image classification tasks confirm these theoretical insights, demonstrating that ARC significantly enhances robustness, particularly in highly heterogeneous and adversarial settings.
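The abstract does not spell out ARC's threshold rule. As a hedged illustration of the general idea of pre-aggregation clipping with a data-dependent threshold, the sketch below sets the threshold from the input gradients themselves, using the $k$-th largest gradient norm (an assumed rule for illustration, not necessarily the paper's):

```python
import numpy as np

def adaptive_clip(gradients, k):
    """Clip each gradient to a threshold chosen from the batch itself.

    Illustrative rule (assumed, not the paper's): the threshold is the
    k-th largest gradient norm, so the k largest gradients are scaled
    down while the rest pass through unchanged.
    """
    norms = np.array([np.linalg.norm(g) for g in gradients])
    # k-th largest norm acts as the data-dependent clipping threshold
    threshold = np.sort(norms)[-k]
    clipped = []
    for g, n in zip(gradients, norms):
        scale = min(1.0, threshold / n) if n > 0 else 1.0
        clipped.append(scale * g)
    return clipped
```

Unlike a static threshold, this quantity adapts to the scale of the submitted gradients in each round, which is the property the abstract attributes to ARC.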

ICML Conference 2024 Conference Paper

Byzantine-Robust Federated Learning: Impact of Client Subsampling and Local Updates

  • Youssef Allouah
  • Sadegh Farhadkhani
  • Rachid Guerraoui
  • Nirupam Gupta
  • Rafael Pinot
  • Geovani Rizk
  • Sasha Voitovych

The possibility of adversarial (a.k.a. Byzantine) clients makes federated learning (FL) prone to arbitrary manipulation. The natural approach to robustify FL against adversarial clients is to replace the simple averaging operation at the server in the standard $\mathsf{FedAvg}$ algorithm by a robust averaging rule. While a significant amount of work has been devoted to studying the convergence of federated robust averaging (which we denote by $\mathsf{FedRo}$), prior work has largely ignored the impact of client subsampling and local steps, two fundamental FL characteristics. While client subsampling increases the effective fraction of Byzantine clients, local steps increase the drift between the local updates computed by honest (i.e., non-Byzantine) clients. Consequently, a careless deployment of $\mathsf{FedRo}$ could yield poor performance. We validate this observation by presenting an in-depth analysis of $\mathsf{FedRo}$, tightly analyzing the impact of client subsampling and local steps. Specifically, we present a sufficient condition on client subsampling for nearly-optimal convergence of $\mathsf{FedRo}$ (for smooth non-convex loss). Also, we show that the rate of improvement in learning accuracy diminishes with respect to the number of clients subsampled, as soon as the sample size exceeds a threshold value. Interestingly, we also observe that under a careful choice of step-sizes, the learning error due to Byzantine clients decreases with the number of local steps. We validate our theory by experiments on the FEMNIST and CIFAR-10 image classification tasks.
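The abstract leaves the robust averaging rule abstract. One common instance (assumed here purely for illustration; the paper may analyze a different rule) is the coordinate-wise trimmed mean, which discards the $f$ largest and $f$ smallest values per coordinate before averaging:

```python
import numpy as np

def trimmed_mean(updates, f):
    """Coordinate-wise trimmed mean over client updates.

    updates: (n_clients, dim) array-like; f: number of suspected
    Byzantine clients. Per coordinate, drop the f largest and f
    smallest values and average the rest, which bounds the influence
    any f clients can exert on the aggregate.
    """
    sorted_updates = np.sort(np.asarray(updates), axis=0)
    return sorted_updates[f : len(updates) - f].mean(axis=0)
```

With client subsampling, the relevant $f$ is the worst-case number of Byzantine clients in the *sampled* set, which is why the abstract's sufficient condition on the sample size matters.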

NeurIPS Conference 2024 Conference Paper

Fine-Tuning Personalization in Federated Learning to Mitigate Adversarial Clients

  • Youssef Allouah
  • Abdellah El Mrini
  • Rachid Guerraoui
  • Nirupam Gupta
  • Rafael Pinot

Federated learning (FL) is an appealing paradigm that allows a group of machines (a.k.a. clients) to learn collectively while keeping their data local. However, due to the heterogeneity between the clients' data distributions, the model obtained through the use of FL algorithms may perform poorly on some clients' data. Personalization addresses this issue by enabling each client to have a different model tailored to their own data while simultaneously benefiting from the other clients' data. We consider an FL setting where some clients can be adversarial, and we derive conditions under which full collaboration fails. Specifically, we analyze the generalization performance of an interpolated personalized FL framework in the presence of adversarial clients, and we precisely characterize situations when full collaboration performs strictly worse than fine-tuned personalization. Our analysis determines how much we should scale down the level of collaboration, according to data heterogeneity and the tolerable fraction of adversarial clients. We support our findings with empirical results on mean estimation and binary classification problems, considering synthetic and benchmark image classification datasets.
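Interpolated personalization is commonly formalized (notation assumed here; the paper's exact objective may differ) as each client $i$ minimizing a convex combination of its local loss and the average loss over the $n$ clients,

$$\min_{\theta_i}\; \lambda\,\mathcal{L}_i(\theta_i) \;+\; (1-\lambda)\,\frac{1}{n}\sum_{j=1}^{n}\mathcal{L}_j(\theta_i),$$

where $\lambda = 0$ recovers full collaboration and $\lambda = 1$ recovers purely local training; "scaling down the level of collaboration" then corresponds to increasing $\lambda$.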

NeurIPS Conference 2024 Conference Paper

Revisiting Ensembling in One-Shot Federated Learning

  • Youssef Allouah
  • Akash Dhasade
  • Rachid Guerraoui
  • Nirupam Gupta
  • Anne-Marie Kermarrec
  • Rafael Pinot
  • Rafael Pires
  • Rishi Sharma

Federated Learning (FL) is an appealing approach to training machine learning models without sharing raw data. However, standard FL algorithms are iterative and thus induce a significant communication cost. One-Shot FL (OFL) trades the iterative exchange of models between clients and the server for a single round of communication, thereby saving substantially on communication costs. Not surprisingly, OFL exhibits a performance gap in terms of accuracy with respect to FL, especially under high data heterogeneity. We introduce Fens, a novel federated ensembling scheme that approaches the accuracy of FL with the communication efficiency of OFL. Learning in Fens proceeds in two phases: first, clients train models locally and send them to the server, similar to OFL; second, clients collaboratively train a lightweight prediction aggregator model using FL. We showcase the effectiveness of Fens through exhaustive experiments spanning several datasets and heterogeneity levels. In the particular case of a heterogeneously distributed CIFAR-10 dataset, Fens achieves up to $26.9\%$ higher accuracy over SOTA OFL, being only $3.1\%$ lower than FL. At the same time, Fens incurs at most $4.3\times$ more communication than OFL, whereas FL is at least $10.9\times$ more communication-intensive than Fens.
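The aggregator in the second phase is described only as a lightweight model trained over the clients' one-shot models. A minimal sketch of the idea, with assumed shapes and a simple linear aggregator standing in for whatever architecture the paper uses:

```python
import numpy as np

def ensemble_features(local_models, x):
    """Concatenate per-client model outputs into one feature vector.

    local_models: list of callables, each mapping an input to a vector
    of class logits (the one-shot models collected by the server).
    """
    return np.concatenate([m(x) for m in local_models])

# A lightweight linear aggregator over the stacked logits; in the
# second phase this aggregator itself would be trained with FL.
def aggregate(weights, bias, features):
    return weights @ features + bias
```

Only the small aggregator is trained iteratively, which is why the scheme can stay close to OFL's communication cost while recovering much of FL's accuracy.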

ICML Conference 2023 Conference Paper

On the Privacy-Robustness-Utility Trilemma in Distributed Learning

  • Youssef Allouah
  • Rachid Guerraoui
  • Nirupam Gupta
  • Rafael Pinot
  • John Stephan

The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines’ data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.
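The algorithmic structure described above — DP noise at honest machines, robust aggregation at the server — can be sketched as follows. This is a generic DP-SGD-style sketch with assumed names (`clip_norm`, `sigma`), and a coordinate-wise median standing in for the paper's high-dimensional robust rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_noisy_gradient(grad, clip_norm, sigma):
    """Gaussian-mechanism step at an honest machine: clip, then add
    noise calibrated to the clipping norm (generic sketch)."""
    g = np.asarray(grad, dtype=float)
    n = np.linalg.norm(g)
    if n > clip_norm:
        g = g * (clip_norm / n)
    return g + rng.normal(0.0, sigma * clip_norm, size=g.shape)

def robust_aggregate(grads):
    """Server-side robust aggregation; coordinate-wise median is a
    simple stand-in for the paper's high-dimensional rule."""
    return np.median(np.asarray(grads), axis=0)
```

The trilemma shows up in the parameters: larger `sigma` strengthens privacy but inflates the error, and the robust rule adds its own error term in the presence of adversarial machines.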

ICML Conference 2023 Conference Paper

Robust Collaborative Learning with Linear Gradient Overhead

  • Sadegh Farhadkhani
  • Rachid Guerraoui
  • Nirupam Gupta
  • Lê-Nguyên Hoang
  • Rafael Pinot
  • John Stephan

Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak's momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.
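The abstract names nearest-neighbor averaging (NNA) as MoNNA's global mixing step. A hedged sketch (parameter names assumed) in which each machine averages the $n - f$ received models closest to its own:

```python
import numpy as np

def nearest_neighbor_average(models, own_index, f):
    """Average the n - f models nearest (in Euclidean norm) to one's own.

    models: (n, dim) array of all received models; f: number of faulty
    machines to guard against. Distant, potentially faulty models are
    excluded from the local mixing step.
    """
    models = np.asarray(models)
    own = models[own_index]
    dists = np.linalg.norm(models - own, axis=1)
    # own model is always kept (distance 0); the f farthest are dropped
    nearest = np.argsort(dists)[: len(models) - f]
    return models[nearest].mean(axis=0)
```

Because each machine mixes only with its neighborhood, the mixing is non-linear in the inputs, which is what the $(\alpha, \lambda)$-reduction criterion mentioned above is introduced to analyze.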

NeurIPS Conference 2023 Conference Paper

Robust Distributed Learning: Tight Error Bounds and Breakdown Point under Data Heterogeneity

  • Youssef Allouah
  • Rachid Guerraoui
  • Nirupam Gupta
  • Rafael Pinot
  • Geovani Rizk

The theory underlying robust distributed learning algorithms, designed to resist adversarial machines, matches empirical observations when data is homogeneous. Under data heterogeneity however, which is the norm in practical scenarios, established lower bounds on the learning error are essentially vacuous and greatly mismatch empirical observations. This is because the heterogeneity model considered is too restrictive and does not cover basic learning tasks such as least-squares regression. We consider in this paper a more realistic heterogeneity model, namely $(G, B)$-gradient dissimilarity, and show that it covers a larger class of learning problems than existing theory. Notably, we show that the breakdown point under heterogeneity is lower than the classical fraction $\frac{1}{2}$. We also prove a new lower bound on the learning error of any distributed learning algorithm. We derive a matching upper bound for a robust variant of distributed gradient descent, and empirically show that our analysis reduces the gap between theory and practice.
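For reference, $(G, B)$-gradient dissimilarity is commonly stated as follows (notation assumed: $\mathcal{L}_i$ the local loss of honest machine $i$, $\mathcal{L}$ their average over the $n$ honest machines):

$$\frac{1}{n}\sum_{i=1}^{n}\left\lVert \nabla\mathcal{L}_i(\theta) - \nabla\mathcal{L}(\theta) \right\rVert^2 \;\le\; G^2 + B^2 \left\lVert \nabla\mathcal{L}(\theta) \right\rVert^2 \quad \text{for all } \theta.$$

Setting $B = 0$ recovers the uniformly bounded-heterogeneity model; the $B^2\lVert\nabla\mathcal{L}(\theta)\rVert^2$ term is what allows dissimilarity to grow with the gradient norm, covering tasks such as least-squares regression that the restrictive model excludes.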

ICML Conference 2022 Conference Paper

Byzantine Machine Learning Made Easy By Resilient Averaging of Momentums

  • Sadegh Farhadkhani
  • Rachid Guerraoui
  • Nirupam Gupta
  • Rafael Pinot
  • John Stephan

Byzantine resilience emerged as a prominent topic within the distributed machine learning community. Essentially, the goal is to enhance distributed optimization algorithms, such as distributed SGD, in a way that guarantees convergence despite the presence of some misbehaving (a.k.a. Byzantine) workers. Although a myriad of techniques addressing the problem have been proposed, the field arguably rests on fragile foundations. These techniques are hard to prove correct and rely on assumptions that are (a) quite unrealistic, i.e., often violated in practice, and (b) heterogeneous, i.e., making it difficult to compare approaches. We present RESAM (RESilient Averaging of Momentums), a unified framework that makes it simple to establish optimal Byzantine resilience, relying only on standard machine learning assumptions. Our framework is mainly composed of two operators: resilient averaging at the server and distributed momentum at the workers. We prove a general theorem stating the convergence of distributed SGD under RESAM. Interestingly, demonstrating and comparing the convergence of many existing techniques becomes a direct corollary of our theorem, without resorting to stringent assumptions. We also present an empirical evaluation of the practical relevance of RESAM.
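The two RESAM operators can be sketched as follows (function names and the concrete resilient rule are assumptions for illustration; RESAM itself is a framework that admits many resilient averaging rules):

```python
import numpy as np

def resam_round(momentums_prev, grads, resilient_avg, beta=0.9):
    """One RESAM-style round (illustrative sketch).

    Each worker updates its Polyak momentum m <- beta*m + (1-beta)*g;
    the server then applies an abstract resilient averaging rule to
    the workers' momentums.
    """
    momentums = [beta * m + (1.0 - beta) * g
                 for m, g in zip(momentums_prev, grads)]
    return momentums, resilient_avg(momentums)

# Coordinate-wise median as one simple resilient averaging instance.
def coordinate_wise_median(vectors):
    return np.median(np.asarray(vectors), axis=0)
```

Worker-side momentum dampens the stochastic noise that Byzantine workers can hide behind, which is why combining it with a resilient averaging operator suffices for the framework's convergence theorem under standard assumptions.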