Author name cluster

Mohammad Mahmoody

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers · 2 author rows

Possible papers (11)

EWRL Workshop 2025 · Workshop Paper

Targeted Poisoning of Reinforcement Learning Agents

  • Omran Shahbazi Gholiabad
  • Mohammad Mahmoody

Understanding the impact of adversarial attacks on reinforcement learning (RL) models is essential due to their wide range of applications. In this work, we initiate a study of targeted poisoning attacks for reinforcement learning agents, where the adversary aims to deliberately increase the likelihood of a specific undesirable event chosen by the attacker. In particular, rather than degrading overall performance indiscriminately, the adversary carefully manipulates the training process so that, during critical decision-making steps, the agent is more likely to fail in a targeted manner, leading it into the adversary’s desired outcome. We present theoretical results showing the effectiveness of such targeted poisoning in basic RL settings. Building on these insights, we design practical attack strategies and thoroughly evaluate their impact beyond the scope of our theoretical analysis. Through extensive experiments, we demonstrate that targeted poisoning attacks substantially raise the probability of the chosen undesirable event across a variety of reinforcement learning tasks, ranging from classic control benchmarks to more complex continuous-control environments, including stochastic settings. We compare our attacks against standard RL baselines and against algorithms specifically designed to mitigate poisoning, and we further validate their effectiveness on deep RL models. Our results highlight the vulnerabilities of RL systems to targeted training-time manipulations, underscoring the need for stronger defenses.
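
To make the threat model concrete, here is a minimal, hypothetical sketch of targeted reward poisoning against tabular Q-learning on a toy chain MDP. The environment, the 5% budget, the reward magnitude, and the target state-action pair are all illustrative assumptions; the paper's actual attack strategies are not reproduced here.

```python
# Targeted reward poisoning against batch tabular Q-learning (toy sketch).
import random

random.seed(0)
n_states, n_actions = 5, 2
TARGET = (2, 0)              # attacker's goal: "move left" chosen at state 2
gamma, alpha = 0.9, 0.1

def step(s, a):
    s2 = max(0, min(n_states - 1, s + (1 if a == 1 else -1)))
    return s2, 1.0 if (s2 == n_states - 1 and s != s2) else 0.0  # reward: enter goal

# Clean batch of transitions from a uniformly random behavior policy.
batch = []
for _ in range(10000):
    s, a = random.randrange(n_states), random.randrange(n_actions)
    s2, r = step(s, a)
    batch.append([s, a, r, s2])

# Targeted poisoning: a 5% budget, spent only on the target state-action pair.
budget = int(0.05 * len(batch))
for row in batch:
    if budget and (row[0], row[1]) == TARGET:
        row[2] += 3.0        # inflated reward (magnitude is illustrative)
        budget -= 1

# Ordinary tabular Q-learning over the (poisoned) batch.
Q = [[0.0] * n_actions for _ in range(n_states)]
for _ in range(30):
    random.shuffle(batch)
    for s, a, r, s2 in batch:
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])

print("greedy action at state 2:", Q[2].index(max(Q[2])))  # 0 = attacker's target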

NeurIPS Conference 2022 · Conference Paper

On Optimal Learning Under Targeted Data Poisoning

  • Steve Hanneke
  • Amin Karbasi
  • Mohammad Mahmoody
  • Idan Mehalel
  • Shay Moran

Consider the task of learning a hypothesis class $\mathcal{H}$ in the presence of an adversary that can replace up to an $\eta$ fraction of the examples in the training set with arbitrary adversarial examples. The adversary aims to fail the learner on a particular target test point $x$ which is \emph{known} to the adversary but not to the learner. In this work we aim to characterize the smallest achievable error $\epsilon=\epsilon(\eta)$ by the learner in the presence of such an adversary in both realizable and agnostic settings. We fully achieve this in the realizable setting, proving that $\epsilon=\Theta(\mathtt{VC}(\mathcal{H})\cdot \eta)$, where $\mathtt{VC}(\mathcal{H})$ is the VC dimension of $\mathcal{H}$. Remarkably, we show that the upper bound can be attained by a deterministic learner. In the agnostic setting we reveal a more elaborate landscape: we devise a deterministic learner with a multiplicative regret guarantee of $\epsilon \leq C\cdot\mathtt{OPT} + O(\mathtt{VC}(\mathcal{H})\cdot \eta)$, where $C > 1$ is a universal numerical constant. We complement this by showing that for any deterministic learner there is an attack which worsens its error to at least $2\cdot \mathtt{OPT}$. This implies that a multiplicative deterioration in the regret is unavoidable in this case. Finally, the algorithms we develop for achieving the optimal rates are inherently improper. Nevertheless, we show that for a variety of natural concept classes, such as linear classifiers, it is possible to retain the dependence $\epsilon=\Theta_{\mathcal{H}}(\eta)$ by a proper algorithm in the realizable setting. Here $\Theta_{\mathcal{H}}$ conceals a polynomial dependence on $\mathtt{VC}(\mathcal{H})$.
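
For quick reference, the stated rates can be collected into one display (notation as in the abstract; the lower bound is the one shown for deterministic learners):

```latex
\begin{align*}
\text{realizable:}\quad & \epsilon(\eta) = \Theta\big(\mathtt{VC}(\mathcal{H})\cdot\eta\big),\\
\text{agnostic, upper bound:}\quad & \epsilon \le C\cdot\mathtt{OPT} + O\big(\mathtt{VC}(\mathcal{H})\cdot\eta\big), \quad C > 1,\\
\text{agnostic, deterministic learners:}\quad & \epsilon \ge 2\cdot\mathtt{OPT}\ \text{under some attack.}
\end{align*}
```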

NeurIPS Conference 2022 · Conference Paper

Overparameterization from Computational Constraints

  • Sanjam Garg
  • Somesh Jha
  • Saeed Mahloujifar
  • Mohammad Mahmoody
  • Mingyuan Wang

Overparameterized models with millions of parameters have been hugely successful. In this work, we ask: can the need for large models be, at least in part, due to the \emph{computational} limitations of the learner? Additionally, we ask: is this situation exacerbated for \emph{robust} learning? We show that this indeed could be the case. We show learning tasks for which computationally bounded learners need \emph{significantly more} model parameters than information-theoretic learners do. Furthermore, we show that even more model parameters could be necessary for robust learning. In particular, for computationally bounded learners, we extend the recent result of Bubeck and Sellke [NeurIPS'2021], which shows that robust models might need more parameters, to the computational regime, and show that bounded learners could provably need an even larger number of parameters. Then, we address the following related question: can we hope to remedy the situation for robust computationally bounded learning by restricting \emph{adversaries} to also be computationally bounded, for the sake of obtaining models with fewer parameters? Here again, we show that this could be possible. Specifically, building on the work of Garg, Jha, Mahloujifar, and Mahmoody [ALT'2020], we demonstrate a learning task that can be learned efficiently and robustly against a computationally bounded attacker, while being robust against an information-theoretic attacker requires the learner to use significantly more parameters.

NeurIPS Conference 2021 · Conference Paper

A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks

  • Samuel Deng
  • Sanjam Garg
  • Somesh Jha
  • Saeed Mahloujifar
  • Mohammad Mahmoody
  • Abhradeep Guha Thakurta

Poisoning attacks have emerged as a significant security threat to machine learning algorithms. It has been demonstrated that adversaries who make small changes to the training set, such as adding specially crafted data points, can hurt the performance of the output model. Most of these attacks require full knowledge of the training data. This leaves open the possibility of achieving the same attack results using poisoning attacks that do not have full knowledge of the clean training set. In this work, we initiate a theoretical study of the problem above. Specifically, for the case of feature selection with LASSO, we show that \emph{full information} adversaries (that craft poisoning examples based on the rest of the training data) are provably much more devastating than the optimal attacker that is \emph{oblivious} to the training set yet has access to the distribution of the data. Our separation result shows that the data-aware and data-oblivious settings are fundamentally different and that we cannot hope to achieve the same attack or defense results in these scenarios.
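
A toy contrast between the two threat models for LASSO feature selection (a minimal sketch; the synthetic data, the regularization strength, and both attack heuristics are illustrative assumptions, not the paper's constructions):

```python
# Toy contrast: data-aware vs. data-oblivious poisoning of LASSO feature
# selection. Ground truth uses feature 0 only; all parameters illustrative.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
m, d = 100, 10
X = rng.standard_normal((m, d))
y = X[:, 0] + 0.1 * rng.standard_normal(m)

def selected(X, y):
    """Indices of the features LASSO keeps."""
    return set(np.flatnonzero(Lasso(alpha=0.1).fit(X, y).coef_))

n_poison = 10  # poisoning budget: 10 injected examples

# Oblivious attacker: knows only the distribution; injects generic points
# that correlate a decoy feature with the label.
Xo = rng.standard_normal((n_poison, d))
yo = 2.0 * Xo[:, 1]

# Data-aware attacker: sees the clean training set, so it can reuse actual
# training points with contradicting labels to fight the observed signal.
Xa, ya = X[:n_poison].copy(), -y[:n_poison]

print("clean:    ", selected(X, y))
print("oblivious:", selected(np.vstack([X, Xo]), np.concatenate([y, yo])))
print("aware:    ", selected(np.vstack([X, Xa]), np.concatenate([y, ya])))
```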

UAI Conference 2021 · Conference Paper

Learning and certification under instance-targeted poisoning

  • Ji Gao
  • Amin Karbasi
  • Mohammad Mahmoody

In this paper, we study PAC learnability and certification under instance-targeted poisoning attacks, where the adversary may change a fraction of the training set with the goal of fooling the learner at a specific target instance. Our first contribution is to formalize the problem in various settings and to explicitly discuss subtle aspects such as the learner’s randomness and whether (or not) the adversary’s attack can depend on it. We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable. In contrast, when the adversary’s budget grows linearly with the sample complexity, the adversary can potentially drive the expected 0-1 loss up to one. We also study distribution-specific PAC learning in the same attack model and show that proper learning with certification is possible for learning halfspaces under natural distributions. Finally, we empirically study the robustness of k-nearest neighbors, logistic regression, multi-layer perceptrons, and convolutional neural networks on real data sets against instance-targeted poisoning attacks. Our experimental results show that many models, especially state-of-the-art neural networks, are indeed vulnerable to these strong attacks. Interestingly, we observe that methods with high standard accuracy might be more vulnerable to instance-targeted poisoning attacks.
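
As a minimal illustration of instance-targeted poisoning against one of the model families studied here, the sketch below flips the labels of a target point's k nearest training neighbors; the synthetic data and parameters are illustrative, not the paper's experimental setup.

```python
# Instance-targeted label-flip poisoning against k-NN on synthetic data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(+2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
target = np.array([[1.5, 1.5]])   # attacker-chosen test instance (class-1 region)

k = 5
clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
print("clean prediction:   ", clf.predict(target))

# Flipping the labels of the target's k nearest training points is enough
# to flip any k-NN majority vote at that one instance, at a cost of only
# k of the m = 100 training labels.
idx = clf.kneighbors(target, n_neighbors=k, return_distance=False)[0]
y_poisoned = y.copy()
y_poisoned[idx] = 1 - y_poisoned[idx]
clf_p = KNeighborsClassifier(n_neighbors=k).fit(X, y_poisoned)
print("poisoned prediction:", clf_p.predict(target))
```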

ICML Conference 2019 · Conference Paper

Data Poisoning Attacks in Multi-Party Learning

  • Saeed Mahloujifar
  • Mohammad Mahmoody
  • Ameer Mohammed

In this work, we demonstrate universal multi-party poisoning attacks that adapt and apply to any multi-party learning process with an arbitrary interaction pattern between the parties. More generally, we introduce and study $(k, p)$-poisoning attacks in which an adversary controls $k\in[m]$ of the parties, and for each corrupted party $P_i$, the adversary submits some poisoned data $T'_i$ on behalf of $P_i$ that is still "$(1-p)$-close" to the correct data $T_i$ (e.g., a $1-p$ fraction of $T'_i$ is still honestly generated). We prove that for any "bad" property $B$ of the final trained hypothesis $h$ (e.g., $h$ failing on a particular test example or having "large" risk) that has an arbitrarily small constant probability of happening without the attack, there is always a $(k, p)$-poisoning attack that increases the probability of $B$ from $\mu$ to $\mu^{1-p \cdot k/m} = \mu + \Omega(p \cdot k/m)$. Our attack only uses clean labels, and it is online, as it only knows the data shared so far.
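
A quick numeric illustration of the stated amplification $\mu \to \mu^{1-p \cdot k/m}$ (the parameter choices below are hypothetical):

```python
# Evaluate mu^(1 - p*k/m) for a few illustrative corruption parameters.
mu = 0.01  # baseline probability of the bad property B
for p, k, m in [(1.0, 1, 10), (0.5, 5, 10), (1.0, 5, 10)]:
    boosted = mu ** (1 - p * k / m)
    print(f"p={p}, k/m={k/m:.2f}: {mu} -> {boosted:.4f} (gain {boosted - mu:.4f})")
```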

NeurIPS Conference 2019 · Conference Paper

Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness

  • Saeed Mahloujifar
  • Xiao Zhang
  • Mohammad Mahmoody
  • David Evans

Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) measure (e.g., 1/100), according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration. We use it to empirically estimate the intrinsic robustness to L2 and L∞ perturbations of several image classification benchmarks. Code for our experiments is available at https://github.com/xiaozhanguva/Measure-Concentration.
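
A naive version of such an empirical estimate on a synthetic point cloud (a simplified stand-in for the paper's method, which searches over a structured family of subsets; the dataset, alpha, and eps below are illustrative):

```python
# Naive empirical concentration estimate: take the L2 ball around the mean
# that holds an alpha fraction of the data, then measure how much of the
# data its eps-expansion covers.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((2000, 20))        # stand-in dataset
dist = np.linalg.norm(data - data.mean(axis=0), axis=1)

alpha, eps = 0.01, 1.0                        # illustrative choices
radius = np.quantile(dist, alpha)             # ball of empirical measure ~alpha
expansion = float(np.mean(dist <= radius + eps))
print(f"a measure-{alpha} ball expands to measure {expansion:.3f} within eps={eps}")
```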

AAAI Conference 2019 · Conference Paper

The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure

  • Saeed Mahloujifar
  • Dimitrios I. Diochnos
  • Mohammad Mahmoody

Many modern machine learning classifiers are shown to be vulnerable to adversarial perturbations of the instances. Despite a massive amount of work focusing on making classifiers robust, the task seems quite challenging. In this work, through a theoretical study, we investigate the adversarial risk and robustness of classifiers and draw a connection to the well-known phenomenon of “concentration of measure” in metric measure spaces. We show that if the metric probability space of the test instance is concentrated, any classifier with some initial constant error is inherently vulnerable to adversarial perturbations. One class of concentrated metric probability spaces are the so-called Lévy families, which include many natural distributions. In this special case, our attacks only need to perturb the test instance by at most O(√n) to make it misclassified, where n is the data dimension. Using our general result about Lévy instance spaces, we first recover as special cases some of the previously proved results about the existence of adversarial examples. However, many more Lévy families are known (e.g., product distributions under the Hamming distance) for which we immediately obtain new attacks that find adversarial examples of distance O(√n). Finally, we show that concentration of measure for product spaces implies the existence of forms of “poisoning” attacks in which the adversary tampers with the training data with the goal of degrading the classifier. In particular, we show that for any learning algorithm that uses m training examples, there is an adversary who can increase the probability of any “bad property” (e.g., failing on a particular test instance) that initially happens with non-negligible probability to ≈ 1 by substituting only Õ(√m) of the examples with other (still correctly labeled) examples.
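
As one concrete instance of the phenomenon (a standard bound for the uniform distribution on the Hamming cube, stated here for orientation rather than taken from the paper): by Harper's isoperimetric inequality combined with Hoeffding's bound, any set A of measure at least 1/2 expands as

```latex
\[
\mu(A) \ge \tfrac{1}{2} \;\Longrightarrow\; \mu(A_t) \ge 1 - e^{-2t^2/n},
\qquad A_t := \{\,x \in \{0,1\}^n : d_H(x, A) \le t\,\},
\]
```

so t = O(√n) bit flips already reach all but an exponentially small fraction of the cube.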

NeurIPS Conference 2018 · Conference Paper

Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution

  • Dimitrios Diochnos
  • Saeed Mahloujifar
  • Mohammad Mahmoody

We study adversarial perturbations when the instances are uniformly distributed over {0, 1}^n. We study both "inherent" bounds that apply to any problem and any classifier for such a problem, as well as bounds that apply to specific problems and specific hypothesis classes. As the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy for these definitions based on their direct goals; we identify one of them as the one guaranteeing misclassification by pushing the instances to the error region. We then study some classic algorithms for learning monotone conjunctions and compare their adversarial risk and robustness under different definitions by attacking the hypotheses using instances drawn from the uniform distribution. We observe that sometimes these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions, in other contexts with context-dependent assumptions, may coincide with the error-region definition. Using the error-region definition of adversarial perturbations, we then study inherent bounds on risk and robustness of any classifier for any classification problem whose instances are uniformly distributed over {0, 1}^n. Using the isoperimetric inequality for the Boolean hypercube, we show that for initial error 0.01, there always exists an adversarial perturbation that changes O(√n) bits of the instances to increase the risk to 0.5, making the classifier's decisions meaningless. Furthermore, by also using the central limit theorem, we show that when n→∞, at most c√n bits of perturbation, for a universal constant c < 1.17, suffice for increasing the risk to 0.5, and the same c√n bits of perturbation on average suffice to increase the risk to 1, hence bounding the robustness by c√n.
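
A back-of-the-envelope account of where a constant near 1.17 can come from, assuming the normal approximation to the binomial (the paper's CLT argument is the precise one): the worst-case error region of measure 0.01 behaves like a Hamming ball, the number of ones in a uniform instance is approximately N(n/2, n/4), and pushing such a ball out to measure 0.5 therefore takes about

```latex
\[
t \;\approx\; \Phi^{-1}(0.99)\cdot\frac{\sqrt{n}}{2}
\;\approx\; \frac{2.326}{2}\,\sqrt{n}
\;\approx\; 1.164\,\sqrt{n} \;<\; 1.17\,\sqrt{n}
\]
```

bit flips, consistent with the stated bound.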

FOCS Conference 2007 · Conference Paper

Lower Bounds on Signatures From Symmetric Primitives

  • Boaz Barak
  • Mohammad Mahmoody

We show that every black-box construction of one-time signature schemes from a random oracle achieves security at most poly(q)·2^q, where q is the total number of queries to the oracle by the key generation, signing, and verification algorithms. That is, any such scheme can be broken with probability close to 1 by a (computationally unbounded) adversary making poly(q)·2^q queries to the oracle. This is tight up to a constant factor in the number of queries, since a simple modification of Lamport's scheme achieves 2^{(0.812−o(1))q} security using q queries. Our results extend (with a loss of a constant factor in the number of queries) also to the random permutation and ideal-cipher oracles, and so can be taken as evidence of an inherent efficiency gap between signature schemes and symmetric primitives such as block ciphers, hash functions, and message authentication codes.
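
For context, the sketch below is a textbook Lamport one-time signature over SHA-256, which makes the query counting concrete: key generation, signing, and verification together make on the order of 3·256 oracle (hash) calls. This is the basic scheme, not the modified variant achieving 2^{(0.812−o(1))q} security.

```python
# Textbook Lamport one-time signature over SHA-256.
import hashlib, os

N = 256  # sign one 256-bit message digest, bit by bit

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def keygen():
    # Two secret preimages per bit position; the public key is their hashes.
    sk = [(os.urandom(32), os.urandom(32)) for _ in range(N)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def bits(msg: bytes):
    d = int.from_bytes(H(msg), "big")
    return [(d >> i) & 1 for i in range(N)]

def sign(sk, msg: bytes):
    # Reveal, for each digest bit, the matching secret preimage.
    return [sk[i][b] for i, b in enumerate(bits(msg))]

def verify(pk, msg: bytes, sig) -> bool:
    return all(H(s) == pk[i][b] for (i, b), s in zip(enumerate(bits(msg)), sig))

sk, pk = keygen()
sig = sign(sk, b"hello")
print(verify(pk, b"hello", sig), verify(pk, b"hellp", sig))  # True False
```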