Author name cluster

Scott Alfeld

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers

2 author rows

ECAI Conference 2023 Conference Paper

Approximate Data Deletion in Generative Models

Zhifeng Kong
Scott Alfeld

Users have the right to have their data deleted by third-party learned systems, as codified by recent legislation such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Such data deletion can be achieved by full re-training, but this incurs a high computational cost for modern machine learning methods. To avoid this cost, many approximate deletion methods have been developed for supervised learning. Unsupervised learning, in contrast, remains largely an open problem when it comes to efficient approximate data deletion. In this paper, we introduce (1) an efficient method for approximate deletion in generative models, and (2) statistical tests for estimating whether training points have been deleted. We provide theoretical guarantees under various learner assumptions. We then empirically demonstrate our methods across a variety of generative methods.


AAAI Conference 2023 Conference Paper

Training-Time Attacks against K-nearest Neighbors

Ara Vartanian
Will Rosenbaum
Scott Alfeld

Nearest neighbor-based methods are commonly used for classification tasks and as subroutines of other data-analysis methods. An attacker with the capability of inserting their own data points into the training set can manipulate the inferred nearest neighbor structure. We distill this goal to the task of performing a training-set data insertion attack against k-Nearest Neighbor classification (kNN). We prove that computing an optimal training-time (a.k.a. poisoning) attack against kNN classification is NP-Hard, even when k = 1 and the attacker can insert only a single data point. We provide an anytime algorithm to perform such an attack, and a greedy algorithm for general k and attacker budget. We provide theoretical bounds and empirically demonstrate the effectiveness and practicality of our methods on synthetic and real-world datasets. Empirically, we find that kNN is vulnerable in practice and that dimensionality reduction is an effective defense. We conclude with a discussion of open problems illuminated by our analysis.
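As a rough illustration of the threat model described above, the sketch below shows how one inserted training point can flip a 1-nearest-neighbor prediction. It is a toy example on made-up data, not the paper's anytime or greedy algorithm; the function name `predict_1nn`, the data set, and the inserted point are all assumptions chosen for illustration.

```python
import numpy as np

def predict_1nn(X, y, query):
    """Return the label of the training point closest to `query` (Euclidean)."""
    distances = np.linalg.norm(X - query, axis=1)
    return int(y[np.argmin(distances)])

# Hypothetical clean training set: two points of class 0, one of class 1.
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]])
y = np.array([0, 0, 1])
query = np.array([1.5, 0.0])

before = predict_1nn(X, y, query)

# The attacker inserts a single class-1 point right next to the query.
X_poisoned = np.vstack([X, [[1.6, 0.0]]])
y_poisoned = np.append(y, 1)

after = predict_1nn(X_poisoned, y_poisoned, query)
print(before, after)
```

Flipping one known query is easy; the hardness result in the abstract concerns choosing the inserted point optimally, which this sketch does not attempt.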


AAAI Conference 2022 Conference Paper

Hard to Forget: Poisoning Attacks on Certified Machine Unlearning

Neil G. Marchant
Benjamin I. P. Rubinstein
Scott Alfeld

The right to erasure requires removal of a user’s information from data held by organizations, with rigorous interpretations extending to downstream products such as learned models. Retraining from scratch with the particular user’s data omitted fully removes its influence on the resulting model, but comes with a high computational cost. Machine “unlearning” mitigates the cost incurred by full retraining: instead, models are updated incrementally, possibly only requiring retraining when approximation errors accumulate. Rapid progress has been made towards privacy guarantees on the indistinguishability of unlearned and retrained models, but current formalisms do not place practical bounds on computation. In this paper we demonstrate how an attacker can exploit this oversight, highlighting a novel attack surface introduced by machine unlearning. We consider an attacker aiming to increase the computational cost of data removal. We derive and empirically investigate a poisoning attack on certified machine unlearning where strategically designed training data triggers complete retraining when removed.


AAAI Conference 2019 Conference Paper

Attacking Data Transforming Learners at Training Time

Scott Alfeld
Ara Vartanian
Lucas Newman-Johnson
Benjamin I.P. Rubinstein

While machine learning systems are known to be vulnerable to data-manipulation attacks at both training and deployment time, little is known about how to adapt attacks when the defender transforms data prior to model estimation. We consider the setting where the defender Bob first transforms the data then learns a model from the result; Alice, the attacker, perturbs Bob’s input data prior to him transforming it. We develop a general-purpose “plug and play” framework for gradient-based attacks based on matrix differentials, focusing on ordinary least-squares linear regression. This allows learning algorithms and data transformations to be paired and composed arbitrarily: attacks can be adapted through the use of the chain rule (analogous to backpropagation on neural network parameters) to compositional learning maps. Best-response attacks can be computed through matrix multiplications from a library of attack matrices for transformations and learners. Our treatment of linear regression extends state-of-the-art attacks at training time, by permitting the attacker to affect both features and targets optimally and simultaneously. We explore several transformations broadly used across machine learning, with a driving motivation for our work being autoregressive modeling. There, Bob transforms a univariate time series into a matrix of observations and vector of target values which can then be fed into standard learners. Under this learning reduction, a perturbation from Alice to a single value of the time series affects features of several data points along with target values.
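The last point of the abstract can be made concrete with a small sketch of the autoregressive learning reduction. The helper `lag_embed`, the toy series, and the order of 3 are assumptions for illustration; the paper's attack framework based on matrix differentials is not reproduced here.

```python
import numpy as np

def lag_embed(series, order):
    """Turn a univariate series into (X, y) for an order-`order` autoregression.

    Row i of X holds series[i : i + order]; y[i] is the value that follows it.
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - order
    X = np.stack([series[i:i + order] for i in range(n_rows)])
    y = series[order:]
    return X, y

series = np.arange(8, dtype=float)   # hypothetical clean series 0, 1, ..., 7
X, y = lag_embed(series, order=3)

perturbed = series.copy()
perturbed[4] += 10.0                 # Alice perturbs a single value
Xp, yp = lag_embed(perturbed, order=3)

changed_features = int(np.sum(X != Xp))  # feature entries that moved
changed_targets = int(np.sum(y != yp))   # target entries that moved
print(changed_features, changed_targets)
```

With order 3, the single perturbed value appears in three feature rows and in one target entry, which is the coupling between features and targets that the abstract describes.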


AAMAS Conference 2018 Conference Paper

Adversarial Classification on Social Networks

Sixie Yu
Yevgeniy Vorobeychik
Scott Alfeld

The spread of unwanted or malicious content through social media has become a major challenge. Traditional examples of this include social network spam, but an important new concern is the propagation of fake news through social media. A common approach for mitigating this problem is by using standard statistical classification to distinguish malicious (e.g., fake news) instances from benign (e.g., actual news stories). However, such an approach ignores the fact that malicious instances propagate through the network, which is consequential both in quantifying consequences (e.g., fake news diffusing through the network), and capturing detection redundancy (bad content can be detected at different nodes). An additional concern is evasion attacks, whereby the generators of malicious instances modify the nature of these to escape detection. We model this problem as a Stackelberg game between the defender who is choosing parameters of the detection model, and an attacker, who is choosing both the node at which to initiate malicious spread, and the nature of malicious entities. We develop a novel bi-level programming approach for this problem, as well as a novel solution approach based on implicit function gradients, and experimentally demonstrate the advantage of our approach over alternatives which ignore network structure.


ICML Conference 2018 Conference Paper

Adversarial Regression with Multiple Learners

Liang Tong
Sixie Yu
Scott Alfeld
Yevgeniy Vorobeychik

Despite the considerable success enjoyed by machine learning techniques in practice, numerous studies demonstrated that many approaches are vulnerable to attacks. An important class of such attacks involves adversaries changing features at test time to cause incorrect predictions. Previous investigations of this problem pit a single learner against an adversary. However, in many situations an adversary’s decision is aimed at a collection of learners, rather than specifically targeted at each independently. We study the problem of adversarial linear regression with multiple learners. We approximate the resulting game by exhibiting an upper bound on learner loss functions, and show that the resulting game has a unique symmetric equilibrium. We present an algorithm for computing this equilibrium, and show through extensive experiments that equilibrium models are significantly more robust than conventional regularized linear regression.


AAAI Conference 2017 Conference Paper

Explicit Defense Actions Against Test-Set Attacks

Scott Alfeld
Xiaojin Zhu
Paul Barford

Automated learning and decision making systems in public-facing applications are vulnerable to malicious attacks. Examples of such systems include spam detectors, credit card fraud detectors, and network intrusion detection systems. These systems are at further risk of attack when money is directly involved, such as market forecasters or decision systems used in determining insurance or loan rates. In this paper, we consider the setting where a predictor Bob has a fixed model, and an unknown attacker Alice aims to perturb (or poison) future test instances so as to alter Bob’s prediction to her benefit. We focus specifically on Bob’s optimal defense actions to limit Alice’s effectiveness. We define a general framework for determining Bob’s optimal defense action against Alice’s worst-case attack. We then demonstrate our framework by considering linear predictors, where we provide tractable methods of determining the optimal defense action. Using these methods, we perform an empirical investigation of optimal defense actions for a particular class of linear models (autoregressive forecasters) and find that for ten real-world futures markets, the optimal defense action reduces Bob’s loss by between 78 and 97%.


AAAI Conference 2016 Conference Paper

Data Poisoning Attacks against Autoregressive Models

Scott Alfeld
Xiaojin Zhu
Paul Barford

Forecasting models play a key role in money-making ventures in many different markets. Such models are often trained on data from various sources, some of which may be untrustworthy. An actor in a given market may be incentivised to drive predictions in a certain direction to their own benefit. Prior analyses of intelligent adversaries in a machine-learning context have focused on regression and classification. In this paper we address the non-iid setting of time series forecasting. We consider a forecaster, Bob, using a fixed, known model and a recursive forecasting method. An adversary, Alice, aims to pull Bob’s forecasts toward her desired target series, and may exercise limited influence on the initial values fed into Bob’s model. We consider the class of linear autoregressive models, and a flexible framework of encoding Alice’s desires and constraints. We describe a method of calculating Alice’s optimal attack that is computationally tractable, and empirically demonstrate its effectiveness compared to random and greedy baselines on synthetic and real-world time series data. We conclude by discussing defensive strategies in the face of Alice-like adversaries.
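The setting above, a fixed linear autoregressive model used recursively, can be sketched as follows. The coefficients, initial values, and the perturbation are invented for illustration, and the code only shows how a change to the initial values propagates through the forecasts; it is not the paper's method for computing an optimal attack.

```python
import numpy as np

def recursive_forecast(coeffs, initial, horizon):
    """Forecast `horizon` steps with a linear AR model, feeding predictions back in.

    coeffs[j] multiplies the value (j + 1) steps in the past.
    """
    history = list(initial)
    forecasts = []
    for _ in range(horizon):
        recent = history[-len(coeffs):][::-1]      # most recent value first
        next_value = float(np.dot(coeffs, recent))
        forecasts.append(next_value)
        history.append(next_value)                 # recursive step
    return np.array(forecasts)

coeffs = np.array([0.5, 0.3])                      # hypothetical AR(2) model
clean = recursive_forecast(coeffs, [1.0, 1.0], horizon=3)

# Alice nudges the most recent initial value by 0.2.
poisoned = recursive_forecast(coeffs, [1.0, 1.2], horizon=3)

# Because the model is linear, the shift in the forecasts equals the
# forecast produced from the perturbation alone.
shift = recursive_forecast(coeffs, [0.0, 0.2], horizon=3)
print(clean, poisoned, shift)
```

The final comparison relies only on linearity of the recursion: every forecast is a linear function of the initial values, so the effect of a perturbation can be computed separately from the clean forecast.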


SoCS Conference 2016 Conference Paper

Machine Teaching as Search

Scott Alfeld
Xiaojin Zhu
Paul Barford

Machine teaching (MT) studies the task of designing a training set. Specifically, given a learner (e.g., an artificial neural network or a human) and a target model, a teacher aims to create a training set which results in the target model being learned. MT applications include optimal education design for human learners and computer security where adversaries aim to attack learning-based systems. In this work, we formulate pool-based MT as a state space search problem. We discuss the properties and challenges of the resulting problem and highlight opportunities for novel search techniques. In our preliminary study we use a beam search approach, and find that training and evaluating empirical risk of models dominate the run time of the search. Toward the goal of better search techniques for future work, we develop optimizations ranging from implementation details for specific learners to algorithm changes applicable to general black-box learners. We conclude with a discussion of open problems and research directions.
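A minimal sketch of pool-based teaching framed as beam search is given below. Everything in it is an assumption made for illustration: the learner is a toy that returns the mean of its training set, the pool and target are invented, and the scoring rule is the absolute distance between the learned and target models. The paper's learners, risk measure, and optimizations are not reproduced.

```python
def learn(train):
    """Hypothetical learner: the 'model' is just the mean of the training set."""
    return sum(train) / len(train)

def beam_search_teach(pool, target, size, beam_width):
    """Search for `size` pool items whose learned model is closest to `target`.

    A state is a sorted tuple of pool indices; each step adds one index.
    """
    beam = [()]
    for _ in range(size):
        candidates = set()
        for state in beam:
            for i in range(len(pool)):
                if i not in state:
                    candidates.add(tuple(sorted(state + (i,))))

        def score(state):
            # Training the learner on every candidate is the costly step.
            return abs(learn([pool[i] for i in state]) - target)

        # Sort by score, breaking ties by index tuple for determinism.
        beam = sorted(candidates, key=lambda s: (score(s), s))[:beam_width]
    return beam[0]

pool = [0.0, 1.0, 2.0, 5.0, 9.0]
best = beam_search_teach(pool, target=3.0, size=2, beam_width=2)
print(best, [pool[i] for i in best])
```

In this toy run the best pair is reached through the second-ranked single item rather than the top-ranked one, which is the reason for keeping more than one state in the beam. Each candidate is scored by retraining the learner, consistent with the abstract's observation that training and evaluation dominate the run time.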
