Author name cluster

Robert Stanforth

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

TMLR Journal 2025 Journal Article

Privacy Awareness for Information-Sharing Assistants: A Case-study on Form-filling with Contextual Integrity

Sahra Ghalebikesabi
Eugene Bagdasarian
Ren Yi
Itay Yona
Ilia Shumailov
Aneesh Pappu
Chongyang Shi
Laura Weidinger

Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize the design of privacy-conscious assistants that conform with *contextual integrity* (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI compliant. Our evaluation is based on a novel form filling benchmark composed of human annotations of common webform applications, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.

PDF Details

ICLR Conference 2024 Conference Paper

Expressive Losses for Verified Robustness via Convex Combinations

Alessandro De Palma
Rudy Bunel
Krishnamurthy Dvijotham
M. Pawan Kumar
Robert Stanforth
Alessio Lomuscio

In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.

Details

NeurIPS Conference 2021 Conference Paper

Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications

Leonard Berrada
Sumanth Dathathri
Krishnamurthy Dvijotham
Robert Stanforth
Rudy R. Bunel
Jonathan Uesato
Sven Gowal
M. Pawan Kumar

Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuring the reliability of neural networks, progress in the direction of probabilistic specifications has been limited. In this direction, we first introduce a general formulation of probabilistic specifications for neural networks, which captures both probabilistic networks (e. g. , Bayesian neural networks, MC-Dropout networks) and uncertain inputs (distributions over inputs arising from sensor noise or other perturbations). We then propose a general technique to verify such specifications by generalizing the notion of Lagrangian duality, replacing standard Lagrangian multipliers with "functional multipliers" that can be arbitrary functions of the activations at a given layer. We show that an optimal choice of functional multipliers leads to exact verification (i. e. , sound and complete verification), and for specific forms of multipliers, we develop tractable practical verification algorithms. We empirically validate our algorithms by applying them to Bayesian Neural Networks (BNNs) and MC Dropout Networks, and certifying properties such as adversarial robustness and robust detection of out-of-distribution (OOD) data. On these tasks we are able to provide significantly stronger guarantees when compared to prior work -- for instance, for a VGG-64 MC-Dropout CNN trained on CIFAR-10 in a verification-agnostic manner, we improve the certified AUC (a verified lower bound on the true AUC) for robust OOD detection (on CIFAR-100) from $0 \% \rightarrow 29\%$. Similarly, for a BNN trained on MNIST, we improve on the $\ell_\infty$ robust accuracy from $60. 2 \% \rightarrow 74. 6\%$. Further, on a novel specification -- distributionally robust OOD detection -- we improve on the certified AUC from $5\% \rightarrow 23\%$.

PDF Details

ICLR Conference 2020 Conference Paper

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

Tsui-Wei Weng
Krishnamurthy Dvijotham
Jonathan Uesato
Kai Xiao
Sven Gowal
Robert Stanforth
Pushmeet Kohli

Deep reinforcement learning has achieved great success in many previously difficult reinforcement learning tasks, yet recent studies show that deep RL agents are also unavoidably susceptible to adversarial perturbations, similar to deep neural networks in classification tasks. Prior works mostly focus on model-free adversarial attacks and agents with discrete actions. In this work, we study the problem of continuous control agents in deep RL with adversarial attacks and propose the first two-step algorithm based on learned model dynamics. Extensive experiments on various MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is much more effective and efficient than model-free based attacks baselines in degrading agent performance as well as driving agents to unsafe states.

Details

ICLR Conference 2020 Conference Paper

Towards Stable and Efficient Training of Verifiably Robust Neural Networks

Huan Zhang 0001
Hongge Chen
Chaowei Xiao
Sven Gowal
Robert Stanforth
Bo Li 0026
Duane S. Boning
Cho-Jui Hsieh

Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hundreds depending on the underlying network architectures. Meanwhile, interval bound propagation (IBP) based training is efficient and significantly outperforms linear relaxation based methods on many tasks, yet it may suffer from stability issues since the bounds are much looser especially at the beginning of training. In this paper, we propose a new certified adversarial training method, CROWN-IBP, by combining the fast IBP bounds in a forward bounding pass and a tight linear relaxation based bound, CROWN, in a backward bounding pass. CROWN-IBP is computationally efficient and consistently outperforms IBP baselines on training verifiably robust neural networks. We conduct large scale experiments on MNIST and CIFAR datasets, and outperform all previous linear relaxation and bound propagation based certified defenses in L_inf robustness. Notably, we achieve 7.02% verified test error on MNIST at epsilon=0.3, and 66.94% on CIFAR-10 with epsilon=8/255.

Details

ICLR Conference 2020 Conference Paper

Towards Verified Robustness under Text Deletion Interventions

Johannes Welbl
Po-Sen Huang
Robert Stanforth
Sven Gowal
Krishnamurthy Dvijotham
Martin Szummer
Pushmeet Kohli

Neural networks are widely used in Natural Language Processing, yet despite their empirical successes, their behaviour is brittle: they are both over-sensitive to small input changes, and under-sensitive to deletions of large fractions of input text. This paper aims to tackle under-sensitivity in the context of natural language inference by ensuring that models do not become more confident in their predictions as arbitrary subsets of words from the input text are deleted. We develop a novel technique for formal verification of this specification for models based on the popular decomposable attention mechanism by employing the efficient yet effective interval bound propagation (IBP) approach. Using this method we can efficiently prove, given a model, whether a particular sample is free from the under-sensitivity problem. We compare different training methods to address under-sensitivity, and compare metrics to measure it. In our experiments on the SNLI and MNLI datasets, we observe that IBP training leads to a significantly improved verified accuracy. On the SNLI test set, we can verify 18.4% of samples, a substantial improvement over only 2.8% using standard training.

Details

IJCAI Conference 2019 Conference Paper

A Dual Approach to Verify and Train Deep Networks

Sven Gowal
Krishnamurthy Dvijotham
Robert Stanforth
Timothy Mann
Pushmeet Kohli

This paper addressed the problem of formally verifying desirable properties of neural networks, i. e. , obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (e. g. , robustness to bounded norm adversarial perturbations). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime, i. e. , it can be stopped at any time and a valid bound on the maximum violation can be obtained. Finally, we highlight how this approach can be used to train models that are amenable to verification.

PDF Details

NeurIPS Conference 2019 Conference Paper

Adversarial Robustness through Local Linearization

Chongli Qin
James Martens
Sven Gowal
Dilip Krishnan
Krishnamurthy Dvijotham
Alhussein Fawzi
Soham De
Robert Stanforth

Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of the model and number of input dimensions increase. Further, training against less expensive and therefore weaker adversaries produces models that are robust against weak attacks but break down under attacks that are stronger. This is often attributed to the phenomenon of gradient obfuscation; such models have a highly non-linear loss surface in the vicinity of training examples, making it hard for gradient-based attacks to succeed even though adversarial examples still exist. In this work, we introduce a novel regularizer that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness. We show via extensive experiments on CIFAR-10 and ImageNet, that models trained with our regularizer avoid gradient obfuscation and can be trained significantly faster than adversarial training. Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with L-infinity norm adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack. Additionally, we match state of the art results for CIFAR-10 at 8/255.

PDF Details

NeurIPS Conference 2019 Conference Paper

Are Labels Required for Improving Adversarial Robustness?

Jean-Baptiste Alayrac
Jonathan Uesato
Po-Sen Huang
Alhussein Fawzi
Robert Stanforth
Pushmeet Kohli

Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This result is a key hurdle in the deployment of robust machine learning models in many real world applications where labeled data is expensive. Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models. Theoretically, we show that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors. On standard datasets like CIFAR- 10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled data improves robust accuracy by 21. 7% over using 4K supervised examples alone, and captures over 95% of the improvement from the same number of labeled examples. Finally, we report an improvement of 4% over the previous state-of-the- art on CIFAR-10 against the strongest known attack by using additional unlabeled data from the uncurated 80 Million Tiny Images dataset. This demonstrates that our finding extends as well to the more realistic case where unlabeled data is also uncurated, therefore opening a new avenue for improving adversarial training.

PDF Details

UAI Conference 2019 Conference Paper

Efficient Neural Network Verification with Exactness Characterization

Krishnamurthy Dvijotham
Robert Stanforth
Sven Gowal
Chongli Qin
Soham De
Pushmeet Kohli

Remarkable progress has been made on verification of neural networks, i. e. , showing that neural networks are provably consistent with specifications encoding properties like adversarial robustness. Recent methods developed for scalable neural network verification are based on computing an upper bound on the worst-case violation of the specification. Semidefinite programming (SDP) has been proposed as a means to obtain tight upper bounds. However, SDP solvers do not scale to large neural networks. We introduce a Lagrangian relaxation based on the SDP formulation and a novel algorithm to solve the relaxation that scales to networks that are two orders of magnitude larger than the off-the-shelf SDP solvers. Although verification of neural networks is known to be NP-hard in general, we develop the first known sufficient conditions under which a polynomial time verification algorithm (based on the above relaxation) is guaranteed to perform exact verification (i. e. , either verify a property or establish it is untrue). The algorithm can be implemented using primitives available readily in common deep learning frameworks. Experiments show that the algorithm is fast, and is able to compute tight upper bounds on the error rates under adversarial attacks of convolutional networks trained on MNIST and CIFAR-10.

Details

UAI Conference 2018 Conference Paper

A Dual Approach to Scalable Verification of Deep Networks

Krishnamurthy Dvijotham
Robert Stanforth
Sven Gowal
Timothy A. Mann
Pushmeet Kohli

This paper addresses the problem of formally verifying desirable properties of neural networks, i. e. , obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime i. e. it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.

Details