Arrow Research search

Author name cluster

Robert Stanforth

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

TMLR Journal 2025 Journal Article

Privacy Awareness for Information-Sharing Assistants: A Case-study on Form-filling with Contextual Integrity

  • Sahra Ghalebikesabi
  • Eugene Bagdasarian
  • Ren Yi
  • Itay Yona
  • Ilia Shumailov
  • Aneesh Pappu
  • Chongyang Shi
  • Laura Weidinger

Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize the design of privacy-conscious assistants that conform with *contextual integrity* (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI compliant. Our evaluation is based on a novel form filling benchmark composed of human annotations of common webform applications, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.

ICLR Conference 2024 Conference Paper

Expressive Losses for Verified Robustness via Convex Combinations

  • Alessandro De Palma
  • Rudy Bunel
  • Krishnamurthy Dvijotham
  • M. Pawan Kumar
  • Robert Stanforth
  • Alessio Lomuscio

In order to train networks for verified adversarial robustness, it is common to over-approximate the worst-case loss over perturbation regions, resulting in networks that attain verifiability at the expense of standard performance. As shown in recent work, better trade-offs between accuracy and robustness can be obtained by carefully coupling adversarial training with over-approximations. We hypothesize that the expressivity of a loss function, which we formalize as the ability to span a range of trade-offs between lower and upper bounds to the worst-case loss through a single parameter (the over-approximation coefficient), is key to attaining state-of-the-art performance. To support our hypothesis, we show that trivial expressive losses, obtained via convex combinations between adversarial attacks and IBP bounds, yield state-of-the-art results across a variety of settings in spite of their conceptual simplicity. We provide a detailed analysis of the relationship between the over-approximation coefficient and performance profiles across different expressive losses, showing that, while expressivity is essential, better approximations of the worst-case loss are not necessarily linked to superior robustness-accuracy trade-offs.

NeurIPS Conference 2021 Conference Paper

Make Sure You're Unsure: A Framework for Verifying Probabilistic Specifications

  • Leonard Berrada
  • Sumanth Dathathri
  • Krishnamurthy Dvijotham
  • Robert Stanforth
  • Rudy R. Bunel
  • Jonathan Uesato
  • Sven Gowal
  • M. Pawan Kumar

Most real world applications require dealing with stochasticity like sensor noise or predictive uncertainty, where formal specifications of desired behavior are inherently probabilistic. Despite the promise of formal verification in ensuring the reliability of neural networks, progress in the direction of probabilistic specifications has been limited. In this direction, we first introduce a general formulation of probabilistic specifications for neural networks, which captures both probabilistic networks (e. g. , Bayesian neural networks, MC-Dropout networks) and uncertain inputs (distributions over inputs arising from sensor noise or other perturbations). We then propose a general technique to verify such specifications by generalizing the notion of Lagrangian duality, replacing standard Lagrangian multipliers with "functional multipliers" that can be arbitrary functions of the activations at a given layer. We show that an optimal choice of functional multipliers leads to exact verification (i. e. , sound and complete verification), and for specific forms of multipliers, we develop tractable practical verification algorithms. We empirically validate our algorithms by applying them to Bayesian Neural Networks (BNNs) and MC Dropout Networks, and certifying properties such as adversarial robustness and robust detection of out-of-distribution (OOD) data. On these tasks we are able to provide significantly stronger guarantees when compared to prior work -- for instance, for a VGG-64 MC-Dropout CNN trained on CIFAR-10 in a verification-agnostic manner, we improve the certified AUC (a verified lower bound on the true AUC) for robust OOD detection (on CIFAR-100) from $0 \% \rightarrow 29\%$. Similarly, for a BNN trained on MNIST, we improve on the $\ell_\infty$ robust accuracy from $60. 2 \% \rightarrow 74. 6\%$. Further, on a novel specification -- distributionally robust OOD detection -- we improve on the certified AUC from $5\% \rightarrow 23\%$.

ICLR Conference 2020 Conference Paper

Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control

  • Tsui-Wei Weng
  • Krishnamurthy Dvijotham
  • Jonathan Uesato
  • Kai Xiao
  • Sven Gowal
  • Robert Stanforth
  • Pushmeet Kohli

Deep reinforcement learning has achieved great success in many previously difficult reinforcement learning tasks, yet recent studies show that deep RL agents are also unavoidably susceptible to adversarial perturbations, similar to deep neural networks in classification tasks. Prior works mostly focus on model-free adversarial attacks and agents with discrete actions. In this work, we study the problem of continuous control agents in deep RL with adversarial attacks and propose the first two-step algorithm based on learned model dynamics. Extensive experiments on various MuJoCo domains (Cartpole, Fish, Walker, Humanoid) demonstrate that our proposed framework is much more effective and efficient than model-free based attacks baselines in degrading agent performance as well as driving agents to unsafe states.

ICLR Conference 2020 Conference Paper

Towards Stable and Efficient Training of Verifiably Robust Neural Networks

  • Huan Zhang 0001
  • Hongge Chen
  • Chaowei Xiao
  • Sven Gowal
  • Robert Stanforth
  • Bo Li 0026
  • Duane S. Boning
  • Cho-Jui Hsieh

Training neural networks with verifiable robustness guarantees is challenging. Several existing approaches utilize linear relaxation based neural network output bounds under perturbation, but they can slow down training by a factor of hundreds depending on the underlying network architectures. Meanwhile, interval bound propagation (IBP) based training is efficient and significantly outperforms linear relaxation based methods on many tasks, yet it may suffer from stability issues since the bounds are much looser especially at the beginning of training. In this paper, we propose a new certified adversarial training method, CROWN-IBP, by combining the fast IBP bounds in a forward bounding pass and a tight linear relaxation based bound, CROWN, in a backward bounding pass. CROWN-IBP is computationally efficient and consistently outperforms IBP baselines on training verifiably robust neural networks. We conduct large scale experiments on MNIST and CIFAR datasets, and outperform all previous linear relaxation and bound propagation based certified defenses in L_inf robustness. Notably, we achieve 7.02% verified test error on MNIST at epsilon=0.3, and 66.94% on CIFAR-10 with epsilon=8/255.

ICLR Conference 2020 Conference Paper

Towards Verified Robustness under Text Deletion Interventions

  • Johannes Welbl
  • Po-Sen Huang
  • Robert Stanforth
  • Sven Gowal
  • Krishnamurthy Dvijotham
  • Martin Szummer
  • Pushmeet Kohli

Neural networks are widely used in Natural Language Processing, yet despite their empirical successes, their behaviour is brittle: they are both over-sensitive to small input changes, and under-sensitive to deletions of large fractions of input text. This paper aims to tackle under-sensitivity in the context of natural language inference by ensuring that models do not become more confident in their predictions as arbitrary subsets of words from the input text are deleted. We develop a novel technique for formal verification of this specification for models based on the popular decomposable attention mechanism by employing the efficient yet effective interval bound propagation (IBP) approach. Using this method we can efficiently prove, given a model, whether a particular sample is free from the under-sensitivity problem. We compare different training methods to address under-sensitivity, and compare metrics to measure it. In our experiments on the SNLI and MNLI datasets, we observe that IBP training leads to a significantly improved verified accuracy. On the SNLI test set, we can verify 18.4% of samples, a substantial improvement over only 2.8% using standard training.

IJCAI Conference 2019 Conference Paper

A Dual Approach to Verify and Train Deep Networks

  • Sven Gowal
  • Krishnamurthy Dvijotham
  • Robert Stanforth
  • Timothy Mann
  • Pushmeet Kohli

This paper addressed the problem of formally verifying desirable properties of neural networks, i. e. , obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (e. g. , robustness to bounded norm adversarial perturbations). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime, i. e. , it can be stopped at any time and a valid bound on the maximum violation can be obtained. Finally, we highlight how this approach can be used to train models that are amenable to verification.

NeurIPS Conference 2019 Conference Paper

Adversarial Robustness through Local Linearization

  • Chongli Qin
  • James Martens
  • Sven Gowal
  • Dilip Krishnan
  • Krishnamurthy Dvijotham
  • Alhussein Fawzi
  • Soham De
  • Robert Stanforth

Adversarial training is an effective methodology for training deep neural networks that are robust against adversarial, norm-bounded perturbations. However, the computational cost of adversarial training grows prohibitively as the size of the model and number of input dimensions increase. Further, training against less expensive and therefore weaker adversaries produces models that are robust against weak attacks but break down under attacks that are stronger. This is often attributed to the phenomenon of gradient obfuscation; such models have a highly non-linear loss surface in the vicinity of training examples, making it hard for gradient-based attacks to succeed even though adversarial examples still exist. In this work, we introduce a novel regularizer that encourages the loss to behave linearly in the vicinity of the training data, thereby penalizing gradient obfuscation while encouraging robustness. We show via extensive experiments on CIFAR-10 and ImageNet, that models trained with our regularizer avoid gradient obfuscation and can be trained significantly faster than adversarial training. Using this regularizer, we exceed current state of the art and achieve 47% adversarial accuracy for ImageNet with L-infinity norm adversarial perturbations of radius 4/255 under an untargeted, strong, white-box attack. Additionally, we match state of the art results for CIFAR-10 at 8/255.

NeurIPS Conference 2019 Conference Paper

Are Labels Required for Improving Adversarial Robustness?

  • Jean-Baptiste Alayrac
  • Jonathan Uesato
  • Po-Sen Huang
  • Alhussein Fawzi
  • Robert Stanforth
  • Pushmeet Kohli

Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This result is a key hurdle in the deployment of robust machine learning models in many real world applications where labeled data is expensive. Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models. Theoretically, we show that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors. On standard datasets like CIFAR- 10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled data improves robust accuracy by 21. 7% over using 4K supervised examples alone, and captures over 95% of the improvement from the same number of labeled examples. Finally, we report an improvement of 4% over the previous state-of-the- art on CIFAR-10 against the strongest known attack by using additional unlabeled data from the uncurated 80 Million Tiny Images dataset. This demonstrates that our finding extends as well to the more realistic case where unlabeled data is also uncurated, therefore opening a new avenue for improving adversarial training.

UAI Conference 2019 Conference Paper

Efficient Neural Network Verification with Exactness Characterization

  • Krishnamurthy Dvijotham
  • Robert Stanforth
  • Sven Gowal
  • Chongli Qin
  • Soham De
  • Pushmeet Kohli

Remarkable progress has been made on verification of neural networks, i. e. , showing that neural networks are provably consistent with specifications encoding properties like adversarial robustness. Recent methods developed for scalable neural network verification are based on computing an upper bound on the worst-case violation of the specification. Semidefinite programming (SDP) has been proposed as a means to obtain tight upper bounds. However, SDP solvers do not scale to large neural networks. We introduce a Lagrangian relaxation based on the SDP formulation and a novel algorithm to solve the relaxation that scales to networks that are two orders of magnitude larger than the off-the-shelf SDP solvers. Although verification of neural networks is known to be NP-hard in general, we develop the first known sufficient conditions under which a polynomial time verification algorithm (based on the above relaxation) is guaranteed to perform exact verification (i. e. , either verify a property or establish it is untrue). The algorithm can be implemented using primitives available readily in common deep learning frameworks. Experiments show that the algorithm is fast, and is able to compute tight upper bounds on the error rates under adversarial attacks of convolutional networks trained on MNIST and CIFAR-10.

UAI Conference 2018 Conference Paper

A Dual Approach to Scalable Verification of Deep Networks

  • Krishnamurthy Dvijotham
  • Robert Stanforth
  • Sven Gowal
  • Timothy A. Mann
  • Pushmeet Kohli

This paper addresses the problem of formally verifying desirable properties of neural networks, i. e. , obtaining provable guarantees that neural networks satisfy specifications relating their inputs and outputs (robustness to bounded norm adversarial perturbations, for example). Most previous work on this topic was limited in its applicability by the size of the network, network architecture and the complexity of properties to be verified. In contrast, our framework applies to a general class of activation functions and specifications on neural network inputs and outputs. We formulate verification as an optimization problem (seeking to find the largest violation of the specification) and solve a Lagrangian relaxation of the optimization problem to obtain an upper bound on the worst case violation of the specification being verified. Our approach is anytime i. e. it can be stopped at any time and a valid bound on the maximum violation can be obtained. We develop specialized verification algorithms with provable tightness guarantees under special assumptions and demonstrate the practical significance of our general verification approach on a variety of verification tasks.