Arrow Research search

Author name cluster

Siddharth Swaroop

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers (13)

ICLR Conference 2025 Conference Paper

Connecting Federated ADMM to Bayes

  • Siddharth Swaroop
  • Mohammad Emtiyaz Khan
  • Finale Doshi

We provide new connections between two distinct federated learning approaches based on (i) ADMM and (ii) Variational Bayes (VB), and propose new variants by combining their complementary strengths. Specifically, we show that the dual variables in ADMM naturally emerge through the "site" parameters used in VB with isotropic Gaussian covariances. Using this, we derive two versions of ADMM from VB that use flexible covariances and functional regularisation, respectively. Through numerical experiments, we validate the resulting performance improvements. The work shows a connection between two fields that are believed to be fundamentally different and combines them to improve federated learning.
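The ADMM-VB link above centres on the dual variables; a minimal numpy sketch of one round of consensus ADMM in a federated setting (illustrative variable names, not the paper's notation or code) shows where those duals live:

```python
import numpy as np

# One round of consensus ADMM (sketch). Each client k holds a local weight
# vector w_k and a dual variable u_k; the server holds a global model z.
rho = 1.0
n_clients, dim = 3, 2
rng = np.random.default_rng(0)

w = rng.normal(size=(n_clients, dim))   # local models after a local step
u = np.zeros((n_clients, dim))          # dual variables

# Global step: average the dual-corrected local models.
z = np.mean(w + u / rho, axis=0)

# Dual ascent: duals accumulate the local-global residuals. In the paper's
# VB view, these duals correspond to the "site" parameters that arise under
# isotropic Gaussian covariances.
u = u + rho * (w - z)
```

After the update the duals sum to zero across clients, reflecting that they encode each client's disagreement with the consensus.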

RLC Conference 2025 Conference Paper

When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions

  • Ian M. Moore
  • Eura Nofshin
  • Siddharth Swaroop
  • Susan Murphy
  • Finale Doshi-Velez
  • Weiwei Pan

In settings where an AI agent nudges a human agent toward a goal, the AI can quickly learn a high-quality policy by modeling the human well. Despite behavioral evidence that humans hyperbolically discount future rewards, we model humans as Markov Decision Processes (MDPs) with exponential discounting, because planning is difficult with non-exponential discounts. In this work, we investigate whether the performance benefits of modeling humans as hyperbolic discounters outweigh the computational costs. We focus on AI interventions that change the human's discounting (i.e., decreasing the human's "nearsightedness" to help them toward distant goals). We derive a fixed exponential discount factor that can approximate hyperbolic discounting, and prove that this approximation guarantees the AI will never miss a necessary intervention. We also prove that our approximation causes fewer false positives (unnecessary interventions) than the mean hazard rate, another well-known method for approximating hyperbolic MDPs as exponential ones. Surprisingly, our experiments demonstrate that exponential approximations outperform hyperbolic ones in online learning, even when the ground-truth human MDP is hyperbolically discounted.
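The gap between the two discount models is easy to see numerically. A small Python sketch (the constants below are hand-picked for illustration; the paper derives a principled fixed factor, which is not reproduced here):

```python
# Hyperbolic vs exponential discounting on a toy horizon.
k = 1.0        # hyperbolic steepness parameter (illustrative)
gamma = 0.7    # a hand-picked fixed exponential discount factor

def hyperbolic(t, k=k):
    # Classic hyperbolic discount curve: 1 / (1 + k * t)
    return 1.0 / (1.0 + k * t)

def exponential(t, gamma=gamma):
    return gamma ** t

# Hyperbolic discounting falls steeply at first but has a heavy tail, so an
# exponential that roughly matches the near term undershoots the far term.
near = (hyperbolic(1), exponential(1))
far = (hyperbolic(20), exponential(20))
```

At t = 20 the hyperbolic weight is still 1/21, while 0.7^20 is already negligible, which is why a single fixed exponential factor can only approximate, not match, a hyperbolic discounter.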

RLJ Journal 2025 Journal Article

When and Why Hyperbolic Discounting Matters for Reinforcement Learning Interventions

  • Ian M. Moore
  • Eura Nofshin
  • Siddharth Swaroop
  • Susan Murphy
  • Finale Doshi-Velez
  • Weiwei Pan

In settings where an AI agent nudges a human agent toward a goal, the AI can quickly learn a high-quality policy by modeling the human well. Despite behavioral evidence that humans hyperbolically discount future rewards, we model humans as Markov Decision Processes (MDPs) with exponential discounting, because planning is difficult with non-exponential discounts. In this work, we investigate whether the performance benefits of modeling humans as hyperbolic discounters outweigh the computational costs. We focus on AI interventions that change the human's discounting (i.e., decreasing the human's "nearsightedness" to help them toward distant goals). We derive a fixed exponential discount factor that can approximate hyperbolic discounting, and prove that this approximation guarantees the AI will never miss a necessary intervention. We also prove that our approximation causes fewer false positives (unnecessary interventions) than the mean hazard rate, another well-known method for approximating hyperbolic MDPs as exponential ones. Surprisingly, our experiments demonstrate that exponential approximations outperform hyperbolic ones in online learning, even when the ground-truth human MDP is hyperbolically discounted.

AAMAS Conference 2024 Conference Paper

Reinforcement Learning Interventions on Boundedly Rational Human Agents in Frictionful Tasks

  • Eura Nofshin
  • Siddharth Swaroop
  • Weiwei Pan
  • Susan Murphy
  • Finale Doshi-Velez

Many important behavior changes are frictionful; they require individuals to expend effort over a long period with little immediate gratification. Here, an artificial intelligence (AI) agent can provide personalized interventions to help individuals stick to their goals. In these settings, the AI agent must personalize rapidly (before the individual disengages) and interpretably, to help us understand the behavioral interventions. In this paper, we introduce Behavior Model Reinforcement Learning (BMRL), a framework in which an AI agent intervenes on the parameters of a Markov Decision Process (MDP) belonging to a boundedly rational human agent. Our formulation of the human decision-maker as a planning agent allows us to attribute undesirable human policies (ones that do not lead to the goal) to their maladapted MDP parameters, such as an extremely low discount factor. Furthermore, we propose a class of tractable human models that captures fundamental behaviors in frictionful tasks. Introducing a notion of MDP equivalence specific to BMRL, we theoretically and empirically show that AI planning with our human models can lead to helpful policies on a wide range of more complex, ground-truth humans.

JMLR Journal 2024 Journal Article

Rethinking Discount Regularization: New Interpretations, Unintended Consequences, and Solutions for Regularization in Reinforcement Learning

  • Sarah Rathnam
  • Sonali Parbhoo
  • Siddharth Swaroop
  • Weiwei Pan
  • Susan A. Murphy
  • Finale Doshi-Velez

Discount regularization, using a shorter planning horizon when calculating the optimal policy, is a popular choice to avoid overfitting when faced with sparse or noisy data. It is commonly interpreted as de-emphasizing or ignoring delayed effects. In this paper, we prove two alternative views of discount regularization that expose unintended consequences and motivate novel regularization methods. In model-based RL, planning under a lower discount factor acts like a prior with stronger regularization on state-action pairs with more transition data. This leads to poor performance when the transition matrix is estimated from data sets with uneven amounts of data across state-action pairs. In model-free RL, discount regularization equates to planning using a weighted average Bellman update, where the agent plans as if the values of all state-action pairs are closer than implied by the data. Our equivalence theorems motivate simple methods that generalize discount regularization by setting parameters locally for individual state-action pairs rather than globally. We demonstrate the failures of discount regularization and how we remedy them using our state-action-specific methods across empirical examples with both tabular and continuous state spaces.
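As a toy illustration of the basic effect, a lower discount factor shortens the effective planning horizon and shrinks values toward the immediate reward (a sketch only; the paper's equivalence theorems and state-action-specific methods are not reproduced here):

```python
import numpy as np

# Toy single-action MDP: 2 states, reward only in state 1.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition matrix (illustrative)
r = np.array([0.0, 1.0])     # rewards

def value_iteration(gamma, iters=500):
    # Repeated Bellman update v <- r + gamma * P v
    v = np.zeros(2)
    for _ in range(iters):
        v = r + gamma * P @ v
    return v

v_full = value_iteration(0.99)   # long planning horizon
v_reg = value_iteration(0.8)     # "regularized": shorter effective horizon
```

With the lower factor, delayed rewards are de-emphasised everywhere at once; the paper's point is that this global shrinkage has uneven, data-dependent consequences, motivating per-state-action parameters instead.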

TMLR Journal 2023 Journal Article

Differentially private partitioned variational inference

  • Mikko A. Heikkilä
  • Matthew Ashman
  • Siddharth Swaroop
  • Richard E Turner
  • Antti Honkela

Learning a privacy-preserving model from sensitive data which are distributed across multiple devices is an increasingly important problem. The problem is often formulated in the federated learning context, with the aim of learning a single global model while keeping the data distributed. Moreover, Bayesian learning is a popular approach for modelling, since it naturally supports reliable uncertainty estimates. However, Bayesian learning is generally intractable even with centralised non-private data, and so approximation techniques such as variational inference are a necessity. Variational inference has recently been extended to the non-private federated learning setting via the partitioned variational inference algorithm. For privacy protection, the current gold standard is called differential privacy. Differential privacy guarantees privacy in a strong, mathematically well-defined sense. In this paper, we present differentially private partitioned variational inference, the first general framework for learning a variational approximation to a Bayesian posterior distribution in the federated learning setting while minimising the number of communication rounds and providing differential privacy guarantees for data subjects. We propose three alternative implementations in the general framework, one based on perturbing local optimisation runs done by individual parties, and two based on perturbing updates to the global model (one using a version of federated averaging, the second one adding virtual parties to the protocol), and compare their properties both theoretically and empirically. We show that perturbing the local optimisation works well with simple and complex models as long as each party has enough local data. However, the privacy is always guaranteed independently by each party. In contrast, perturbing the global updates works best with relatively simple models. Given access to suitable secure primitives, such as secure aggregation or secure shuffling, the performance can be improved by all parties guaranteeing privacy jointly.
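The perturbation variants rest on the standard Gaussian mechanism; a minimal sketch of privatising one party's update (the clipping bound and noise multiplier are hypothetical values, and the paper's privacy accounting is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
clip_norm = 1.0          # sensitivity bound (hypothetical)
noise_multiplier = 1.2   # hypothetical; set by the (epsilon, delta) target

update = rng.normal(size=5)  # a party's raw model update

# Clip to bound the update's L2 sensitivity, then add calibrated
# Gaussian noise before sharing it.
norm = np.linalg.norm(update)
clipped = update * min(1.0, clip_norm / norm)
private = clipped + rng.normal(scale=noise_multiplier * clip_norm, size=5)
```

Clipping first is what makes the noise scale meaningful: without a bounded norm, no finite Gaussian noise yields a differential privacy guarantee.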

TMLR Journal 2023 Journal Article

Improving Continual Learning by Accurate Gradient Reconstructions of the Past

  • Erik Daxberger
  • Siddharth Swaroop
  • Kazuki Osawa
  • Rio Yokota
  • Richard E Turner
  • José Miguel Hernández-Lobato
  • Mohammad Emtiyaz Khan

Weight-regularization and experience replay are two popular continual-learning strategies with complementary strengths: while weight-regularization requires less memory, replay can more accurately mimic batch training. How can we combine them to get better methods? Despite the simplicity of the question, little is known or done to optimally combine these approaches. In this paper, we present such a method by using a recently proposed principle of adaptation that relies on a faithful reconstruction of the gradients of the past data. Using this principle, we design a prior which combines two types of replay methods with a quadratic weight-regularizer and achieves better gradient reconstructions. The combination improves performance on standard task-incremental continual learning benchmarks such as Split-CIFAR, Split-TinyImageNet, and ImageNet-1000, achieving >80% of the batch performance by simply utilizing a memory of <10% of the past data. Our work shows that a good combination of the two strategies can be very effective in reducing forgetting.

NeurIPS Conference 2021 Conference Paper

Collapsed Variational Bounds for Bayesian Neural Networks

  • Marcin Tomczak
  • Siddharth Swaroop
  • Andrew Foong
  • Richard Turner

Recent interest in learning large variational Bayesian Neural Networks (BNNs) has been partly hampered by poor predictive performance caused by underfitting, and their performance is known to be very sensitive to the prior over weights. Current practice often fixes the prior parameters to standard values or tunes them using heuristics or cross-validation. In this paper, we treat prior parameters in a distributional way by extending the model and collapsing the variational bound with respect to their posteriors. This leads to novel and tighter Evidence Lower Bounds (ELBOs) for performing variational inference (VI) in BNNs. Our experiments show that the new bounds significantly improve the performance of Gaussian mean-field VI applied to BNNs on a variety of data sets, demonstrating that mean-field VI works well even in deep models. We also find that the tighter ELBOs can be good optimization targets for learning the hyperparameters of hierarchical priors.

ICLR Conference 2021 Conference Paper

Generalized Variational Continual Learning

  • Noel Loo
  • Siddharth Swaroop
  • Richard E. Turner

Continual learning deals with training models on new tasks and datasets in an online fashion. One strand of research has used probabilistic regularization for continual learning, with two of the main approaches in this vein being Online Elastic Weight Consolidation (Online EWC) and Variational Continual Learning (VCL). VCL employs variational inference, which in other settings has been improved empirically by applying likelihood-tempering. We show that applying this modification to VCL recovers Online EWC as a limiting case, allowing for interpolation between the two approaches. We term the general algorithm Generalized VCL (GVCL). In order to mitigate the observed overpruning effect of VI, we take inspiration from a common multi-task architecture, neural networks with task-specific FiLM layers, and find that this addition leads to significant performance gains, specifically for variational methods. In the small-data regime, GVCL strongly outperforms existing baselines. In larger datasets, GVCL with FiLM layers outperforms or is competitive with existing baselines in terms of accuracy, whilst also providing significantly better calibration.

NeurIPS Conference 2021 Conference Paper

Knowledge-Adaptation Priors

  • Mohammad Emtiyaz Khan
  • Siddharth Swaroop

Humans and animals have a natural ability to quickly adapt to their surroundings, but machine-learning models, when subjected to changes, often require a complete retraining from scratch. We present Knowledge-adaptation priors (K-priors) to reduce the cost of retraining by enabling quick and accurate adaptation for a wide variety of tasks and models. This is made possible by a combination of weight and function-space priors to reconstruct the gradients of the past, which recovers and generalizes many existing, but seemingly unrelated, adaptation strategies. Training with simple first-order gradient methods can often recover the exact retrained model to an arbitrary accuracy by choosing a sufficiently large memory of the past data. Empirical results show that adaptation with K-priors achieves performance similar to full retraining, but only requires training on a handful of past examples.
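For linear regression the gradient-reconstruction idea is exact, which makes a small sketch possible (illustrative code, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # past data (illustrative)
y = rng.normal(size=50)

# Past training: exact least-squares solution, so X^T y = X^T X w_past.
w_past = np.linalg.solve(X.T @ X, X.T @ y)

# At any candidate w, the true gradient of the past squared-error loss ...
w = rng.normal(size=3)
grad_true = X.T @ (X @ w - y)

# ... is exactly reproduced by a quadratic weight prior centred at w_past
# with curvature X^T X: no past data is needed to reconstruct it.
grad_kprior = (X.T @ X) @ (w - w_past)
```

For nonlinear models the reconstruction is no longer exact, which is where the function-space terms on a memory of past examples come in.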

NeurIPS Conference 2020 Conference Paper

Continual Deep Learning by Functional Regularisation of Memorable Past

  • Pingbo Pan
  • Siddharth Swaroop
  • Alexander Immer
  • Runa Eschenhagen
  • Richard Turner
  • Mohammad Emtiyaz Khan

Continually learning new skills is important for intelligent systems, yet standard deep learning methods suffer from catastrophic forgetting of the past. Recent works address this with weight regularisation. Functional regularisation, although computationally expensive, is expected to perform better, but rarely does so in practice. In this paper, we fix this issue by using a new functional-regularisation approach that utilises a few memorable past examples that are crucial to avoiding forgetting. By using a Gaussian Process formulation of deep networks, our approach enables training in weight-space while identifying both the memorable past and a functional prior. Our method achieves state-of-the-art performance on standard benchmarks and opens a new direction for life-long learning where regularisation and memory-based methods are naturally combined.

NeurIPS Conference 2020 Conference Paper

Efficient Low Rank Gaussian Variational Inference for Neural Networks

  • Marcin Tomczak
  • Siddharth Swaroop
  • Richard Turner

Bayesian neural networks are enjoying a renaissance driven in part by recent advances in variational inference (VI). The most common form of VI employs a fully factorized or mean-field distribution, but this is known to suffer from several pathologies, especially as we expect posterior distributions with highly correlated parameters. Current algorithms that capture these correlations with a Gaussian approximating family are difficult to scale to large models due to computational costs and high variance of gradient updates. By using a new form of the reparametrization trick, we derive a computationally efficient algorithm for performing VI with a Gaussian family with a low-rank plus diagonal covariance structure. We scale to deep feed-forward and convolutional architectures. We find that adding low-rank terms to a parametrized diagonal covariance does not improve predictive performance except on small networks, but low-rank terms added to a constant diagonal covariance improve performance on small and large-scale network architectures.
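The reparametrization for a low-rank-plus-diagonal Gaussian can be sketched directly (a simplified illustration; the paper's exact parametrization and variance-reduction details are not reproduced):

```python
import numpy as np

rng = np.random.default_rng(0)
dim, rank, n_samples = 4, 2, 200_000

mu = rng.normal(size=dim)          # variational mean
d = np.abs(rng.normal(size=dim))   # diagonal scales
U = rng.normal(size=(dim, rank))   # low-rank factor

# Reparametrized samples: w = mu + d * eps1 + U eps2 has covariance
# diag(d**2) + U @ U.T, so gradients can flow through mu, d, and U.
eps1 = rng.standard_normal((n_samples, dim))
eps2 = rng.standard_normal((n_samples, rank))
w = mu + eps1 * d + eps2 @ U.T

# Empirical check: sample covariance approaches the implied covariance.
sigma_true = np.diag(d**2) + U @ U.T
sigma_emp = np.cov(w, rowvar=False)
```

The cost per sample is O(dim * rank) rather than the O(dim^2) of a full covariance, which is what makes the family scalable to larger architectures.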

NeurIPS Conference 2019 Conference Paper

Practical Deep Learning with Bayesian Principles

  • Kazuki Osawa
  • Siddharth Swaroop
  • Mohammad Emtiyaz Khan
  • Anirudh Jain
  • Runa Eschenhagen
  • Richard Turner
  • Rio Yokota

Bayesian methods promise to fix many shortcomings of deep learning, but they are impractical and rarely match the performance of standard methods, let alone improve them. In this paper, we demonstrate practical training of deep networks with natural-gradient variational inference. By applying techniques such as batch normalisation, data augmentation, and distributed training, we achieve similar performance in about the same number of epochs as the Adam optimiser, even on large datasets such as ImageNet. Importantly, the benefits of Bayesian principles are preserved: predictive probabilities are well-calibrated, uncertainties on out-of-distribution data are improved, and continual-learning performance is boosted. This work enables practical deep learning while preserving benefits of Bayesian principles. A PyTorch implementation is available as a plug-and-play optimiser.