Arrow Research search

Author name cluster

Sunil Gupta

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

32 papers
2 author rows

Possible papers

32

AAAI Conference 2026 Conference Paper

Probabilities Are All You Need: A Probability-Only Approach to Uncertainty Estimation in Large Language Models

  • Manh Nguyen
  • Sunil Gupta
  • Hung Le

Large Language Models (LLMs) exhibit strong performance across various natural language processing (NLP) tasks but remain vulnerable to hallucinations, generating factually incorrect or misleading outputs. Uncertainty estimation, often using predictive entropy estimation, is key to addressing this issue. However, existing methods often require multiple samples or extra computation to assess semantic entropy. This paper proposes an efficient, training-free uncertainty estimation method that approximates predictive entropy using the responses' top-K probabilities. Moreover, we employ an adaptive mechanism to determine K to enhance flexibility and filter out low-confidence probabilities. Experimental results on three free-form question-answering datasets across several LLMs demonstrate that our method outperforms expensive state-of-the-art baselines, contributing to the broader goal of enhancing LLM trustworthiness.
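As a back-of-the-envelope illustration of the probability-only idea, the sketch below approximates predictive entropy from a response's top-K probabilities, choosing K adaptively by discarding probabilities under a low-confidence cutoff. The function name and the simple threshold rule are illustrative assumptions, not the paper's actual mechanism:

```python
import math

def top_k_entropy(probs, tau=0.01):
    """Approximate predictive entropy from top-K probabilities.

    K is set adaptively: probabilities below the (hypothetical)
    low-confidence cutoff `tau` are filtered out, and the survivors
    are renormalised before computing Shannon entropy.
    """
    kept = [p for p in probs if p >= tau]
    if not kept:
        return 0.0
    z = sum(kept)
    return -sum((p / z) * math.log(p / z) for p in kept)

# A peaked distribution (a confident answer) scores low entropy,
# a flat one (an uncertain answer) scores high entropy.
confident = top_k_entropy([0.9, 0.05, 0.03, 0.02])
uncertain = top_k_entropy([0.3, 0.3, 0.2, 0.2])
assert confident < uncertain
```

A single forward pass supplies the top-K probabilities, which is what makes this cheaper than sampling-based semantic-entropy estimates.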

IJCAI Conference 2025 Conference Paper

Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer

  • Minh Hoang Nguyen
  • Linh Le Pham Van
  • Thommen George Karimpanal
  • Sunil Gupta
  • Hung Le

Decision Transformers (DT) play a crucial role in modern reinforcement learning, leveraging offline datasets to achieve impressive results across various domains. However, DT requires high-quality, comprehensive data to perform optimally. In real-world applications, the lack of training data and the scarcity of optimal behaviours make training on offline datasets challenging, as suboptimal data can hinder performance. To address this, we propose the Counterfactual Reasoning Decision Transformer (CRDT), a novel framework inspired by counterfactual reasoning. CRDT enhances DT’s ability to reason beyond known data by generating and utilizing counterfactual experiences, enabling improved decision-making in unseen scenarios. Experiments across Atari and D4RL benchmarks, including scenarios with limited data and altered dynamics, demonstrate that CRDT outperforms conventional DT approaches. Additionally, reasoning counterfactually allows the DT agent to obtain stitching abilities, combining suboptimal trajectories, without architectural modifications. These results highlight the potential of counterfactual reasoning to enhance reinforcement learning agents' performance and generalization capabilities.

AAMAS Conference 2025 Conference Paper

Navigating Social Dilemmas with LLM-based Agents via Consideration of Future Consequences

  • Dung Nguyen
  • Hung Le
  • Kien Do
  • Sunil Gupta
  • Svetha Venkatesh
  • Truyen Tran

Agents built on LLMs have shown versatile capabilities but face difficulties in being cooperative in social dilemma situations. When making decisions under the strain of selecting between long-term consequences and short-term benefits in commonly shared resources, LLM-based agents are vulnerable to the tragedy of the commons, i.e., individual greed leads to early depletion of the shared resource. We propose LLM agents that consider future consequences to aid them in navigating intertemporal social dilemmas. We introduce two approaches, prompting and intervention, to equip the agent with the ability to consider future consequences when making a decision, resulting in a new kind of agent: the CFC-Agent. Furthermore, we enable the CFC-Agent to act toward different levels of consideration for future consequences. Our experiments in different settings show that agents that consider future consequences exhibit sustainable behaviour and achieve high common rewards for the population.

IJCAI Conference 2025 Conference Paper

Navigating Social Dilemmas with LLM-based Agents via Consideration of Future Consequences

  • Dung Nguyen
  • Hung Le
  • Kien Do
  • Sunil Gupta
  • Svetha Venkatesh
  • Truyen Tran

Artificial agents with the aid of large language models (LLMs) are effective in various real-world scenarios but struggle to cooperate in social dilemmas. When making decisions under the strain of selecting between long-term consequences and short-term benefits in commonly shared resources, LLM-based agents often exploit the environment, leading to early depletion. Inspired by the concept of consideration of future consequences (CFC), which is well known in social psychology, we propose a framework that enables LLM-based agents to consider future consequences, resulting in a new kind of agent that we term the CFC-Agent. We enable the CFC-Agent to act toward different levels of consideration for future consequences. Our first set of experiments, where the LLM is directly asked to make decisions, shows that agents considering future consequences exhibit sustainable behaviour and achieve high common rewards for the population. Extensive experiments in complex environments show that the CFC-Agent can manage a sequence of calls to the LLM for reasoning and engage in communication to cooperate with others, better resolving the common dilemma. Finally, our analysis shows that considering future consequences not only affects the final decision but also improves the conversations between LLM-based agents toward a better resolution of social dilemmas.

NeurIPS Conference 2025 Conference Paper

Reproducing Kernel Banach Space Models for Neural Networks with Application to Rademacher Complexity Analysis

  • Alistair Shilton
  • Sunil Gupta
  • Santu Rana
  • Svetha Venkatesh

This paper explores the use of Hermite-transform-based reproducing kernel Banach space methods to construct exact (un-approximated) models of feedforward neural networks of arbitrary width, depth and topology, including ResNet and Transformer networks, assuming only a feedforward topology, finite-energy activations and finite (spectral-) norm weights and biases. Using this model, two straightforward but surprisingly tight bounds on Rademacher complexity are derived, precisely: (1) a general bound that is width-independent and scales exponentially with depth; and (2) a width- and depth-independent bound for networks with appropriately constrained (below-threshold) weights and biases.

NeurIPS Conference 2024 Conference Paper

Active Set Ordering

  • Quoc Phong Nguyen
  • Sunil Gupta
  • Svetha Venkatesh
  • Bryan Kian Hsiang Low
  • Patrick Jaillet

In this paper, we formalize the active set ordering problem, which involves actively discovering a set of inputs based on their orderings determined by expensive evaluations of a black-box function. We then propose the mean prediction (MP) algorithm and theoretically analyze it in terms of the regret of predicted pairwise orderings between inputs. Notably, as a special case of this framework, we can cast Bayesian optimization as an active set ordering problem by recognizing that maximizers can be identified solely by comparison rather than by precisely estimating the function evaluations. As a result, we are able to construct the popular Gaussian process upper confidence bound (GP-UCB) algorithm through the lens of ordering, with several nuanced insights. We empirically validate the performance of our proposed solution using various synthetic functions and real-world datasets.
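The mean prediction idea can be illustrated with a toy version: order every pair of inputs by posterior mean and count pairwise orderings that disagree with the true function values. The error count below is only a stand-in for the paper's regret notion, and every name is illustrative:

```python
def mp_pairwise_errors(posterior_means, true_values):
    """Order each input pair by posterior mean (the MP rule) and count
    pairs whose predicted ordering disagrees with the ground truth."""
    n = len(posterior_means)
    return sum(
        (posterior_means[i] >= posterior_means[j])
        != (true_values[i] >= true_values[j])
        for i in range(n) for j in range(i + 1, n)
    )

# A posterior that ranks inputs correctly makes no pairwise mistakes,
# even though its scale is completely different from the truth.
assert mp_pairwise_errors([0.1, 0.9, 0.5], [10.0, 30.0, 20.0]) == 0
assert mp_pairwise_errors([0.9, 0.1], [1.0, 2.0]) == 1
```

The Bayesian-optimization connection is visible here: the maximizer is simply the input that wins every pairwise comparison.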

IJCAI Conference 2024 Conference Paper

Diversifying Training Pool Predictability for Zero-shot Coordination: A Theory of Mind Approach

  • Dung Nguyen
  • Hung Le
  • Kien Do
  • Sunil Gupta
  • Svetha Venkatesh
  • Truyen Tran

The challenge in constructing artificial social agents is to enable adaptation to novel partner agents, a problem known as zero-shot coordination (ZSC). A promising approach is to train adaptive agents by interacting with a diverse pool of collaborators, assuming that the greater the diversity in other agents seen during training, the better the generalisation. In this paper, we explore an alternative procedure by considering the behavioural predictability of collaborators, i.e., whether their actions and intentions are predictable, and use it to select a diverse set of agents for the training pool. More specifically, we develop a pool of agents through self-play training, during which agents' behaviour evolves and exhibits diversity in levels of behavioural predictability (LoBP). We construct an observer to compute the level of behavioural predictability for each version of the collaborators. To do so, the observer is equipped with a theory of mind (ToM) capability to learn to infer the actions and intentions of others. We then use an episodic memory based on the LoBP metric to maintain agents with different levels of behavioural predictability in the pool. Since behaviours that emerge in later training phases are more complex and meaningful, the memory is updated with the latest versions of the training agents. Our extensive experiments demonstrate that LoBP-based diversity training leads to better ZSC than other diversity training methods.

IJCAI Conference 2024 Conference Paper

EMOTE: An Explainable Architecture for Modelling the Other through Empathy

  • Manisha Senadeera
  • Thommen Karimpanal George
  • Stephan Jacobs
  • Sunil Gupta
  • Santu Rana

Empathy allows us to assume others are like us and have goals analogous to our own. This can also at times be applied to multi-agent games, e.g., Agent 1's attraction to green balls is analogous to Agent 2's attraction to red balls. Drawing inspiration from empathy, we propose EMOTE, a simple and explainable inverse reinforcement learning (IRL) approach designed to model another agent's action-value function and, from it, infer a unique reward function. This is done by referencing the learning agent's own action-value function, removing the need to maintain independent action-value estimates for the modelled agents whilst simultaneously addressing the ill-posed nature of IRL by inferring a unique reward function. We experiment on minigrid environments, showing that EMOTE: (a) produces more consistent reward estimates relative to other IRL baselines; (b) is robust in scenarios with composite reward and action-value functions; and (c) produces human-interpretable states, helping to explain how the agent views other agents.

AAMAS Conference 2024 Conference Paper

Policy Learning for Off-Dynamics RL with Deficient Support

  • Linh Le Pham Van
  • Hung The Tran
  • Sunil Gupta

Reinforcement Learning (RL) can effectively learn complex policies. However, learning these policies often demands extensive trial-and-error interactions with the environment. In many real-world scenarios, this approach is not practical due to the high costs of data collection and safety concerns. As a result, a common strategy is to transfer a policy trained in a low-cost, rapid source simulator to a real-world target environment. However, this process poses challenges. Simulators, no matter how advanced, cannot perfectly replicate the intricacies of the real world, leading to dynamics discrepancies between the source and target environments. Past research posited that the source domain must encompass all possible target transitions, a condition we term full support. However, expecting full support is often unrealistic, especially in scenarios where significant dynamics discrepancies arise. In this paper, our emphasis shifts to addressing large dynamics mismatch adaptation. We move away from the stringent full support condition of earlier research, focusing instead on crafting an effective policy for the target domain. Our proposed approach is simple but effective. It is anchored in the central concepts of the skewing and extension of source support towards target support to mitigate support deficiencies. Through comprehensive testing on a varied set of benchmarks, our method's efficacy stands out, showcasing notable improvements over previous techniques.

AAAI Conference 2024 Conference Paper

Root Cause Explanation of Outliers under Noisy Mechanisms

  • Phuoc Nguyen
  • Truyen Tran
  • Sunil Gupta
  • Thin Nguyen
  • Svetha Venkatesh

Identifying the root causes of anomalies in causal processes is vital across disciplines. Once identified, one can isolate the root causes and implement the necessary measures to restore normal operation. Causal processes are often modelled as graphs, with entities as nodes and their paths/interconnections as edges. Existing work considers only the contribution of nodes in the generative process, and thus cannot attribute the outlier score to the edges of a mechanism if the anomaly occurs in the connections. In this paper, we consider both the individual edges and the nodes of each mechanism when identifying the root causes. We introduce a noisy functional causal model for this purpose. Then, we employ Bayesian learning and inference methods to infer the noises of the nodes and edges. We then represent the functional form of a target outlier leaf as a function of the node and edge noises. Finally, we propose an efficient gradient-based attribution method to compute the anomaly attribution scores, which scales linearly with the number of nodes and edges. Experiments on simulated datasets and two real-world scenario datasets show better anomaly attribution performance of the proposed method compared to the baselines. Our method scales to larger graphs with more nodes and edges.

AAAI Conference 2023 Conference Paper

On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

  • Thanh Nguyen-Tang
  • Ming Yin
  • Sunil Gupta
  • Svetha Venkatesh
  • Raman Arora

Sample-efficient offline reinforcement learning (RL) with linear function approximation has been studied extensively recently. Much of the prior work has yielded instance-independent rates that hold even for the worst-case realization of problem instances. This work seeks to understand instance-dependent bounds for offline RL with linear function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of concentrability with respect to an optimal policy, the proposed algorithm yields a fast rate for offline RL when there is a positive gap in the optimal Q-value functions, even if the offline data were collected adaptively. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when the number of episodes exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first results that give a fast rate bound on the sub-optimality and an absolute zero sub-optimality bound for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.

NeurIPS Conference 2022 Conference Paper

Expected Improvement for Contextual Bandits

  • Hung Tran-The
  • Sunil Gupta
  • Santu Rana
  • Tuan Truong
  • Long Tran-Thanh
  • Svetha Venkatesh

The expected improvement (EI) is a popular technique to handle the tradeoff between exploration and exploitation under uncertainty. This technique has been widely used in Bayesian optimization, but it is not applicable to the contextual bandit problem, which is a generalization of both the standard bandit and Bayesian optimization. In this paper, we initiate the study of the EI technique for contextual bandits from both theoretical and practical perspectives. We propose two novel EI-based algorithms, one for the case where the reward function is assumed to be linear and the other for more general reward functions. With linear reward functions, we demonstrate that our algorithm achieves a near-optimal regret. Notably, our regret improves on that of LinTS \cite{agrawal13} by a factor of $\sqrt{d}$ while avoiding having to solve an NP-hard problem at each iteration as in LinUCB \cite{Abbasi11}. For more general reward functions, which are modeled by deep neural networks, we prove that our algorithm achieves a $\tilde{\mathcal O} (\tilde{d}\sqrt{T})$ regret, where $\tilde{d}$ is the effective dimension of a neural tangent kernel (NTK) matrix and $T$ is the number of iterations. Our experiments on various benchmark datasets show that both proposed algorithms work well and consistently outperform existing approaches, especially in high dimensions.
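For reference, the standard single-point EI quantity (under a Gaussian posterior, maximisation convention) has the closed form below; this is the textbook formula the paper builds on, not its contextual-bandit algorithm:

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI of a candidate with posterior mean `mu` and
    standard deviation `sigma`, relative to the incumbent `best`."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)      # no uncertainty: plain improvement
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

# Even a candidate whose mean only matches the incumbent has positive EI,
# because its uncertainty leaves room for improvement.
assert expected_improvement(0.0, 1.0, 0.0) > 0.0
```

In a bandit or BO loop, the arm or point with the largest EI is the one played next.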

NeurIPS Conference 2022 Conference Paper

Learning to Constrain Policy Optimization with Virtual Trust Region

  • Thai Hung Le
  • Thommen Karimpanal George
  • Majid Abdolshah
  • Dung Nguyen
  • Kien Do
  • Sunil Gupta
  • Svetha Venkatesh

We introduce a constrained optimization method for policy gradient reinforcement learning, which uses two trust regions to regulate each policy update. In addition to using the proximity of one single old policy as the first trust region as done by prior works, we propose forming a second trust region by constructing another virtual policy that represents a wide range of past policies. We then enforce the new policy to stay closer to the virtual policy, which is beneficial if the old policy performs poorly. We propose a mechanism to automatically build the virtual policy from a memory buffer of past policies, providing a new capability for dynamically selecting appropriate trust regions during the optimization process. Our proposed method, dubbed Memory-Constrained Policy Optimization (MCPO), is examined in diverse environments, including robotic locomotion control, navigation with sparse rewards and Atari games, consistently demonstrating competitive performance against recent on-policy constrained policy gradient methods.

TMLR Journal 2022 Journal Article

On Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks in Besov Spaces

  • Thanh Nguyen-Tang
  • Sunil Gupta
  • Hung Tran-The
  • Svetha Venkatesh

Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite the recent interest in this problem, its theoretical results in neural network function approximation settings remain elusive. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish the sample complexity of $n = \tilde{\mathcal{O}}( H^{4 + 4 \frac{d}{\alpha}} \kappa_{\mu}^{1 + \frac{d}{\alpha}} \epsilon^{-2 - 2\frac{d}{\alpha}} )$ for offline RL with deep ReLU networks, where $\kappa_{\mu}$ is a measure of distributional shift, $H = (1-\gamma)^{-1}$ is the effective horizon length, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure. While the Besov dynamic closure subsumes the dynamic conditions for offline RL in the prior works, the correlated structure renders the prior works of offline RL with general/neural network function approximation improper or inefficient in long (effective) horizon problems. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition that goes beyond the linearity regime in the traditional reproducing kernel Hilbert spaces and neural tangent kernels.

AAMAS Conference 2022 Conference Paper

Sympathy-based Reinforcement Learning Agents

  • Manisha Senadeera
  • Thommen George Karimpanal
  • Sunil Gupta
  • Santu Rana

As artificial agents become increasingly prevalent in our daily lives, it becomes imperative to equip them with an awareness of societal norms; specifically, the ability to account for and be considerate towards others they may cohabit with. In this work, we explore the ability for an agent trained through reinforcement learning to exhibit sympathetic behaviours towards another (independent) agent in the environment. We propose to achieve such behaviours by first inferring the reward function of the independent agent, through inverse reinforcement learning, and subsequently learning a policy based on a sympathetic reward function - a convex combination of the inferred rewards and the agent’s own rewards. The corresponding weighting is determined by a sympathy function which is computed based on the estimated return of the agent’s current action relative to that of all possible actions it could have taken. We evaluate our approach on adversarial as well as assistive environment settings, and demonstrate the ability of our sympathetic agent to perform well at its own goal, while simultaneously giving due consideration to another agent in its environment. We also empirically examine and report the sensitivity of our agent’s performance to the hyperparameters introduced in our proposed framework.
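A minimal sketch of the reward combination described above; the normalised sympathy function here is a hypothetical stand-in (the abstract only says the weight depends on the chosen action's estimated return relative to all alternatives, not its exact form):

```python
def sympathy_weight(q_values, action):
    """Hypothetical sympathy function: the weight grows with how good
    the chosen action's estimated return is relative to the best and
    worst alternatives (the paper's exact form may differ)."""
    lo, hi = min(q_values), max(q_values)
    return 0.0 if hi == lo else (q_values[action] - lo) / (hi - lo)

def sympathetic_reward(own_reward, inferred_reward, w):
    """Convex combination of the agent's own reward and the reward
    inferred for the other agent via IRL, weighted by w in [0, 1]."""
    return (1.0 - w) * own_reward + w * inferred_reward

w = sympathy_weight([1.0, 2.0, 4.0], action=2)   # best action: full weight
assert w == 1.0
assert sympathetic_reward(1.0, 0.0, w=0.25) == 0.75
```

The learned policy then optimises `sympathetic_reward` instead of the raw reward, which is what yields the considerate behaviour.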

AAAI Conference 2022 Conference Paper

TRF: Learning Kernels with Tuned Random Features

  • Alistair Shilton
  • Sunil Gupta
  • Santu Rana
  • Arun Kumar Venkatesh
  • Svetha Venkatesh

Random Fourier features (RFF) are a popular set of tools for constructing low-dimensional approximations of translation-invariant kernels, allowing kernel methods to be scaled to big data. Apart from their computational advantages, by working in the spectral domain random Fourier features expose the translation-invariant kernel as a density function that may, in principle, be manipulated directly to tune the kernel. In this paper we propose selecting the density function from a reproducing kernel Hilbert space to allow us to search the space of all translation-invariant kernels. Our approach, which we call tuned random features (TRF), achieves this by approximating the density function as the RKHS-norm-regularised least-squares best fit to an unknown “true” optimal density function, resulting in an RFF formulation where kernel selection is reduced to regularised risk minimisation with a novel regulariser. We derive bounds on the Rademacher complexity for our method, showing that our random features approximation converges to optimal kernel selection in the large N, D limit. Finally, we present experimental results for a variety of real-world learning problems, demonstrating the performance of our approach compared to comparable methods.
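For orientation, plain random Fourier features for the RBF kernel look like the sketch below; the point of TRF is that the sampled spectral density need not be the RBF one, whereas this sketch fixes it there:

```python
import numpy as np

def rff_features(X, n_features=2000, lengthscale=1.0, seed=0):
    """Map X (n, d) to random Fourier features whose inner products
    approximate the RBF kernel k(x, y) = exp(-||x - y||^2 / 2l^2).
    TRF would learn the spectral density; here it is fixed to the
    RBF kernel's Gaussian density."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X)
K_approx = Z @ Z.T                   # feature inner products ...
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
assert np.abs(K_approx - K_exact).max() < 0.15   # ... track the kernel
```

Downstream, any linear model on `Z` behaves approximately like a kernel machine with the chosen kernel.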

AAAI Conference 2021 Conference Paper

Distributional Reinforcement Learning via Moment Matching

  • Thanh Nguyen-Tang
  • Sunil Gupta
  • Svetha Venkatesh

We consider the problem of learning a set of probability distributions from the empirical Bellman dynamics in distributional reinforcement learning (RL), a class of state-of-the-art methods that estimate the distribution, as opposed to only the expectation, of the total return. We formulate a method that learns a finite set of statistics from each return distribution via neural networks, as in the distributional RL literature. Existing distributional RL methods however constrain the learned statistics to predefined functional forms of the return distribution which is both restrictive in representation and difficult in maintaining the predefined statistics. Instead, we learn unrestricted statistics, i.e., deterministic (pseudo-)samples, of the return distribution by leveraging a technique from hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simpler objective amenable to backpropagation. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. We establish sufficient conditions for the contraction of the distributional Bellman operator and provide finite-sample analysis for the deterministic samples in distribution approximation. Experiments on the suite of Atari games show that our method outperforms the distributional RL baselines and sets a new record in the Atari games for non-distributed agents.
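The (biased) sample estimator of squared MMD under a Gaussian kernel, which the method uses to pull its deterministic pseudo-samples toward the Bellman target distribution, can be sketched for 1-D samples as:

```python
import numpy as np

def mmd2(x, y, bandwidth=1.0):
    """Biased estimator of squared MMD between 1-D sample sets x and y
    under a Gaussian kernel; small when the two sample sets match."""
    def k(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth**2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

z = np.linspace(-2.0, 2.0, 30)               # deterministic pseudo-samples
assert mmd2(z, z) < 1e-12                    # identical samples: (near) zero
assert mmd2(z, z + 1.5) > mmd2(z, z + 0.1)   # grows with the mismatch
```

In the paper's setting this quantity is differentiable in the pseudo-samples, so it can serve directly as a backpropagation loss.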

AAAI Conference 2021 Conference Paper

High Dimensional Level Set Estimation with Bayesian Neural Network

  • Huong Ha
  • Sunil Gupta
  • Santu Rana
  • Svetha Venkatesh

Level Set Estimation (LSE) is an important problem with applications in various fields such as material design, biotechnology and machine operational testing. Existing techniques suffer from a scalability issue: they do not work well with high-dimensional inputs. This paper proposes novel methods to solve high-dimensional LSE problems using Bayesian Neural Networks. In particular, we consider two types of LSE problems: (1) the explicit LSE problem, where the threshold level is a fixed user-specified value, and (2) the implicit LSE problem, where the threshold level is defined as a percentage of the (unknown) maximum of the objective function. For each problem, we derive the corresponding information-theoretic acquisition function to sample data points so as to maximally increase the level set accuracy. Furthermore, we analyse the theoretical time complexity of our proposed acquisition functions, and suggest a practical methodology to efficiently tune the network hyper-parameters to achieve high model accuracy. Numerical experiments on both synthetic and real-world datasets show that our proposed method achieves better results than existing state-of-the-art approaches.
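The two problem variants reduce to two ways of fixing the threshold; a toy labelling rule makes the distinction concrete (the function and its arguments are illustrative, not the paper's method):

```python
def classify_level_set(values, threshold=None, fraction=None):
    """Label which points lie in the super-level set.  Explicit LSE
    uses a user-given `threshold`; implicit LSE derives it as a
    `fraction` of the maximum (known here, estimated in practice)."""
    if threshold is None:
        threshold = fraction * max(values)
    return [v >= threshold for v in values]

# Explicit: fixed threshold 2.5.
assert classify_level_set([1, 2, 3, 4], threshold=2.5) == [False, False, True, True]
# Implicit: threshold = 50% of the maximum (= 2.0).
assert classify_level_set([1, 2, 3, 4], fraction=0.5) == [False, True, True, True]
```

The acquisition functions in the paper choose where to evaluate next so that these labels become accurate with as few evaluations as possible.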

NeurIPS Conference 2021 Conference Paper

Kernel Functional Optimisation

  • Arun Kumar Anjanapura Venkatesh
  • Alistair Shilton
  • Santu Rana
  • Sunil Gupta
  • Svetha Venkatesh

Traditional methods for kernel selection rely on parametric kernel functions or a combination thereof and although the kernel hyperparameters are tuned, these methods often provide sub-optimal results due to the limitations induced by the parametric forms. In this paper, we propose a novel formulation for kernel selection using efficient Bayesian optimisation to find the best fitting non-parametric kernel. The kernel is expressed using a linear combination of functions sampled from a prior Gaussian Process (GP) defined by a hyperkernel. We also provide a mechanism to ensure the positive definiteness of the Gram matrix constructed using the resultant kernels. Our experimental results on GP regression and Support Vector Machine (SVM) classification tasks involving both synthetic functions and several real-world datasets show the superiority of our approach over the state-of-the-art.

AAAI Conference 2020 Conference Paper

Bayesian Optimization for Categorical and Category-Specific Continuous Inputs

  • Dang Nguyen
  • Sunil Gupta
  • Santu Rana
  • Alistair Shilton
  • Svetha Venkatesh

Many real-world functions are defined over both categorical and category-specific continuous variables and thus cannot be optimized by traditional Bayesian optimization (BO) methods. To optimize such functions, we propose a new method that formulates the problem as a multi-armed bandit problem, wherein each category corresponds to an arm with its reward distribution centered around the optimum of the objective function in the continuous variables. Our goal is to identify the best arm and the maximizer of the corresponding continuous function simultaneously. Our algorithm uses a Thompson sampling scheme that helps connect the multi-armed bandit and BO in a unified framework. We extend our method to batch BO to allow parallel optimization when multiple resources are available. We theoretically analyze our method for convergence and prove sub-linear regret bounds. We perform a variety of experiments: optimization of several benchmark functions, hyper-parameter tuning of a neural network, and automatic selection of the best machine learning model along with its optimal hyper-parameters (a.k.a. automated machine learning). Comparisons with other methods demonstrate the effectiveness of our proposed method.
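One Thompson-sampling step over the categorical arms might look like the sketch below, with each arm's belief summarised by a Gaussian (mean, std); this is a hedged illustration of the general scheme, not the paper's algorithm:

```python
import random

def thompson_pick(arm_posteriors, rng):
    """Sample a plausible mean reward per arm (category) from its
    Gaussian posterior and play the arm with the highest draw.  In the
    paper's setting each arm's reward is centred on the optimum of
    that category's continuous sub-problem."""
    draws = {arm: rng.gauss(mean, std)
             for arm, (mean, std) in arm_posteriors.items()}
    return max(draws, key=draws.get)

rng = random.Random(0)
posteriors = {"rbf": (0.9, 0.05), "linear": (0.2, 0.05)}
picks = [thompson_pick(posteriors, rng) for _ in range(200)]
# The clearly better category is played almost exclusively.
assert picks.count("rbf") > picks.count("linear")
```

In the full method, the reward observed for the chosen arm would come from a BO step on that category's continuous variables, and the arm's posterior would then be updated.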

IJCAI Conference 2020 Conference Paper

Randomised Gaussian Process Upper Confidence Bound for Bayesian Optimisation

  • Julian Berk
  • Sunil Gupta
  • Santu Rana
  • Svetha Venkatesh

In order to improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function. This is done by sampling the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising a bound on the function's Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB in a range of real-world and synthetic problems.
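The core change, drawing the exploration parameter from a distribution rather than fixing it, can be sketched as follows; the exponential distribution is an illustrative choice, not necessarily the one analysed in the paper:

```python
import math
import random

def randomised_ucb_scores(mu, sigma, rng, rate=1.0):
    """GP-UCB style acquisition scores with the trade-off parameter
    beta sampled per iteration; tuning `rate` shifts E[beta] to suit
    the problem without fixing beta itself."""
    beta = rng.expovariate(rate)
    return [m + math.sqrt(beta) * s for m, s in zip(mu, sigma)]

rng = random.Random(42)
scores = randomised_ucb_scores([0.1, 0.5, 0.3], [0.2, 0.0, 0.4], rng)
assert scores[1] == 0.5                      # zero std: score equals the mean
assert all(s >= m for s, m in zip(scores, [0.1, 0.5, 0.3]))
```

The next evaluation point is then the argmax of these scores, exactly as in standard GP-UCB.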

NeurIPS Conference 2020 Conference Paper

Sub-linear Regret Bounds for Bayesian Optimisation in Unknown Search Spaces

  • Hung Tran-The
  • Sunil Gupta
  • Santu Rana
  • Huong Ha
  • Svetha Venkatesh

Bayesian optimisation (BO) is a popular method for the efficient optimisation of expensive black-box functions. Traditionally, BO assumes that the search space is known. However, in many problems this assumption does not hold. To this end, we propose a novel BO algorithm which expands (and shifts) the search space over iterations by controlling the expansion rate through a \emph{hyperharmonic series}. Further, we propose another variant of our algorithm that scales to high dimensions. We show theoretically that for both our algorithms, the cumulative regret grows at a sub-linear rate. Our experiments with synthetic and real-world optimisation tasks demonstrate the superiority of our algorithms over the current state-of-the-art methods for Bayesian optimisation in unknown search spaces.

AAAI Conference 2020 Conference Paper

Trading Convergence Rate with Computational Budget in High Dimensional Bayesian Optimization

  • Hung Tran-The
  • Sunil Gupta
  • Santu Rana
  • Svetha Venkatesh

Scaling Bayesian optimisation (BO) to high-dimensional search spaces is an active and open research problem, particularly when no assumptions are made on the function structure. The main reason is that at each iteration, BO requires finding the global maximum of an acquisition function, which is itself a non-convex optimisation problem in the original search space. With growing dimensions, the computational budget for this maximisation becomes increasingly inadequate, leading to inaccurate solutions of the maximisation. This inaccuracy adversely affects both the convergence and the efficiency of BO. We propose a novel approach where the acquisition function only requires maximisation on a discrete set of low-dimensional subspaces embedded in the original high-dimensional search space. Our method is free of any low-dimensional structure assumption on the function, unlike many recent high-dimensional BO methods. Optimising the acquisition function in low-dimensional subspaces allows our method to obtain accurate solutions within a limited computational budget. We show that in spite of this convenience, our algorithm remains convergent. In particular, the cumulative regret of our algorithm grows only sub-linearly with the number of iterations. More importantly, as evident from our regret bounds, our algorithm provides a way to trade the convergence rate against the number of subspaces used in the optimisation. Finally, when the number of subspaces is "sufficiently large", our algorithm's cumulative regret is at most $O^*(\sqrt{T\gamma_T})$ as opposed to $O^*(\sqrt{DT\gamma_T})$ for the GP-UCB of Srinivas et al. (2012), removing a crucial factor of $\sqrt{D}$, where $D$ is the dimension of the input space. We perform extensive empirical experiments to evaluate our method, showing that its sample efficiency is better than that of existing methods for many optimisation problems involving dimensions up to 5000.

AAAI Conference 2019 Conference Paper

Bayesian Functional Optimisation with Shape Prior

  • Pratibha Vellanki
  • Santu Rana
  • Sunil Gupta
  • David Rubin de Celis Leal
  • Alessandra Sutti
  • Murray Height
  • Svetha Venkatesh

Real-world experiments are expensive, so it is important to reach a target in a minimum number of experiments. Experimental processes often involve control variables that change over time; such problems can be formulated as functional optimisation problems. We develop a novel Bayesian optimisation framework for functional optimisation of expensive black-box processes. We represent the control function using a Bernstein polynomial basis and optimise in the coefficient space. We derive the theory and practice required to dynamically adjust the polynomial degree, and show how prior information about shape can be integrated. We demonstrate the effectiveness of our approach on short polymer fibre design and on optimising learning rate schedules for deep networks.
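
The representation step in this abstract, a control function written in the Bernstein basis, is standard and can be sketched directly; the optimisation over coefficients (the paper's contribution) is not shown here.

```python
import numpy as np
from math import comb

def bernstein_curve(coeffs, t):
    # evaluate a control function c(t) represented in the Bernstein basis:
    # c(t) = sum_k coeffs[k] * C(n, k) * t^k * (1 - t)^(n - k)
    n = len(coeffs) - 1
    basis = np.array([comb(n, k) * t**k * (1 - t) ** (n - k)
                      for k in range(n + 1)])
    return coeffs @ basis

t = np.linspace(0.0, 1.0, 5)
c = np.array([0.0, 1.0, 0.0])  # degree-2 coefficient vector
y = bernstein_curve(c, t)       # curve starts and ends at 0, peaks mid-way
```

A useful property for shape priors: the curve interpolates the first and last coefficients at t = 0 and t = 1, and monotone coefficient sequences yield monotone curves.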

NeurIPS Conference 2019 Conference Paper

Bayesian Optimization with Unknown Search Space

  • Huong Ha
  • Santu Rana
  • Sunil Gupta
  • Thanh Nguyen
  • Hung Tran-The
  • Svetha Venkatesh

Applying Bayesian optimization to problems where the search space is unknown is challenging. To address this problem, we propose a systematic volume expansion strategy for Bayesian optimization. We devise a strategy to guarantee that, through iterative expansions of the search space, our method can find a point whose function value is within epsilon of the objective function's maximum. Without the need to specify any parameters, our algorithm iteratively triggers the minimal expansion required. We derive analytic expressions for when to trigger an expansion and by how much to expand. We also provide theoretical analysis showing that our method achieves epsilon-accuracy after a finite number of iterations. We evaluate our method on both benchmark test functions and machine learning hyper-parameter tuning tasks and demonstrate that it outperforms the baselines.
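
The expand-when-needed idea can be sketched in one dimension. This is a crude stand-in, not the paper's analytic trigger: random search replaces the BO inner loop, and the box grows whenever the incumbent presses against a boundary.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(x):
    # toy 1-D function whose maximiser (x = 2.5) lies outside the initial box
    return -(x - 2.5) ** 2

lo, hi = 0.0, 1.0  # initial, mis-specified search space
best = -np.inf
for _ in range(10):
    X = rng.uniform(lo, hi, size=64)  # stand-in for one BO iteration
    vals = objective(X)
    best = max(best, vals.max())
    x_inc = X[np.argmax(vals)]        # incumbent of this iteration
    width = hi - lo
    # expand the boundary the incumbent is pushing against
    if hi - x_inc < 0.1 * width:
        hi += 0.5 * width
    if x_inc - lo < 0.1 * width:
        lo -= 0.5 * width
```

In the actual method, the trigger and the expansion amount come from closed-form expressions rather than the hand-tuned thresholds used here.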

NeurIPS Conference 2019 Conference Paper

Multi-objective Bayesian optimisation with preferences over objectives

  • Majid Abdolshah
  • Alistair Shilton
  • Santu Rana
  • Sunil Gupta
  • Svetha Venkatesh

We present a multi-objective Bayesian optimisation algorithm that allows the user to express preference-order constraints on the objectives, of the type "objective A is more important than objective B". These preferences are defined based on the stability of the obtained solutions with respect to the preferred objective functions. Rather than attempting to find a representative subset of the complete Pareto front, our algorithm selects those Pareto-optimal points that satisfy these constraints. We formulate a new acquisition function based on expected improvement in dominated hypervolume (EHI) to ensure that the subset of the Pareto front satisfying the constraints is thoroughly explored. The hypervolume calculation is weighted by the probability of a point satisfying the constraints, obtained from a gradient Gaussian process model. We demonstrate our algorithm on both synthetic and real-world problems.

NeurIPS Conference 2018 Conference Paper

Algorithmic Assurance: An Active Approach to Algorithmic Testing using Bayesian Optimisation

  • Shivapratap Gopakumar
  • Sunil Gupta
  • Santu Rana
  • Vu Nguyen
  • Svetha Venkatesh

We introduce algorithmic assurance, the problem of testing whether machine learning algorithms conform to their intended design goals. We address this problem by proposing an efficient framework for algorithmic testing. To provide assurance, we need to efficiently discover scenarios where an algorithm's decision deviates maximally from its intended gold standard. We mathematically formulate this task as the optimisation of an expensive, black-box function. We use an active learning approach based on Bayesian optimisation to solve this optimisation problem. We extend the framework to algorithms with vector-valued outputs by making appropriate modifications to Bayesian optimisation via the EXP3 algorithm. We theoretically analyse our methods for convergence. Using two real-world applications, we demonstrate the efficiency of our methods. The significance of our problem formulation and initial solutions is that they will serve as a foundation for assuring humans about machines making complex decisions.

JBHI Journal 2017 Journal Article

A Framework for Mixed-Type Multioutcome Prediction With Applications in Healthcare

  • Budhaditya Saha
  • Sunil Gupta
  • Dinh Phung
  • Svetha Venkatesh

Health analysis often involves predicting multiple outcomes of mixed type. Existing work is restricted to either a limited number of outcomes or specific outcome types. We propose a framework for mixed-type multioutcome prediction. Our framework uses a cumulative loss function composed of a specific loss for each outcome type: for example, least squares (continuous outcome), hinge (binary outcome), Poisson (count outcome), and exponential (nonnegative outcome). To model these outcomes jointly, we impose commonality across the prediction parameters through a common matrix-normal prior. The framework is formulated as an iterative optimisation problem and solved using an efficient block-coordinate descent method. We empirically demonstrate both scalability and convergence. We apply the proposed model to a synthetic dataset and then to two real-world cohorts: a cancer cohort and an acute myocardial infarction cohort collected over a two-year period. We predict multiple emergency-related outcomes: for example, future emergency presentations (binary), emergency admissions (count), emergency length of stay in days (nonnegative), and time to next emergency admission in days (nonnegative). We show that the predictive performance of the proposed model is better than that of several state-of-the-art baselines.

IJCAI Conference 2017 Conference Paper

High Dimensional Bayesian Optimization using Dropout

  • Cheng Li
  • Sunil Gupta
  • Santu Rana
  • Vu Nguyen
  • Svetha Venkatesh
  • Alistair Shilton

Scaling Bayesian optimization to high dimensions is a challenging task, as the global optimization of a high-dimensional acquisition function can be expensive and often infeasible. Existing methods depend either on a limited set of "active" variables or on an additive form of the objective function. We propose a new method for high-dimensional Bayesian optimization that uses a dropout strategy to optimize only a subset of variables at each iteration. We derive theoretical bounds for the regret and show how they inform the design of our algorithm. We demonstrate the efficacy of our algorithm on two benchmark functions and two real-world applications: training cascade classifiers and optimizing alloy composition.
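
The dropout idea can be sketched as follows. This is an illustration under assumed names, with random search standing in for the GP acquisition step: each round a random subset of dimensions is optimised, and the dropped dimensions are filled in from the incumbent (one of the fill-in strategies discussed in the dropout-BO literature).

```python
import numpy as np

rng = np.random.default_rng(2)
D, d = 20, 3  # full dimension, number of variables optimised per round

def objective(x):
    # toy objective with maximiser at x = 0.5 in every coordinate
    return -np.sum((x - 0.5) ** 2)

x_best = rng.uniform(0.0, 1.0, D)
f_best = objective(x_best)
for _ in range(50):
    dims = rng.choice(D, size=d, replace=False)  # "drop out" the rest
    # vary only the chosen coordinates; fixed dims copy the incumbent
    cand = np.tile(x_best, (128, 1))
    cand[:, dims] = rng.uniform(0.0, 1.0, (128, d))
    vals = np.array([objective(c) for c in cand])
    if vals.max() > f_best:
        f_best = vals.max()
        x_best = cand[np.argmax(vals)]
```

Each round the inner search is d-dimensional rather than D-dimensional, which is what makes the acquisition maximisation tractable at scale.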

NeurIPS Conference 2017 Conference Paper

Process-constrained batch Bayesian optimisation

  • Pratibha Vellanki
  • Santu Rana
  • Sunil Gupta
  • David Rubin
  • Alessandra Sutti
  • Thomas Dorin
  • Murray Height
  • Paul Sanders

Prevailing batch Bayesian optimisation methods allow all control variables to be freely altered at each iteration. Real-world experiments, however, often have physical limitations that make it time-consuming to alter all settings for each recommendation in a batch. This gives rise to a unique problem in BO: in a recommended batch, a set of variables that are expensive to change experimentally must be held fixed, while the remaining control variables can be varied. We formulate this as a process-constrained batch Bayesian optimisation problem. We propose two algorithms, pc-BO(basic) and pc-BO(nested). pc-BO(basic) is simpler but lacks a convergence guarantee. In contrast, pc-BO(nested) is slightly more complex but admits a convergence analysis. We show that the regret of pc-BO(nested) is sublinear. We demonstrate the performance of both pc-BO(basic) and pc-BO(nested) by optimising benchmark test functions, tuning the hyper-parameters of an SVM classifier, optimising the heat-treatment process for an Al-Sc alloy to achieve a target hardness, and optimising the short polymer fibre production process.
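
The constraint structure in this abstract can be shown in a tiny sketch, not either pc-BO algorithm: one shared setting is chosen for the expensive-to-change variable, and only the remaining variables differ across the recommended batch (all names and the random choices below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(3)
n_batch = 4
constrained = [0]  # index of the variable that is expensive to change
free = [1]         # variables allowed to differ within a batch

def objective(x):
    # toy experiment model
    return -np.sum((x - np.array([0.2, 0.7])) ** 2)

# one setting for the constrained variable, shared by the whole batch,
# while the free variables vary across the batch recommendations
x_c = rng.uniform(0.0, 1.0)
batch = np.empty((n_batch, 2))
batch[:, constrained] = x_c
batch[:, free] = rng.uniform(0.0, 1.0, (n_batch, 1))
vals = np.array([objective(b) for b in batch])
```

In pc-BO proper, both the shared setting and the free-variable recommendations come from acquisition maximisation rather than the random draws used here.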

ICRA Conference 1991 Conference Paper

Closed-loop control of manipulators with redundant joints using the Hamilton-Jacobi-Bellman equation

  • Sunil Gupta
  • J. Y. S. Luh

A closed-loop state-feedback motion-control scheme has been developed for manipulators with redundant joints. The scheme is based on the well-known Hamilton-Jacobi-Bellman equation of optimal control. Self-motion of the linkage is utilized to optimize a quadratic performance criterion over the entire trajectory of the end-effector. A linearized model of the manipulator dynamics is used in the control algorithm. The algorithm requires solving the algebraic matrix Riccati equation. Simulation results are presented.