Author name cluster

Susobhan Ghosh

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

1 author row

IJCAI Conference 2024 Conference Paper

ReBandit: Random Effects Based Online RL Algorithm for Reducing Cannabis Use

Susobhan Ghosh
Yongyi Guo
Pei-Yao Hung
Lara Coughlin
Erin Bonar
Inbal Nahum-Shani
Maureen Walton
Susan Murphy

The escalating prevalence of cannabis use, and associated cannabis-use disorder (CUD), poses a significant public health challenge globally. With a notably wide treatment gap, especially among emerging adults (EAs; ages 18-25), addressing cannabis use and CUD remains a pivotal objective within the 2030 United Nations Agenda for Sustainable Development Goals (SDG). In this work, we develop an online reinforcement learning (RL) algorithm called reBandit which will be utilized in a mobile health study to deliver personalized mobile health interventions aimed at reducing cannabis use among EAs. reBandit utilizes random effects and informative Bayesian priors to learn quickly and efficiently in noisy mobile health environments. Moreover, reBandit employs Empirical Bayes and optimization techniques to autonomously update its hyper-parameters online. To evaluate the performance of our algorithm, we construct a simulation testbed using data from a prior study, and compare against commonly used algorithms in mobile health studies. We show that reBandit performs as well as or better than all the baseline algorithms, and the performance gap widens as population heterogeneity increases in the simulation environment, demonstrating its ability to adapt to a diverse population of study participants.


AAMAS Conference 2023 Conference Paper

Fairness for Workers Who Pull the Arms: An Index Based Policy for Allocation of Restless Bandit Tasks

Arpita Biswas
Jackson A. Killian
Paula Rodriguez Diaz
Susobhan Ghosh
Milind Tambe

Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning of stochastic processes under resource constraints. This planning problem has previously been modeled as restless multi-armed bandits (RMAB), where each arm is an intervention-dependent Markov Decision Process. However, the existing literature assumes all intervention resources belong to a single uniform pool, limiting its applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting, called multi-worker restless bandits (MWRMAB) with heterogeneous workers. The goal is to plan an intervention schedule that maximizes the expected reward while satisfying budget constraints on each worker as well as fairness in terms of the load assigned to each worker. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to tackle heterogeneous costs and per-worker budget and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that our method significantly outperforms other baselines in terms of fairness without sacrificing much in reward accumulated.


AAMAS Conference 2022 Conference Paper

Efficient Algorithms for Finite Horizon and Streaming Restless Multi-Armed Bandit Problems

Aditya S. Mate
Arpita Biswas
Christoph Siebenbrunner
Susobhan Ghosh
Milind Tambe

We propose Streaming Bandits, a Restless Multi-Armed Bandit (RMAB) framework in which heterogeneous arms may arrive and leave the system after staying on for a finite lifetime. Streaming Bandits naturally capture the health-intervention planning problem, where health workers must manage the health outcomes of a patient cohort while new patients join and existing patients leave the cohort each day. Our contributions are as follows: (1) We derive conditions under which our problem satisfies indexability, a precondition that guarantees the existence and asymptotic optimality of the Whittle Index solution for RMABs. We establish the conditions using a polytime reduction of the Streaming Bandit setup to regular RMABs. (2) We further prove a phenomenon that we call index decay, whereby the Whittle index values are low for short residual lifetimes, driving the intuition underpinning our algorithm. (3) We propose a novel and efficient algorithm to compute the index-based solution for Streaming Bandits. Unlike previous methods, our algorithm does not rely on solving the costly finite horizon problem on each arm of the RMAB, thereby lowering the computational complexity compared to existing methods. (4) Finally, we evaluate our approach via simulations run on real-world data sets from a tuberculosis patient monitoring task and an intervention planning task for improving maternal healthcare, in addition to other synthetic domains. Across the board, our algorithm achieves a 2-orders-of-magnitude speed-up over existing methods while maintaining the same solution quality. The full paper is available at: https://arxiv.org/pdf/2103.04730.pdf


AAAI Conference 2020 Conference Paper

Bidding in Smart Grid PDAs: Theory, Analysis and Strategy

Susobhan Ghosh
Sujit Gujar
Praveen Paruchuri
Easwar Subramanian
Sanjay Bhat

Periodic Double Auctions (PDAs) are commonly used in the real world for trading, e.g., in stock markets to determine stock opening prices, and in energy markets to trade energy in order to balance net demand in smart grids, involving trillions of dollars in the process. A bidder participating in such PDAs has to plan for bids in the current auction as well as for future auctions, which highlights the necessity of good bidding strategies. In this paper, we perform an equilibrium analysis of single-unit, single-shot double auctions with a certain clearing price and payment rule, which we refer to as ACPR, and find it intractable to analyze as the number of participating agents increases. We further derive the best response for a bidder with complete information in a single-shot double auction with ACPR. Leveraging the theory developed for single-shot double auctions and taking the PowerTAC wholesale market PDA as our testbed, we proceed by modeling the PDA of PowerTAC as an MDP. We propose a novel bidding strategy, namely MDPLCPBS. We empirically show that MDPLCPBS follows the equilibrium strategy for double auctions that we previously analyzed. In addition, we benchmark our strategy against the baseline and the state-of-the-art bidding strategies for the PowerTAC wholesale market PDAs, and show that MDPLCPBS outperforms most of them consistently.


AAAI Conference 2019 Conference Paper

VidyutVanika: A Reinforcement Learning Based Broker Agent for a Power Trading Competition

Susobhan Ghosh
Easwar Subramanian
Sanjay P. Bhat
Sujit Gujar
Praveen Paruchuri

A smart grid is an efficient and sustainable energy system that integrates diverse generation entities, distributed storage capacity, and smart appliances and buildings. A smart grid brings new kinds of participants into the energy market it serves, whose effect on the grid can only be determined through high fidelity simulations. Power TAC offers one such simulation platform using real-world weather data and complex state-of-the-art customer models. In Power TAC, autonomous energy brokers compete to make profits across tariff, wholesale and balancing markets while maintaining the stability of the grid. In this paper, we design an autonomous broker VidyutVanika, the runner-up in the 2018 Power TAC competition. VidyutVanika relies on reinforcement learning (RL) in the tariff market and dynamic programming in the wholesale market to solve modified versions of known Markov Decision Process (MDP) formulations in the respective markets. The novelty lies in defining the reward functions for MDPs, solving these MDPs, and the application of these solutions to real actions in the market. Unlike previous participating agents, VidyutVanika uses a neural network to predict the energy consumption of various customers using weather data. We use several heuristic ideas to bridge the gap between the restricted action spaces of the MDPs and the much more extensive action space available to VidyutVanika. These heuristics allow VidyutVanika to convert near-optimal fixed tariffs to time-of-use tariffs aimed at mitigating transmission capacity fees, spread out its orders across several auctions in the wholesale market to procure energy at a lower price, more accurately estimate parameters required for implementing the MDP solution in the wholesale market, and account for wholesale procurement costs while optimizing tariffs. We use Power TAC 2018 tournament data and controlled experiments to analyze the performance of VidyutVanika, and illustrate the efficacy of the above strategies.
