Arrow Research search

Author name cluster

Sumana Basu

Papers in Arrow associated with this exact author name. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

3 papers
1 author row

Possible papers (3)

AAAI Conference 2026 · Conference Paper

Bootstrapping Personalized Insulin Therapy via Model-Based Reinforcement Learning: An In Silico Study

  • Sumana Basu
  • Flemming Kondrup
  • Adriana Romero-Soriano
  • Doina Precup

Personalized insulin therapy for individuals with Type 1 Diabetes via closed‑loop artificial pancreas systems requires rapid adaptation of dosing strategies to each patient's unique insulin response. However, learning patient‑specific policies from scratch demands extensive exploration, which is often impractical. In this work, we study a framework that integrates insulin-response-informed transfer learning with model-based reinforcement learning for insulin dosing. We first train an LSTM‑based insulin responsiveness predictor on virtual patients, using their glucose, insulin, and meal history to forecast future glucose levels. Analysis of the insulin responsiveness of in silico patients uncovers natural insulin‑response groups characterized by similar sensitivity and dynamics profiles. For a new patient, we identify a representative model from their response group and use it to generate synthetic trajectories. These trajectories are integrated into an enhanced H-step Deep Dyna-Q algorithm, enabling accelerated policy optimization through model-based planning. The dynamics model trained entirely in simulation achieves 91.31% accuracy in predicting blood glucose ranges on the Ohio Type 1 Diabetes dataset, indicating strong zero-shot generalization. Additionally, we find that bootstrapping a new patient with a physiologically-matched reference model accelerates convergence to effective dosing policies across in silico cohorts of children, adolescents, and adults. These findings suggest that leveraging response-group-specific synthetic experience can expedite personalized insulin therapy, offering a promising pathway towards clinical validation.
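The Dyna-style planning loop the abstract describes (a learned dynamics model generating synthetic rollouts that feed value updates) can be sketched in a deliberately simplified tabular form. This is illustrative only: the function name is made up here, and the paper's actual method uses a neural dynamics model inside an enhanced H-step Deep Dyna-Q variant.

```python
import numpy as np

def dyna_q_update(Q, model, s, a, r, s2,
                  alpha=0.1, gamma=0.99, H=3, n_plan=5, rng=None):
    """One Dyna-Q step: a direct update from real experience, then
    n_plan synthetic rollouts of up to H steps through the model.

    Q maps (state, action) to a value; model maps (state, action) to the
    latest observed (reward, next_state), standing in for a learned
    dynamics model. Tabular and simplified for illustration.
    """
    rng = rng or np.random.default_rng(0)
    actions = sorted({a_ for (_, a_) in Q})
    best = lambda s_: max(Q.get((s_, a_), 0.0) for a_ in actions)

    # Direct RL update from the real transition.
    Q[(s, a)] += alpha * (r + gamma * best(s2) - Q[(s, a)])
    model[(s, a)] = (r, s2)

    # Planning: replay H-step rollouts sampled from the model.
    for _ in range(n_plan):
        sp, ap = list(model)[rng.integers(len(model))]
        for _ in range(H):
            rp, sp2 = model[(sp, ap)]
            Q[(sp, ap)] += alpha * (rp + gamma * best(sp2) - Q[(sp, ap)])
            sp = sp2
            known = [a_ for a_ in actions if (sp, a_) in model]
            if not known:
                break  # no model experience from this state yet
            ap = max(known, key=lambda a_: Q[(sp, a_)])
    return Q
```

Each real transition is thus reused several times in simulation, which is the mechanism by which response-group-matched synthetic trajectories can accelerate convergence for a new patient.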

AAAI Conference 2023 · Conference Paper

On the Challenges of Using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects

  • Sumana Basu
  • Marc-André Legault
  • Adriana Romero-Soriano
  • Doina Precup

Drug dosing is an important application of AI, which can be formulated as a Reinforcement Learning (RL) problem. In this paper, we identify two major challenges of using RL for drug dosing: delayed and prolonged effects of administering medications, which break the Markov assumption of the RL framework. We focus on prolongedness and define PAE-POMDP (Prolonged Action Effect-Partially Observable Markov Decision Process), a subclass of POMDPs in which the Markov assumption does not hold specifically due to prolonged effects of actions. Motivated by the pharmacology literature, we propose a simple and effective approach to converting drug dosing PAE-POMDPs into MDPs, enabling the use of existing RL algorithms to solve such problems. We validate the proposed approach on a toy task and a challenging glucose control task, for which we devise a clinically-inspired reward function. Our results demonstrate that: (1) the proposed method to restore the Markov assumption leads to significant improvements over a vanilla baseline; (2) the approach is competitive with recurrent policies which may inherently capture the prolonged effect of actions; (3) it is remarkably more time and memory efficient than the recurrent baseline and hence more suitable for real-time dosing control systems; and (4) it exhibits favourable qualitative behavior in our policy analysis.
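The abstract does not spell out the conversion, but a minimal sketch of the pharmacology-motivated idea is to collapse the dose history into a single exponentially decaying "effective action" that is appended to the observation, restoring the Markov property. The function name and the decay constant below are illustrative assumptions, not the paper's exact formulation.

```python
def effective_action(doses, decay=0.9):
    """Summarize a history of doses as one decayed running sum.

    A dose administered k steps ago contributes dose * decay**k,
    mimicking pharmacological wash-out. Augmenting the state with
    this scalar lets standard (memoryless) RL algorithms act on a
    prolonged-effect dosing task as if it were an MDP.
    """
    eff = 0.0
    for dose in doses:          # oldest dose first
        eff = decay * eff + dose
    return eff
```

For example, a unit dose followed by two dose-free steps with `decay=0.5` leaves an effective action of 0.25, so the agent still "sees" the lingering drug effect without carrying the full history.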

RLDM Conference 2019 · Conference Abstract

Temporal Abstraction in Cooperative Multi-Agent Systems

  • Jhelum Chakravorty
  • Sumana Basu
  • Doina Precup

In this work we introduce temporal abstraction in cooperative multi-agent systems (or teams), which are essentially decentralized Markov decision processes (Dec-MDPs) or decentralized partially observable MDPs (Dec-POMDPs). We believe that, as in the case of single-agent systems, the options framework gives rise to faster convergence to the optimal value, thus facilitating transfer learning. The decentralized nature of dynamic teams leads to the curse of dimensionality, which impedes scalability. The partial observability requires careful analysis of the information structure involving private and public (common) knowledge. The POMDP structure entails a growing history of agents' observations and actions, which leads to intractability. This calls for a proper design of the belief to circumvent such a growing history by leveraging Bayesian updates, and consequently requires a judicious choice of Bayesian inference to approximate the posterior. Moreover, under temporal abstraction, the option-policies of the agents have stochastic termination, which adds to the intricacies of the hierarchical reinforcement learning problem. We study both planning and learning in the team option-critic framework. We propose the Distributed Option Critic (DOC) algorithm, which leverages the common information approach and distributed policy gradients. We employ the former to formulate a centralized (coordinated) system equivalent to the original decentralized system and to define the belief for the coordinated system. The latter is exploited in DOC for policy improvement of the independent agents. We assume that there is a fictitious coordinator who observes the information shared by all agents, updates a belief on the joint states in a Bayesian manner, chooses options, and whispers them to the agents. The agents in turn use their private information to choose actions pertaining to the options assigned to them. Finally, the option-value of the cooperative game is learnt using the distributed option-critic architecture.
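The coordinator's Bayesian belief update over joint states, driven only by the agents' shared (common) information, can be sketched as a standard predict-correct filter. The matrix representation and the function name are assumptions for illustration; the common-information construction in DOC is richer than this.

```python
import numpy as np

def coordinator_belief_update(belief, obs_likelihood, transition):
    """One Bayesian filtering step for the fictitious coordinator.

    belief:          (S,)   prior over joint states
    transition:      (S, S) row-stochastic P(s' | s) under current options
    obs_likelihood:  (S,)   P(shared observation | s')
    Returns the normalized posterior over joint states, from which the
    coordinator can choose options to whisper to the agents.
    """
    predicted = transition.T @ belief       # predict: push belief forward
    posterior = obs_likelihood * predicted  # correct: weight by shared obs
    return posterior / posterior.sum()      # normalize
```

Because the update depends only on commonly observed quantities, every agent can replicate the coordinator's belief locally, which is what makes the centralized reformulation equivalent to the original decentralized system.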