Arrow Research

Author name cluster

Jayakumar Subramanian

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

ICLR 2025 · Conference Paper

Measuring and Improving Engagement of Text-to-Image Generation Models

  • Varun Khurana
  • Yaman Kumar Singla
  • Jayakumar Subramanian
  • Changyou Chen
  • Rajiv Ratn Shah
  • Zhiqiang Xu
  • Balaji Krishnamurthy

Recent advances in text-to-image generation have achieved impressive aesthetic quality, making these models usable for both personal and commercial purposes. However, in the fields of marketing and advertising, images are often created to be more engaging, as reflected in user behaviors such as increased clicks, likes, and purchases, in addition to being aesthetically pleasing. To this end, we introduce the challenge of optimizing the image generation process for improved viewer engagement. In order to study image engagement and utility in real-world marketing scenarios, we collect *EngagingImageNet*, the first large-scale dataset of images, along with associated user engagement metrics. Further, we find that existing image evaluation metrics like aesthetics, CLIPScore, PickScore, ImageReward, *etc.* are unable to capture viewer engagement. To address the lack of reliable metrics for assessing image utility, we use the *EngagingImageNet* dataset to train *EngageNet*, an engagement-aware Vision Language Model (VLM) that predicts viewer engagement of images by leveraging contextual information about the tweet content, enterprise details, and posting time. We then explore methods to enhance the engagement of text-to-image models, making initial strides in this direction. These include conditioning image generation on improved prompts, supervised fine-tuning of stable diffusion on high-performing images, and reinforcement learning to align stable diffusion with *EngageNet*-based reward signals, all of which lead to the generation of images with higher viewer engagement. Finally, we propose the *Engagement Arena* to benchmark text-to-image models based on their ability to generate engaging images, using *EngageNet* as the evaluator, thereby encouraging the research community to measure further advances in the engagement of text-to-image modeling. These contributions provide a new pathway for advancing utility-driven image generation, with significant implications for the commercial application of image generation. We have released our code and dataset on [behavior-in-the-wild.github.io/image-engagement](https://behavior-in-the-wild.github.io/image-engagement).
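
The reward-model-in-the-loop idea from this abstract can be illustrated with a minimal best-of-n sketch: generate several candidates and keep the one an engagement scorer ranks highest. Everything below is a hypothetical stand-in; `score_engagement` and `generate_image` are stubs, not EngageNet or the paper's pipeline, and the paper's methods (prompt improvement, SFT, RL alignment) go well beyond this.

```python
import random

def score_engagement(image, context):
    """Hypothetical engagement scorer: a stand-in for an EngageNet-style VLM
    that would condition on tweet content, enterprise, and posting time."""
    return random.random()

def generate_image(prompt, seed):
    """Stand-in for a text-to-image call (e.g. one diffusion-model sample)."""
    return f"image(prompt={prompt!r}, seed={seed})"

def best_of_n(prompt, context, n=8):
    # Generate n candidates and keep the one the reward model scores highest.
    candidates = [generate_image(prompt, seed=s) for s in range(n)]
    return max(candidates, key=lambda img: score_engagement(img, context))

context = {"tweet": "Launch day!", "brand": "ExampleCo", "time": "09:00"}
print(best_of_n("product hero shot, studio lighting", context))
```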

NeurIPS 2025 · Conference Paper

Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization

  • Subhojyoti Mukherjee
  • Viet Lai
  • Raghavendra Addanki
  • Ryan Rossi
  • Seunghyun Yoon
  • Trung Bui
  • Anup B. Rao
  • Jayakumar Subramanian

Offline reinforcement learning (RL) is a variant of RL where the policy is learned from a previously collected dataset of trajectories and rewards. In our work, we propose a practical approach to offline RL with large language models (LLMs). We recast the problem as reward-weighted fine-tuning, which can be solved using techniques similar to supervised fine-tuning (SFT). To showcase the value of our approach, we apply it to learning short-horizon question-answering policies of a fixed length, where the agent reasons about potential answers or asks clarifying questions. Our work stands in stark contrast to state-of-the-art methods in this domain, based on SFT and direct preference optimization, which have additional hyper-parameters and do not directly optimize for rewards. We compare to them empirically, and report major gains in both optimized rewards and language quality.
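
A minimal sketch of the reward-weighted fine-tuning idea, assuming the simplest reading: scale each example's SFT negative log-likelihood by its reward, so high-reward responses are imitated more. The linear "model" and random data are stand-ins so the snippet runs; the paper applies this to LLM policies.

```python
import torch
import torch.nn.functional as F

vocab, hidden = 100, 32
model = torch.nn.Linear(hidden, vocab)      # stand-in for an LLM's output head
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

states = torch.randn(16, hidden)            # toy "contexts"
targets = torch.randint(0, vocab, (16,))    # toy "response tokens"
rewards = torch.rand(16)                    # scalar reward per trajectory

logits = model(states)
nll = F.cross_entropy(logits, targets, reduction="none")  # per-example SFT loss
loss = (rewards * nll).mean()               # weight imitation by reward
loss.backward()
opt.step()
print(f"reward-weighted fine-tuning loss: {loss.item():.3f}")
```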

AAMAS 2024 · Conference Paper

flame: A Framework for Learning in Agent-based ModEls

  • Ayush Chopra
  • Jayakumar Subramanian
  • Balaji Krishnamurthy
  • Ramesh Raskar

Agent-based models (ABMs) are discrete simulators comprising agents that act and interact in a computational world. Despite wide applicability, infrastructure for ABMs has been fragmented and lacks a standard framework to integrate the benefits of recent computing advances, especially in machine learning and automatic differentiation (autograd). To alleviate this gap, we introduce flame: a framework to define, simulate and optimize differentiable agent-based models. First, flame introduces a domain-specific language that describes ABMs with stochastic dynamics across several domains and can be implemented using abstractions of autograd. Second, flame models can execute simulations on GPU, process millions of interactions per second and seamlessly scale from a few hundred agents to million-size populations. Third, flame provides custom utilities to implement fully differentiable ABMs which can benefit from gradient-based learning and integrate with deep neural networks (DNNs) in several ways. Specifically, ABMs can now use supervised and reinforcement learning to calibrate simulation parameters, optimize agent actions and learn expressive interaction rules. Finally, flame is easily accessible with a simple Python API. We validate flame through multiple case studies covering tissue morphogenesis over bio-electric networks, infectious disease epidemiology over physical networks and opinion dynamics over social networks. We hope flame can ignite further innovation at the intersection of AI and ABMs. Our code is here.
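
The "fully differentiable ABM" idea can be sketched without flame's actual API (which is not shown here): write the agent update as a smooth tensor operation so autograd can differentiate an aggregate outcome with respect to a simulation parameter. The opinion-dynamics rule below is a hypothetical toy, not a flame model.

```python
import torch

n_agents = 1000
opinions = torch.randn(n_agents)                        # one scalar state per agent
susceptibility = torch.tensor(0.1, requires_grad=True)  # learnable simulation parameter

for _ in range(50):
    partners = torch.randint(0, n_agents, (n_agents,))  # random pairwise contacts
    # Soft, differentiable interaction rule: drift toward the partner's opinion.
    opinions = opinions + susceptibility * (opinions[partners] - opinions)

polarization = opinions.var()   # aggregate simulation outcome
polarization.backward()         # autograd flows through all 50 steps
print(susceptibility.grad)      # gradient of the outcome w.r.t. the parameter
```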

AAMAS 2023 · Conference Paper

Differentiable Agent-based Epidemiology

  • Ayush Chopra
  • Alexander Rodríguez
  • Jayakumar Subramanian
  • Arnau Quera-Bofarull
  • Balaji Krishnamurthy
  • B. Aditya Prakash
  • Ramesh Raskar

Mechanistic simulators are an indispensable tool for epidemiology to explore the behavior of complex, dynamic infections under varying conditions and navigate uncertain environments. Agent-based models (ABMs) are an increasingly popular simulation paradigm that can represent the heterogeneity of contact interactions with granular detail and agency of individual behavior. However, conventional ABM frameworks are not differentiable and present challenges in scalability, which makes it non-trivial to connect them to auxiliary data sources. In this paper, we introduce GradABM: a scalable, differentiable design for agent-based modeling that is amenable to gradient-based learning with automatic differentiation. GradABM can quickly simulate million-size populations in a few seconds on commodity hardware, integrate with deep neural networks and ingest heterogeneous data sources. This provides an array of practical benefits for calibration, forecasting, and evaluating policy interventions. We demonstrate the efficacy of GradABM via extensive experiments with real COVID-19 and influenza datasets.
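
A much-simplified sketch of gradient-based calibration, assuming a smooth compartmental stand-in rather than GradABM's agent-level simulator: because each simulation step is differentiable, a transmission-rate parameter can be fit to observed case counts by plain gradient descent.

```python
import torch

def simulate(beta, gamma=0.1, days=60, s0=0.99, i0=0.01):
    """Differentiable SIR-style rollout returning daily new infections."""
    s, i, cases = s0, i0, []
    for _ in range(days):
        new_inf = beta * s * i              # smooth infection term
        s, i = s - new_inf, i + new_inf - gamma * i
        cases.append(new_inf)
    return torch.stack(cases)

observed = simulate(torch.tensor(0.3)).detach()   # synthetic "case counts"
beta = torch.tensor(0.1, requires_grad=True)      # parameter to calibrate
opt = torch.optim.Adam([beta], lr=0.01)

for _ in range(300):
    opt.zero_grad()
    loss = ((simulate(beta) - observed) ** 2).mean()
    loss.backward()
    opt.step()
print(f"recovered beta = {beta.item():.3f}  (true value 0.3)")
```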

ICLR 2023 · Conference Paper

Explaining RL Decisions with Trajectories

  • Shripad Vilasrao Deshmukh
  • Arpan Dasgupta
  • Balaji Krishnamurthy
  • Nan Jiang
  • Chirag Agarwal
  • Georgios Theocharous
  • Jayakumar Subramanian

Explanation is a key component for the adoption of reinforcement learning (RL) in many real-world decision-making problems. In the literature, the explanation is often provided by saliency attribution to the features of the RL agent's state. In this work, we propose a complementary approach to these explanations, particularly for offline RL, where we attribute the policy decisions of a trained RL agent to the trajectories encountered by it during training. To do so, we encode trajectories in offline training data individually as well as collectively (encoding a set of trajectories). We then attribute policy decisions to a set of trajectories in this encoded space by estimating the sensitivity of the decision with respect to that set. Further, we demonstrate the effectiveness of the proposed approach in terms of quality of attributions as well as practical scalability in diverse environments that involve both discrete and continuous state and action spaces, such as grid-worlds, video games (Atari) and continuous control (MuJoCo). We also conduct a human study on a simple navigation task to observe how participants' understanding of the task compares with the data attributed for a trained RL policy.
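
A toy illustration of the attribution idea, with the caveat that it replaces the paper's sensitivity estimation with a simple similarity proxy: cluster trajectory encodings and rank clusters by closeness to the encoding of the decision being explained. The random embeddings stand in for a learned trajectory encoder.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
traj_embeddings = rng.normal(size=(200, 16))   # stand-in for learned trajectory encodings
decision_embedding = rng.normal(size=16)       # encoding of the decision's state

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(traj_embeddings)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank trajectory clusters by similarity to the decision being explained.
scores = [cosine(center, decision_embedding) for center in kmeans.cluster_centers_]
print("most relevant trajectory cluster:", int(np.argmax(scores)))
```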

JMLR 2022 · Journal Article

Approximate Information State for Approximate Planning and Reinforcement Learning in Partially Observed Systems

  • Jayakumar Subramanian
  • Amit Sinha
  • Raihan Seraj
  • Aditya Mahajan

We propose a theoretical framework for approximate planning and learning in partially observed systems. Our framework is based on the fundamental notion of information state. We provide two definitions of information state: (i) a function of history which is sufficient to compute the expected reward and predict its next value; (ii) a function of the history which can be recursively updated and is sufficient to compute the expected reward and predict the next observation. An information state always leads to a dynamic programming decomposition. Our key result is to show that if a function of the history (called AIS) approximately satisfies the properties of the information state, then there is a corresponding approximate dynamic program. We show that the policy computed using this program is approximately optimal, with bounded loss of optimality. We show that several approximations in state, observation and action spaces in the literature can be viewed as instances of AIS. In some of these cases, we obtain tighter bounds. A salient feature of AIS is that it can be learnt from data. We present AIS-based multi-time-scale policy gradient algorithms and detailed numerical experiments with low-, moderate- and high-dimensional environments.
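
The two defining properties of an information state translate directly into training losses, which is roughly how an AIS can be learnt from data. The sketch below is an assumed minimal setup (GRU state update, squared-error heads on toy tensors), not the paper's exact architecture or its distributional loss on next observations.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, z_dim, T = 8, 4, 16, 20
update = nn.GRUCell(obs_dim + act_dim, z_dim)   # recursive update of the AIS z_t
reward_head = nn.Linear(z_dim + act_dim, 1)     # property (i): predict expected reward
obs_head = nn.Linear(z_dim + act_dim, obs_dim)  # property (ii): predict next observation

obs = torch.randn(T, obs_dim)
acts = torch.randn(T, act_dim)
rewards = torch.randn(T, 1)

z = torch.zeros(1, z_dim)
loss = torch.tensor(0.0)
for t in range(T - 1):
    za = torch.cat([z, acts[t].unsqueeze(0)], dim=1)
    loss = loss + (reward_head(za) - rewards[t]).pow(2).mean()            # (i)
    loss = loss + (obs_head(za) - obs[t + 1].unsqueeze(0)).pow(2).mean()  # (ii)
    z = update(torch.cat([obs[t + 1], acts[t]]).unsqueeze(0), z)          # z_{t+1}
loss.backward()
print(f"AIS loss on toy data: {loss.item():.3f}")
```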

AAMAS 2022 · Conference Paper

Status-quo Policy Gradient in Multi-Agent Reinforcement Learning

  • Pinkesh Badjatiya
  • Mausoom Sarkar
  • Nikaash Puri
  • Jayakumar Subramanian
  • Abhishek Sinha
  • Siddharth Singh
  • Balaji Krishnamurthy

Individual rationality, which involves maximizing expected individual returns, does not always lead to high-utility individual or group outcomes in multi-agent problems. For instance, in multi-agent social dilemmas, Reinforcement Learning (RL) agents trained to maximize individual rewards converge to a low-utility mutually harmful equilibrium. In contrast, humans evolve useful strategies in such social dilemmas. Inspired by ideas from human psychology that attribute this behavior to the status-quo bias, we present a status-quo loss (SQLoss) and the corresponding policy gradient algorithm that incorporates this bias in an RL agent. We demonstrate that agents trained with SQLoss learn high-utility policies in several social dilemma matrix games (Prisoner’s Dilemma, Matching Pennies, Chicken Game). To apply SQLoss to visual input games where cooperation and defection are determined by a sequence of lower-level actions, we present GameDistill, an algorithm that reduces a visual input game to a matrix game. We empirically show how agents trained with SQLoss on GameDistill-reduced versions of Coin Game and Stag Hunt learn high-utility policies. Finally, we show that SQLoss extends to a 4-agent setting by demonstrating the emergence of cooperative behavior in the popular Braess’ paradox.
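
A schematic of how a status-quo term might sit alongside a policy-gradient loss. This is a loose reading for illustration only: the imagined status-quo return is a dummy scalar here, whereas the paper derives it from imagined rollouts in which agents keep repeating their previous actions.

```python
import torch
import torch.nn.functional as F

n_actions, state_dim, kappa = 4, 8, 5
policy = torch.nn.Linear(state_dim, n_actions)   # toy policy network

state = torch.randn(1, state_dim)
action = torch.tensor([2])            # action taken at the previous step
ret = torch.tensor(1.0)               # observed return for that action
imagined_ret = kappa * ret            # dummy return of an imagined rollout
                                      # in which the action keeps being repeated

log_prob = F.log_softmax(policy(state), dim=-1)[0, action]
pg_loss = -ret * log_prob             # standard REINFORCE term
sq_loss = -imagined_ret * log_prob    # extra pull toward the status quo
(pg_loss + sq_loss).mean().backward()
print("combined loss:", (pg_loss + sq_loss).item())
```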

NeurIPS 2021 · Conference Paper

Medical Dead-ends and Learning to Identify High-Risk States and Treatments

  • Mehdi Fatemi
  • Taylor W. Killian
  • Jayakumar Subramanian
  • Marzyeh Ghassemi

Machine learning has successfully framed many sequential decision making problems as either supervised prediction, or optimal decision-making policy identification via reinforcement learning. In data-constrained offline settings, both approaches may fail as they assume fully optimal behavior or rely on exploring alternatives that may not exist. We introduce an inherently different approach that identifies "dead-ends" of a state space. We focus on patient condition in the intensive care unit, where a "medical dead-end" indicates that a patient will expire, regardless of all potential future treatment sequences. We postulate "treatment security" as avoiding treatments with probability proportional to their chance of leading to dead-ends, present a formal proof, and frame discovery as an RL problem. We then train three independent deep neural models for automated state construction, dead-end discovery and confirmation. Our empirical results show that dead-ends exist in real clinical data among septic patients, and further reveal gaps between secure treatments and those administered.
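
At decision time, the "treatment security" rule reduces to a per-treatment risk check, sketched below with dummy numbers; in the paper these dead-end probabilities come from learned deep networks over ICU patient states, and the treatment names here are purely illustrative.

```python
import numpy as np

treatments = ["fluids", "vasopressor_low", "vasopressor_high"]
p_dead_end = np.array([0.05, 0.20, 0.75])   # estimated P(dead-end | state, treatment)
threshold = 0.5                              # maximum tolerated risk

for name, p in zip(treatments, p_dead_end):
    status = "AVOID" if p > threshold else "secure"
    print(f"{name:16s} risk={p:.2f} -> {status}")
```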

RLDM 2019 · Conference Abstract

Approximate information state for partially observed systems

  • Jayakumar Subramanian
  • Aditya Mahajan

The standard approach for modeling partially observed systems is to model them as partially observable Markov decision processes (POMDPs) and obtain a dynamic program in terms of a belief state. The belief state formulation works well for planning but is not ideal for learning because the belief state depends on the model and, as such, is not observable when the model is unknown. In this paper, we present an alternative notion of an information state for obtaining a dynamic program in partially observed models. In particular, an information state is a sufficient statistic for the current reward which evolves in a controlled Markov manner. We show that such an information state leads to a dynamic programming decomposition. Then we present a notion of an approximate information state and present an approximate dynamic program based on the approximate information state. The approximate information state is defined in terms of properties that can be estimated using sampled trajectories; therefore, it provides a constructive method for reinforcement learning in partially observed systems. We present one such construction and show that it performs better than the state of the art for three benchmark models.

RLDM 2019 · Conference Abstract

Reinforcement learning for mean-field teams

  • Jayakumar Subramanian
  • Raihan Seraj
  • Aditya Mahajan

We develop reinforcement learning (RL) algorithms for a class of multi-agent systems called mean-field teams (MFT). Teams are multi-agent systems where agents have a common goal and receive a common reward at each time step. The team objective is to maximize the expected cumulative discounted reward over an infinite horizon. MFTs are teams with homogeneous, anonymous agents such that the agents are coupled in their dynamics and rewards through the mean-field (i.e., the empirical distribution of the agents’ states). In our work, we consider MFTs with a mean-field sharing information structure, i.e., each agent knows its local state and the empirical mean-field at each time step. We obtain a dynamic programming (DP) decomposition for MFTs using a decomposition approach from the literature called the common information approach, which splits the decision-making process into two parts. The first part is a centralized coordination rule that yields the second part: prescriptions to be followed by each agent based on its local information. We develop an RL approach for MFTs under the assumption of parametrized prescriptions. We consider the parameters as actions and use conventional RL algorithms to solve the DP. We illustrate the use of these algorithms through two examples based on stylized models of the demand response problem in smart grids and malware spread in networks.
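
A minimal sketch of the common-information structure described above: the coordinator sees only the mean-field and emits prescription parameters, and each agent then acts through the same prescription applied to its local state. The binary-state population and the fixed theta are toy assumptions; in the paper, theta is chosen by a conventional RL algorithm treating prescription parameters as actions.

```python
import numpy as np

rng = np.random.default_rng(0)
states = rng.integers(0, 2, size=1000)   # local binary states of 1000 agents

def mean_field(states):
    """Empirical distribution of agent states (the common information)."""
    return np.bincount(states, minlength=2) / len(states)

def prescription(local_states, theta):
    """Maps each local state to P(action = 1); theta parametrizes the rule."""
    return theta[local_states]

mf = mean_field(states)                  # what the coordinator observes
theta = np.array([0.2, 0.9])             # the coordinator's "action" (fixed here)
actions = rng.random(len(states)) < prescription(states, theta)
print("mean-field:", mf, "| fraction acting:", actions.mean())
```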

AAMAS 2019 · Conference Paper

Reinforcement Learning in Stationary Mean-field Games

  • Jayakumar Subramanian
  • Aditya Mahajan

Multi-agent reinforcement learning has made significant progress in recent years, but it remains a hard problem. Hence, one often resorts to developing learning algorithms for specific classes of multi-agent systems. In this paper, we study reinforcement learning in a specific class of multi-agent systems called mean-field games. In particular, we consider learning in stationary mean-field games. We identify two different solution concepts for such games, stationary mean-field equilibrium and stationary mean-field social-welfare optimal policy, based on whether the agents are non-cooperative or cooperative, respectively. We then generalize these solution concepts to their local variants using bounded rationality based arguments. For these two local solution concepts, we present two reinforcement learning algorithms. We show that the algorithms converge to the right solution under mild technical conditions and demonstrate this using two numerical examples.