Arrow Research search

Author name cluster

Balaraman Ravindran

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

48 papers
2 author rows

Possible papers

48

AAAI Conference 2026 Conference Paper

SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories

  • Returaj Burnwal
  • Nirav Pravinbhai Bhatt
  • Balaraman Ravindran

In this work, we study the problem of offline safe imitation learning (IL). In many real-world settings, online interactions can be risky, and accurately specifying the reward and the safety cost information at each timestep can be difficult. However, it is often feasible to collect trajectories reflecting undesirable or risky behavior, implicitly conveying the behavior the agent should avoid. We refer to these trajectories as non-preferred trajectories. Unlike standard IL, which aims to mimic demonstrations, our agent must also learn to avoid risky behavior using non-preferred trajectories. In this paper, we propose a novel approach, SafeMIL, to learn a parameterized cost that predicts if the state-action pair is risky via Multiple Instance Learning. The learned cost is then used to avoid non-preferred behaviors, resulting in a policy that prioritizes safety. We empirically demonstrate that our approach can learn a safer policy that satisfies cost constraints without degrading the reward performance, thereby outperforming several baselines.
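The multiple-instance step at the heart of the approach above can be pictured as max-pooling a per-pair risk score over a trajectory: a non-preferred trajectory is a risky bag even if only some of its state-action pairs are actually risky. A minimal sketch of that pooling idea, with the scorer and all names purely illustrative (not from the paper):

```python
def bag_risk(score_fn, trajectory):
    """Multiple-instance pooling: a bag (trajectory) is as risky as its
    riskiest instance, so one bad state-action pair flags the whole bag."""
    return max(score_fn(sa) for sa in trajectory)

# Toy per-instance scorer: risk grows with the action magnitude
# (a hypothetical stand-in for the learned parameterized cost).
score = lambda sa: min(1.0, abs(sa[1]) / 10.0)

safe_traj = [(0, 1), (1, 2), (2, 1)]    # all low-risk (state, action) pairs
risky_traj = [(0, 1), (1, 9), (2, 1)]   # one high-risk pair taints the bag
```

Training the scorer so that bags from non-preferred trajectories score high (and others low) is what lets a single trajectory-level label supervise per-pair costs.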

AAMAS Conference 2024 Conference Paper

MABL: Bi-Level Latent-Variable World Model for Sample-Efficient Multi-Agent Reinforcement Learning

  • Aravind Venugopal
  • Stephanie Milani
  • Fei Fang
  • Balaraman Ravindran

Multi-agent reinforcement learning (MARL) methods often suffer from high sample complexity, limiting their use in real-world problems where data is sparse or expensive to collect. Although latent-variable world models have been employed to address this issue by generating abundant synthetic data for MARL training, most of these models cannot encode vital global information available during training into their latent states, which hampers learning efficiency. The few exceptions that incorporate global information assume centralized execution of their learned policies, which is impractical in many applications with partial observability. We propose a novel model-based MARL algorithm, MABL (Multi-Agent Bi-Level world model), that learns a bi-level latent-variable world model from high-dimensional inputs. Unlike existing models, MABL is capable of encoding essential global information into the latent states during training while guaranteeing the decentralized execution of learned policies. For each agent, MABL learns a global latent state at the upper level, which is used to inform the learning of an agent latent state at the lower level. During execution, agents exclusively use lower-level latent states and act independently. Crucially, MABL can be combined with any model-free MARL algorithm for policy learning. In our empirical evaluation with complex discrete and continuous multi-agent tasks including SMAC, Flatland, and MAMuJoCo, MABL surpasses SOTA multi-agent latent-variable world models in both sample efficiency and overall performance.

AAMAS Conference 2023 Conference Paper

Matching Options to Tasks using Option-Indexed Hierarchical Reinforcement Learning

  • Kushal Chauhan
  • Soumya Chatterjee
  • Akash Reddy
  • Aniruddha S
  • Balaraman Ravindran
  • Pradeep Shenoy

The options framework in Hierarchical Reinforcement Learning breaks down overall goals into a combination of simpler tasks (options) and their policies, allowing for abstraction in the action space. Ideally, options can be reused across different goals; indeed, this is necessary to build a continual learning agent that can effectively leverage its prior experience. Previous approaches allow limited transfer of pre-learned options to new task settings. We propose a novel option indexing approach to hierarchical learning (OI-HRL), where we learn an affinity function between options and items present in the environment. With OI-HRL, we effectively reuse a large library of pre-trained options in zero-shot generalization at test time by restricting goal-directed learning to relevant options alone. We develop a meta-training loop that learns the representations of options and environments over a series of HRL problems by incorporating feedback about the relevance of retrieved options to the higher-level goal. Our model is competitive with oracular baselines and substantially better than a baseline with the entire option pool available for learning the hierarchical policy.

TMLR Journal 2022 Journal Article

Benchmarking and Analyzing Unsupervised Network Representation Learning and the Illusion of Progress

  • Saket Gurukar
  • Priyesh Vijayan
  • Srinivasan Parthasarathy
  • Balaraman Ravindran
  • Aakash Srinivasan
  • Goonmeet Bajaj
  • Chen Cai
  • Moniba Keymanesh

A number of methods have been developed for unsupervised network representation learning -- ranging from classical methods based on the graph spectra to recent random walk based methods and from deep learning based methods to matrix factorization based methods. Each new study inevitably seeks to establish the relative superiority of the proposed method over others. The lack of a standard assessment protocol and benchmark suite often leaves practitioners wondering if a new idea represents a significant scientific advance. In this work, we articulate a clear and pressing need to systematically and rigorously benchmark such methods. Our overall assessment -- a result of a careful benchmarking of 15 methods for unsupervised network representation learning on 16 non-attributed graphs (several with different characteristics) -- is that many recently proposed improvements are somewhat of an illusion when assessed through the lens of downstream tasks such as link prediction and node classification. Specifically, we find that several proposed improvements are marginal at best and that aspects of many of these datasets often render such small differences insignificant, especially when viewed from a rigorous statistical lens. A more detailed analysis of our results identifies several new insights: first, we find that classical methods, often dismissed or not considered by recent efforts, can compete on certain types of datasets if they are tuned appropriately; second, we find that from a qualitative standpoint, a couple of methods based on matrix factorization offer a small but not always consistent advantage over alternative methods; third, no single method completely outperforms other embedding methods on both node classification and link prediction tasks. Finally, we also present several analyses that reveal settings under which certain algorithms perform well (e.g., the role of neighborhood context and dataset properties that impact performance). An important outcome of this study is the benchmark and evaluation protocol, which practitioners may find useful for future research in this area.

ICLR Conference 2022 Conference Paper

Causal Contextual Bandits with Targeted Interventions

  • Chandrasekar Subramanian
  • Balaraman Ravindran

We study a contextual bandit setting where the learning agent has the ability to perform interventions on targeted subsets of the population, apart from possessing qualitative causal side-information. This novel formalism captures intricacies in real-world scenarios such as software product experimentation where targeted experiments can be conducted. However, this fundamentally changes the set of options that the agent has, compared to standard contextual bandit settings, necessitating new techniques. This is also the first work that integrates causal side-information in a contextual bandit setting, where the agent aims to learn a policy that maps contexts to arms (as opposed to just identifying one best arm). We propose a new algorithm, which we show empirically performs better than baselines on experiments that use purely synthetic data and on real-world-inspired experiments. We also prove a regret bound that theoretically guarantees performance.

IJCAI Conference 2022 Conference Paper

Evolutionary Approach to Security Games with Signaling

  • Adam Żychowski
  • Jacek Mańdziuk
  • Elizabeth Bondi
  • Aravind Venugopal
  • Milind Tambe
  • Balaraman Ravindran

Green Security Games have become a popular way to model scenarios involving the protection of natural resources, such as wildlife. Sensors (e.g., drones equipped with cameras) have also begun to play a role in these scenarios by providing real-time information. Incorporating both human and sensor defender resources strategically is the subject of recent work on Security Games with Signaling (SGS). However, current methods to solve SGS do not scale well in terms of time or memory. We therefore propose a novel approach to SGS, which, for the first time in this domain, employs an Evolutionary Computation paradigm: EASGS. EASGS effectively searches the huge SGS solution space via suitable solution encoding in a chromosome and a specially-designed set of operators. The operators include three types of mutations, each focusing on a particular aspect of the SGS solution, optimized crossover and a local coverage improvement scheme (a memetic aspect of EASGS). We also introduce a new set of benchmark games, based on dense or locally-dense graphs that reflect real-world SGS settings. In the majority of 342 test game instances, EASGS outperforms state-of-the-art methods, including a reinforcement learning method, in terms of time scalability, nearly constant memory utilization, and quality of the returned defender's strategies (expected payoffs).

AAAI Conference 2021 Conference Paper

An Enhanced Advising Model in Teacher-Student Framework using State Categorization

  • Daksh Anand
  • Vaibhav Gupta
  • Praveen Paruchuri
  • Balaraman Ravindran

The teacher-student framework aims to improve the sample efficiency of RL algorithms by deploying an advising mechanism in which a teacher helps a student by guiding its exploration. Prior work in this field has considered an advising mechanism where the teacher advises the student about the optimal action to take in a given state. However, real-world teachers can leverage domain expertise to provide more informative signals. Using this insight, we propose to extend the current advising framework wherein the teacher would provide not only the optimal action but also a qualitative assessment of the state. We introduce a novel architecture, namely Advice Replay Memory (ARM), to effectively reuse the advice provided by the teacher. We demonstrate the robustness of our approach by showcasing our experiments on multiple Atari 2600 games using a fixed set of hyper-parameters. Additionally, we show that a student taking help even from a suboptimal teacher can achieve significant performance boosts and eventually outperform the teacher. Our approach outperforms the baselines even when provided with comparatively suboptimal teachers and an advising budget that is smaller by orders of magnitude. The contributions of our paper are four-fold: (a) supplementing the student’s knowledge by providing the state category; (b) the introduction of ARM to effectively reuse advice throughout learning; (c) the ability to achieve a significant performance boost even with a coarse state categorization; and (d) enabling the student to outperform the teacher.
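The Advice Replay Memory described above amounts to a buffer of advice tuples that the student re-samples during learning. A minimal sketch, with the interface and field names assumed rather than taken from the paper:

```python
import random
from collections import deque

class AdviceReplayMemory:
    """Buffer of teacher advice: (state, advised_action, state_category)."""
    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, advised_action, state_category):
        self.buffer.append((state, advised_action, state_category))

    def sample(self, batch_size):
        """Re-sample stored advice so it keeps shaping the student's
        updates long after the advising budget is exhausted."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

arm = AdviceReplayMemory(capacity=100)
arm.add(state=(0, 0), advised_action=2, state_category="near-goal")
arm.add(state=(3, 1), advised_action=0, state_category="hazard")
batch = arm.sample(2)
```

The `deque` with `maxlen` gives FIFO eviction for free, so the buffer holds the most recent advice once capacity is reached.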

JAIR Journal 2021 Journal Article

MADRaS: Multi Agent Driving Simulator

  • Anirban Santara
  • Sohan Rudra
  • Sree Aditya Buridi
  • Meha Kaushik
  • Abhishek Naik
  • Bharat Kaul
  • Balaraman Ravindran

Autonomous driving has emerged as one of the most active areas of research as it has the promise of making transportation safer and more efficient than ever before. Most real-world autonomous driving pipelines perform perception, motion planning and action in a loop. In this work we present MADRaS, an open-source multi-agent driving simulator for use in the design and evaluation of motion planning algorithms for autonomous driving. Given a start and a goal state, the task of motion planning is to solve for a sequence of position, orientation and speed values in order to navigate between the states while adhering to safety constraints. These constraints often involve the behaviors of other agents in the environment. MADRaS provides a platform for constructing a wide variety of highway and track driving scenarios where multiple driving agents can be trained for motion planning tasks using reinforcement learning and other machine learning algorithms. MADRaS is built on TORCS, an open-source car-racing simulator. TORCS offers a variety of cars with different dynamic properties and driving tracks with different geometries and surfaces. MADRaS inherits these functionalities from TORCS and introduces support for multi-agent training, inter-vehicular communication, noisy observations, stochastic actions, and custom traffic cars whose behaviors can be programmed to simulate challenging traffic conditions encountered in the real world. MADRaS can be used to create driving tasks whose complexities can be tuned along eight axes in well-defined steps. This makes it particularly suited for curriculum and continual learning. MADRaS is lightweight and it provides a convenient OpenAI Gym interface for independent control of each car. Apart from the primitive steering-acceleration-brake control mode of TORCS, MADRaS offers a hierarchical track-position and speed control mode that can potentially be used to achieve better generalization.
MADRaS uses a UDP-based client-server model where the simulation engine is the server and each client is a driving agent. MADRaS uses multiprocessing to run each agent as a parallel process for efficiency and integrates well with popular reinforcement learning libraries like RLlib. We show experiments on single- and multi-agent reinforcement learning with and without curriculum.
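The "convenient OpenAI Gym interface" mentioned above implies the standard reset/step control loop. The sketch below runs that loop against a mock environment; the class, observation fields, and reward are placeholders standing in for the real simulator, which requires TORCS:

```python
class MockDrivingEnv:
    """Stand-in exposing the classic Gym API: reset() -> obs,
    step(action) -> (obs, reward, done, info)."""
    def __init__(self, horizon=5):
        self.horizon, self.t = horizon, 0

    def reset(self):
        self.t = 0
        return {"track_pos": 0.0, "speed": 0.0}

    def step(self, action):
        self.t += 1
        obs = {"track_pos": 0.0, "speed": float(self.t)}
        reward = 1.0                       # placeholder shaping reward
        done = self.t >= self.horizon
        return obs, reward, done, {}

env = MockDrivingEnv()
obs, total, done = env.reset(), 0.0, False
while not done:                            # one episode of the control loop
    action = (0.0, 0.5, 0.0)               # (steer, accel, brake) placeholder
    obs, reward, done, info = env.step(action)
    total += reward
```

In the multi-agent setting, each car's client would run this same loop in its own process against the shared simulation server.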

AAMAS Conference 2021 Conference Paper

Reinforcement Learning for Unified Allocation and Patrolling in Signaling Games with Uncertainty

  • Aravind Venugopal
  • Elizabeth Bondi
  • Harshavardhan Kamarthi
  • Keval Dholakia
  • Balaraman Ravindran
  • Milind Tambe

Green Security Games (GSGs) have been successfully used in the protection of valuable resources such as fisheries, forests, and wildlife. Real-world deployment involves both resource allocation and subsequent coordinated patrolling with communication in the presence of real-time, uncertain information. Previous game models do not address both of these stages simultaneously. Furthermore, adopting existing solution strategies is difficult since they do not scale well for larger, more complex variants of the game models. We propose a novel GSG model to address these challenges. We also present a novel algorithm, CombSGPO, to compute a defender strategy for this game model. CombSGPO performs policy search over a multidimensional, discrete action space to compute an allocation strategy that is best suited to a best-response patrolling strategy for the defender, learnt by training a multi-agent Deep Q-Network. We show via experiments that CombSGPO converges to better strategies and is more scalable than comparable approaches. From a detailed analysis of the coordination and signaling behavior learnt by CombSGPO, we find that strategic signaling emerges in the final learnt strategy.

AAAI Conference 2021 Conference Paper

Relational Boosted Bandits

  • Ashutosh Kakadiya
  • Sriraam Natarajan
  • Balaraman Ravindran

Contextual bandits algorithms have become essential in real-world user interaction problems in recent years. However, these algorithms represent context as attribute-value representations, which makes them infeasible for real-world domains like social networks, which are inherently relational. We propose Relational Boosted Bandits (RB2), a contextual bandits algorithm for relational domains based on (relational) boosted trees. RB2 enables us to learn interpretable and explainable models due to the more descriptive nature of the relational representation. We empirically demonstrate the effectiveness and interpretability of RB2 on tasks such as link prediction, relational classification, and recommendation.

ICAPS Conference 2021 Conference Paper

RePReL: Integrating Relational Planning and Reinforcement Learning for Effective Abstraction

  • Harsha Kokel
  • Arjun Manoharan
  • Sriraam Natarajan
  • Balaraman Ravindran
  • Prasad Tadepalli

State abstraction is necessary for better task transfer in complex reinforcement learning environments. Inspired by the benefit of state abstraction in MAXQ and building upon hybrid planner-RL architectures, we propose RePReL, a hierarchical framework that leverages a relational planner to provide useful state abstractions. Our experiments demonstrate that the abstractions enable faster learning and efficient transfer across tasks. More importantly, our framework enables the application of standard RL approaches for learning in structured domains. The benefit of using the state abstractions is critical in relational settings, where the number and/or types of objects are not fixed a priori. Our experiments clearly show that the RePReL framework not only achieves better performance and efficient learning on the task at hand but also demonstrates better generalization to unseen tasks.

PRL Workshop 2021 Workshop Paper

RePReL: Integrating Relational Planning and Reinforcement Learning for Effective Abstraction

  • Harsha Kokel
  • Arjun Manoharan
  • Sriraam Natarajan
  • Balaraman Ravindran
  • Prasad Tadepalli

State abstraction enables sample-efficient learning and better task transfer in complex reinforcement learning environments. Inspired by the benefits of state abstraction in hierarchical planning and learning, we propose RePReL, a hierarchical framework that leverages a relational planner to provide useful state abstractions for learning. State abstraction is especially beneficial in relational settings, where the number and/or types of objects are not fixed a priori. Our experiments show that the RePReL framework not only achieves better performance and efficient learning on the task at hand but also demonstrates better generalization to unseen tasks. It has been argued that for human-level general intelligence, the ability to detect compositional structure in the domain (Lake et al. 2017) and form task-specific abstractions (Konidaris 2019) is necessary. Our RePReL framework takes a step in that direction by formalizing the prior domain knowledge that gives rise to effective task-specific abstractions.

AAMAS Conference 2021 Conference Paper

SEERL: Sample Efficient Ensemble Reinforcement Learning

  • Rohan Saphal
  • Balaraman Ravindran
  • Dheevatsa Mudigere
  • Sasikant Avancha
  • Bharat Kaul

Ensemble learning is a very prevalent method employed in machine learning. The relative success of ensemble methods is attributed to their ability to tackle a wide range of instances and complex problems that require different low-level approaches. However, ensemble methods are relatively less popular in reinforcement learning owing to the high sample complexity and computational expense involved in obtaining a diverse ensemble. We present a novel training and model selection framework for model-free reinforcement learning algorithms that uses ensembles of policies obtained from a single training run. These policies are diverse in nature and are learned through directed perturbation of the model parameters at regular intervals. We show that learning and selecting an adequately diverse set of policies is required for a good ensemble, while extreme diversity can prove detrimental to overall performance. Selection of an adequately diverse set of policies is done through our novel policy selection framework. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient, computationally inexpensive and is seen to outperform state-of-the-art (SOTA) scores in Atari 2600 and MuJoCo.
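The evaluation-time ensembling of snapshot policies can be illustrated with one simple strategy, majority voting (the paper discusses several strategies; the toy policies and perturbation scheme here are illustrative only):

```python
from collections import Counter

def snapshot_policies(num_snapshots, perturb):
    """Toy stand-in for policies saved at regular intervals of one run:
    each 'snapshot' differs only through a perturbation offset."""
    return [lambda state, k=k: (state + perturb * k) % 3
            for k in range(num_snapshots)]

def majority_vote(policies, state):
    """One simple ensembling strategy: act by majority vote of snapshots."""
    votes = Counter(p(state) for p in policies)
    return votes.most_common(1)[0][0]

policies = snapshot_policies(num_snapshots=5, perturb=0)  # identical snapshots
action = majority_vote(policies, state=2)                 # unanimous vote
```

The `k=k` default argument freezes each snapshot's offset at creation time, the standard idiom for closures in a loop.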

ICLR Conference 2020 Conference Paper

EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks

  • Sanchari Sen
  • Balaraman Ravindran
  • Anand Raghunathan

Ensuring robustness of Deep Neural Networks (DNNs) is crucial to their adoption in safety-critical applications such as self-driving cars, drones, and healthcare. Notably, DNNs are vulnerable to adversarial attacks in which small input perturbations can produce catastrophic misclassifications. In this work, we propose EMPIR, ensembles of quantized DNN models with different numerical precisions, as a new approach to increase robustness against adversarial attacks. EMPIR is based on the observation that quantized neural networks often demonstrate much higher robustness to adversarial attacks than full precision networks, but at the cost of a substantial loss in accuracy on the original (unperturbed) inputs. EMPIR overcomes this limitation to achieve the "best of both worlds", i.e., the higher unperturbed accuracies of the full precision models combined with the higher robustness of the low precision models, by composing them in an ensemble. Further, as low precision DNN models have significantly lower computational and storage requirements than full precision models, EMPIR models only incur modest compute and memory overheads compared to a single full-precision model (<25% in our evaluations). We evaluate EMPIR across a suite of DNNs for 3 different image recognition tasks (MNIST, CIFAR-10 and ImageNet) and under 4 different adversarial attacks. Our results indicate that EMPIR boosts the average adversarial accuracies by 42.6%, 15.2% and 10.5% for the DNN models trained on the MNIST, CIFAR-10 and ImageNet datasets respectively, when compared to single full-precision models, without sacrificing accuracy on the unperturbed inputs.
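EMPIR's composition step, combining the outputs of members quantized to different precisions, can be sketched with a toy one-layer scorer in place of the real CNNs (the quantizer, weights, and averaging rule below are illustrative assumptions):

```python
import math

def quantize(w, bits):
    """Uniform quantization of a weight in [-1, 1] to 2**bits levels."""
    levels = 2 ** bits - 1
    return round((w + 1.0) / 2.0 * levels) / levels * 2.0 - 1.0

def predict(weights, x):
    """Toy 'model': softmax over per-class linear scores w_c * x."""
    scores = [w * x for w in weights]
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

full = [0.8, -0.3, 0.1]                     # full-precision class weights
members = [full,
           [quantize(w, 4) for w in full],  # 4-bit ensemble member
           [quantize(w, 2) for w in full]]  # 2-bit ensemble member

x = 1.5
preds = [predict(m, x) for m in members]    # each member's class probabilities
avg = [sum(p[i] for p in preds) / len(preds) for i in range(3)]
```

An adversarial perturbation crafted against the full-precision member often transfers poorly to the coarsely quantized members, which is the intuition behind averaging their outputs.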

AAAI Conference 2020 Short Paper

ERLP: Ensembles of Reinforcement Learning Policies (Student Abstract)

  • Rohan Saphal
  • Balaraman Ravindran
  • Dheevatsa Mudigere
  • Sasikanth Avancha
  • Bharat Kaul

Reinforcement learning algorithms are sensitive to hyperparameters and require tuning and tweaking for specific environments to improve performance. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present here a methodology to create multiple models from a single training instance that can be used in an ensemble through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample efficient, computationally inexpensive and is seen to outperform state-of-the-art (SOTA) approaches.

IROS Conference 2020 Conference Paper

Understanding Dynamic Scenes using Graph Convolution Networks

  • Sravan Mylavarapu
  • Mahtab Sandhu
  • Priyesh Vijayan
  • K. Madhava Krishna
  • Balaraman Ravindran
  • Anoop M. Namboodiri

We present a novel Multi-Relational Graph Convolutional Network (MRGCN) based framework to model on-road vehicle behaviors from a sequence of temporally ordered frames as grabbed by a moving monocular camera. The input to MRGCN is a multi-relational graph where the graph's nodes represent the active and passive agents/objects in the scene, and the bidirectional edges that connect every pair of nodes are encodings of their spatio-temporal relations. We show that this explicit encoding and use of an intermediate spatio-temporal interaction graph is well suited for our tasks compared to learning end-to-end directly on a set of temporally ordered spatial relations. We also propose an attention mechanism for MRGCNs that, conditioned on the scene, dynamically scores the importance of information from different interaction types. The proposed framework achieves significant performance gains over prior methods on vehicle-behavior classification tasks on four datasets. We also show a seamless transfer of learning to multiple datasets without resorting to fine-tuning. Such behavior prediction methods find immediate relevance in a variety of navigation tasks such as behavior planning, state estimation, and applications relating to the detection of traffic violations over videos.

AAMAS Conference 2019 Conference Paper

Advice Replay Approach for Richer Knowledge Transfer in Teacher Student Framework

  • Vaibhav Gupta
  • Daksh Anand
  • Praveen Paruchuri
  • Balaraman Ravindran

One of the major drawbacks of RL is the low sample efficiency of the learning algorithms. In many cases domain expertise can help to mitigate this effect. The Teacher-Student framework is one such paradigm, where a more experienced agent (teacher), upon being queried, helps to accelerate the student’s learning by providing advice on the action to take in a given state. Real-world teachers not only provide the action to take in a given state but also provide a more informative signal using the synthesis of knowledge they may have gained with experience. With this motivation, we propose a richer advising framework where the teacher augments the student’s knowledge by also providing the expected long-term reward of following that action. The student can then use this value to steadily guide its Q-Network in the correct direction, which can lead to quicker convergence. To help the student revisit the advice received throughout its learning, we introduce an additional memory called the Advice Replay Memory (ARM). Results show that a student following our approach (a) is able to exploit the environment better, and (b) has a steeper learning curve.

AAMAS Conference 2019 Conference Paper

MaMiC: Macro and Micro Curriculum for Robotic Reinforcement Learning

  • Manan Tomar
  • Akhil Sathuluri
  • Balaraman Ravindran

Shaping in humans and animals has been shown to be a powerful tool for learning complex tasks as compared to learning in a randomized fashion. This makes the problem less complex and enables one to solve the easier sub-task at hand first. Generating a curriculum for such guided learning involves subjecting the agent to easier goals first, and then gradually increasing their difficulty. This paper takes a similar direction and proposes a dual curriculum scheme for solving robotic manipulation tasks with sparse rewards, called MaMiC. It includes a macro curriculum scheme which divides the task into multiple sub-tasks followed by a micro curriculum scheme which enables the agent to learn between such discovered sub-tasks. We show how combining macro and micro curriculum strategies helps in overcoming major exploratory constraints considered in robot manipulation tasks without having to engineer any complex rewards. The performance of such a dual curriculum scheme is analyzed on the Fetch environments.

AAAI Conference 2019 Short Paper

MaMiC: Macro and Micro Curriculum for Robotic Reinforcement Learning

  • Manan Tomar
  • Akhil Sathuluri
  • Balaraman Ravindran

Generating a curriculum for guided learning involves subjecting the agent to easier goals first, and then gradually increasing their difficulty. This work takes a similar direction and proposes a dual curriculum scheme for solving robotic manipulation tasks with sparse rewards, called MaMiC. It includes a macro curriculum scheme which divides the task into multiple subtasks followed by a micro curriculum scheme which enables the agent to learn between such discovered subtasks. We show how combining macro and micro curriculum strategies help in overcoming major exploratory constraints considered in robot manipulation tasks without having to engineer any complex rewards and also illustrate the meaning and usage of the individual curricula. The performance of such a scheme is analysed on the Fetch environments.

IJCAI Conference 2019 Conference Paper

Successor Options: An Option Discovery Framework for Reinforcement Learning

  • Rahul Ramesh
  • Manan Tomar
  • Balaraman Ravindran

The options framework in reinforcement learning models the notion of a skill or a temporally extended sequence of actions. The discovery of a reusable set of skills has typically entailed building options that navigate to bottleneck states. In this work, we instead adopt a complementary approach, where we attempt to discover options that navigate to landmark states. These states are prototypical representatives of well-connected regions and can hence access the associated region with relative ease. In this work, we propose Successor Options, which leverages Successor representations to build a model of the state space. The intra-option policies are learnt using a novel pseudo-reward and the model scales to high-dimensional spaces since it does not construct an explicit graph of the entire state space. Additionally, we also propose an Incremental Successor Options model that iterates between constructing Successor representations and building options, which is useful when robust Successor representations cannot be built solely from primitive actions. We demonstrate the efficacy of our approach on a collection of grid-worlds, and on the high-dimensional robotic control environment of Fetch.
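The Successor representation leveraged above is, for a fixed policy, the expected discounted future state occupancy M = Σ_t γ^t P^t = (I − γP)^{-1}. A small sketch computing it by fixed-point iteration on a toy 3-state ring (the transition matrix is assumed, not from the paper):

```python
def successor_representation(P, gamma, iters=500):
    """Iterate M <- I + gamma * P @ M, the fixed point of which
    is (I - gamma * P)^{-1}, the tabular successor representation."""
    n = len(P)
    M = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        M = [[(1.0 if i == j else 0.0) +
              gamma * sum(P[i][k] * M[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
    return M

# Deterministic ring: 0 -> 1 -> 2 -> 0 under the fixed policy.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
M = successor_representation(P, gamma=0.9)
row_sums = [sum(row) for row in M]   # each row sums to 1 / (1 - gamma)
```

States with similar rows of M occupy the same well-connected region, which is what makes clustering successor representations a natural way to find landmark states.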

AAAI Conference 2018 Conference Paper

Efficient-UCBV: An Almost Optimal Algorithm Using Variance Estimates

  • Subhojyoti Mukherjee
  • K. P. Naveen
  • Nandan Sudarsanam
  • Balaraman Ravindran

We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved (Auer and Ortner 2010), while taking into account the variance estimates to compute the arms’ confidence bounds, similar to UCBV (Audibert, Munos, and Szepesvári 2009). Through a theoretical analysis we establish that EUCBV incurs a gap-dependent regret bound of O(K σ²_max log(T Δ²/K) / Δ) after T trials, where Δ is the minimal gap between optimal and sub-optimal arms; the above bound is an improvement over that of existing state-of-the-art UCB algorithms (such as UCB1, UCB-Improved, UCBV, MOSS). Further, EUCBV incurs a gap-independent regret bound of O(√(KT)), which is an improvement over that of UCB1, UCBV and UCB-Improved, while being comparable with that of MOSS and OCUCB. Through an extensive numerical study we show that EUCBV significantly outperforms the popular UCB variants (like MOSS, OCUCB, etc.) as well as Thompson sampling and Bayes-UCB algorithms.
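The variance-aware confidence bound above builds on the UCB-V index of Audibert et al.: the empirical mean plus a bonus that shrinks when the empirical variance is low. A sketch of that index (the classic UCBV form, not the paper's exact EUCBV rule):

```python
import math

def ucbv_index(mean, var, n, t):
    """UCB-V index: empirical mean plus a variance-aware exploration bonus.
    Arms with low empirical variance get much tighter confidence bounds."""
    return mean + math.sqrt(2.0 * var * math.log(t) / n) + 3.0 * math.log(t) / n

# Two arms, equal means and pull counts, different empirical variances:
t, n = 100, 10
low_var = ucbv_index(mean=0.5, var=0.01, n=n, t=t)
high_var = ucbv_index(mean=0.5, var=0.25, n=n, t=t)
# The high-variance arm keeps the larger exploration bonus.
```

UCB1, by contrast, uses the worst-case variance bound for [0, 1] rewards, so it cannot exploit arms that happen to have low variance.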

AAMAS Conference 2018 Conference Paper

RAIL: Risk-Averse Imitation Learning

  • Anirban Santara
  • Abhishek Naik
  • Balaraman Ravindran
  • Dipankar Das
  • Dheevatsa Mudigere
  • Sasikanth Avancha
  • Bharat Kaul

Imitation learning algorithms learn viable policies by imitating an expert’s behavior when reward signals are not available. Generative Adversarial Imitation Learning (GAIL) is a state-of-the-art algorithm for learning policies when the expert’s behavior is available as a fixed set of trajectories. Evaluating learned policies in terms of the expert’s cost function, we observe that the distribution of trajectory-costs is often more heavy-tailed for GAIL-agents than for the expert on a number of benchmark continuous-control tasks. Thus, high-cost trajectories, corresponding to tail-end events of catastrophic failure, are more likely to be encountered by the GAIL-agents than the expert. This makes the reliability of GAIL-agents questionable when it comes to deployment in risk-sensitive applications like robotic surgery and autonomous driving. In this work, we aim to minimize the occurrence of tail-end events by minimizing tail risk within the GAIL framework. We quantify tail risk by the Conditional Value-at-Risk (CVaR) of trajectories and develop the Risk-Averse Imitation Learning (RAIL) algorithm. We observe that the policies learned with RAIL show lower tail-end risk than those of vanilla GAIL. Thus, the proposed RAIL algorithm appears as a potent alternative to GAIL for improved reliability in risk-sensitive applications.
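Tail risk as quantified above, CVaR at level α, is the expected cost over the worst (1 − α) fraction of trajectories. A minimal empirical-CVaR sketch:

```python
def empirical_cvar(costs, alpha):
    """Empirical CVaR_alpha: mean of the worst (1 - alpha) fraction of costs."""
    worst = sorted(costs, reverse=True)
    k = max(1, int(round((1.0 - alpha) * len(costs))))
    return sum(worst[:k]) / k

costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # toy per-trajectory costs
tail = empirical_cvar(costs, alpha=0.8)     # mean of the worst 20%: (10 + 9) / 2
```

Unlike the plain mean, CVaR is insensitive to improvements outside the tail, which is why minimizing it targets exactly the catastrophic-failure trajectories.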

RLDM Conference 2017 Conference Abstract

Attend, Adapt and Transfer: Attentive Deep Architecture for Adaptive Transfer from multiple sources in the same domain

  • Aravind Srinivas Lakshminarayanan
  • Janarthanan Rajendran
  • Mitesh M. Khapra
  • Prasanna Parthasarathi
  • Balaraman Ravindran

Transferring knowledge from prior source tasks in solving a new target task can be useful in several learning applications. The application of transfer poses two serious challenges which have not been adequately addressed. First, the agent should be able to avoid negative transfer, which happens when the transfer hampers or slows down the learning instead of helping it. Second, the agent should be able to selectively transfer, which is the ability to select and transfer from different and multiple source tasks for different parts of the state space of the target task. We propose A2T (Attend, Adapt and Transfer), an attentive deep architecture which adapts and transfers from multiple source tasks. Our model is generic enough to effect transfer of either policies or value functions. Empirical evaluations on different learning algorithms show that A2T is an effective architecture for transfer by being able to avoid negative transfer while transferring selectively from multiple source tasks in the same domain.

AAAI Conference 2017 Conference Paper

Dynamic Action Repetition for Deep Reinforcement Learning

  • Aravind Lakshminarayanan
  • Sahil Sharma
  • Balaraman Ravindran

One of the long-standing goals of Artificial Intelligence (AI) is to build cognitive agents which can perform complex tasks from raw sensory inputs without explicit supervision (Lake et al. 2016). Recent progress in combining Reinforcement Learning objective functions and Deep Learning architectures has achieved promising results for such tasks. An important aspect of such sequential decision making problems, which has largely been neglected, is for the agent to decide on the duration of time for which to commit to actions. Such action repetition is important for computational efficiency, which is necessary for the agent to respond in real-time to events (in applications such as self-driving cars). Action Repetition arises naturally in real life as well as simulated environments. The time scale of executing an action enables an agent (both humans and AI) to decide the granularity of control during task execution. Current state-of-the-art Deep Reinforcement Learning models, whether they are off-policy (Mnih et al. 2015; Wang et al. 2015) or on-policy (Mnih et al. 2016), consist of a framework with a static action repetition paradigm, wherein the action decided by the agent is repeated for a fixed number of time steps regardless of the contextual state while executing the task. In this paper, we propose a new framework - Dynamic Action Repetition - which changes Action Repetition Rate (the time scale of repeating an action) from a hyper-parameter of an algorithm to a dynamically learnable quantity. At every decision-making step, our models allow the agent to commit to an action and the time scale of executing the action. We show empirically that such a dynamic time scale mechanism improves the performance on relatively harder games in the Atari 2600 domain, independent of the underlying Deep Reinforcement Learning algorithm used.
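One common way to make the repetition rate learnable, in the spirit of the framework described above (the specific actions, repetition rates, and helper names below are illustrative assumptions, not the paper's), is to let the agent act in an augmented space of (action, repeat) pairs:

```python
# Augment a discrete action set with candidate repetition rates: at each
# decision step the agent commits to both an action and its time scale.
ACTIONS = ["left", "right", "fire"]
REPEAT_RATES = [1, 4, 20]          # illustrative time scales

AUGMENTED = [(a, r) for a in ACTIONS for r in REPEAT_RATES]

def step_with_repetition(env_step, action, repeat):
    """Apply `action` for `repeat` frames, accumulating reward.

    `env_step` is any callable returning (reward, done) per frame.
    """
    total = 0.0
    for _ in range(repeat):
        reward, done = env_step(action)
        total += reward
        if done:
            break
    return total

# The policy is then learned over AUGMENTED (|A| * |R| choices), so the
# repetition rate becomes a decision rather than a fixed hyper-parameter.
```

Learning over the product space trades a larger output layer for contextual control of the action granularity.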

RLDM Conference 2017 Conference Abstract

Optimal sample size for A/B tests using cumulative regret

  • Nandan Sudarsanam
  • PitchaiKannu Balaji
  • Balaraman Ravindran

This work presents theoretical results for determining optimal sample size for A/B tests in the online setting. Our explorations differ from previous studies on three accounts. The first is that we recommend a sample size on the basis of cumulative regret, as opposed to the typical statistical significance tests commonly used in A/B tests. The second is that we seek to optimize expected cumulative regret rather than the upper bound on cumulative regret. The third and most important contribution is that, similar to the Bayesian framework, we model the theoretical means of the alternatives as a random variable, which enables us to go beyond the gap-dependent and gap-independent results that are typical in bandit studies. We study Gaussian and binary reward distributions, with corresponding Gaussian and Uniform distribution of means. Specifically, in the case which explores a Gaussian reward distribution (noise) along with a different Gaussian distribution representing the theoretical means of the alternatives, we derive a closed form solution for the optimal sample size which is only a function of the trial horizon and a ratio of the standard deviations of these two distributions. Our results are compared to settings where an equivalent fixed gap is assumed between the means of the alternatives. Our results indicate that when the gap between alternatives is modeled as a random variable, the optimal sample sizes deviate significantly from the corresponding fixed gap settings.

RLDM Conference 2017 Conference Abstract

Shared Learning in Ensemble Deep Q-Networks

  • Rakesh R Menon
  • Manu Srinath Halvagal
  • Balaraman Ravindran

Most deep RL solutions still use extensions of conventional exploration strategies that have been well studied and offer theoretical guarantees in bandit problems and simple MDPs. However, exploration in large state spaces needs to be more directed than is possible with these traditional exploration strategies such as ε-greedy. The recently proposed Bootstrapped DQN offers a new exploration strategy that is capable of deep directed exploration and is better suited for deep RL problems. Bootstrapped DQN works by learning multiple independent estimates of the action-value function and guiding action selection using a randomly selected estimate. The method relies on variability among the different value estimators (called heads) for effective and deep exploration. In Bootstrapped DQN, this variability is ensured through both selective masking of training examples as well as by the random initialization of network parameters of each head. The network is trained using the Double DQN update rule. Double DQN is an adaptation of Double Q-Learning which is meant to reduce the overestimation bias in Q-learning. In both Double DQN and Bootstrapped DQN, the target network is used as a stand-in for an independent estimate of the action-value function in the update rule. Independent estimates are needed for Double Q-learning to perform effective updates. However, the target network is highly coupled to the online network leading to imperfect double Q-learning updates. We propose shared learning, an algorithm which takes advantage of the ensemble architecture of Bootstrapped DQN to overcome the issue with coupled estimates described above. Further, we supplement our algorithm with a framework to share learned experience amongst the bootstrapped heads. We demonstrate how this method can help in speeding up the existing Bootstrapped DQN algorithm with minimal computational overhead.
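The decoupling idea can be sketched as follows: one bootstrapped head selects the greedy next action while a different head scores it, instead of relying on a target network that is tied to the online network. This is an illustrative reading of the abstract, not the authors' implementation; the tabular `q_heads` structure below is an assumption for brevity:

```python
import random

import numpy as np

def select_head(num_heads, rng):
    """Bootstrapped-DQN exploration: follow one random head per episode."""
    return rng.randrange(num_heads)

def double_q_target(q_heads, next_state, reward, gamma, acting, evaluating):
    """Double-Q-style target with two bootstrapped heads.

    `acting` chooses the argmax action; `evaluating` scores it. Using a
    different ensemble head as the evaluator gives a less coupled second
    estimate than the usual online/target-network pair.
    """
    a_star = int(np.argmax(q_heads[acting][next_state]))
    return reward + gamma * q_heads[evaluating][next_state][a_star]
```

With tabular Q-values per head, the two-head target reduces to a plain double-Q update whenever the heads are trained on disjoint experience masks.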

IJCAI Conference 2017 Conference Paper

Thresholding Bandits with Augmented UCB

  • Subhojyoti Mukherjee
  • Naveen Kolar Purushothama
  • Nandan Sudarsanam
  • Balaraman Ravindran

In this paper we propose the Augmented-UCB (AugUCB) algorithm for a fixed-budget version of the thresholding bandit problem (TBP), where the objective is to identify a set of arms whose quality is above a threshold. A key feature of AugUCB is that it uses both mean and variance estimates to eliminate arms that have been sufficiently explored; to the best of our knowledge this is the first algorithm to employ such an approach for the considered TBP. Theoretically, we obtain an upper bound on the loss (probability of mis-classification) incurred by AugUCB. Although UCBEV in the literature provides a better guarantee, it is important to emphasize that UCBEV has access to problem complexity (whose computation requires arms' mean and variances), and hence is not realistic in practice; this is in contrast to AugUCB whose implementation does not require any such complexity inputs. We conduct extensive simulation experiments to validate the performance of AugUCB. Through our simulation work, we establish that AugUCB, owing to its utilization of variance estimates, performs significantly better than the state-of-the-art APT, CSAR and other non-variance-based algorithms.
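A hedged sketch of the kind of mean-plus-variance elimination test described above (the width formula and the `budget_term` parameter are illustrative stand-ins, not AugUCB's exact constants or schedule):

```python
import math

def confidence_width(var, n, budget_term):
    """Variance-aware confidence width after n pulls (illustrative form)."""
    return math.sqrt(var * budget_term / n) + budget_term / n

def can_eliminate(mean, var, n, threshold, budget_term):
    """Stop sampling an arm once its interval clears the threshold.

    An arm is classified (above or below) as soon as its whole
    confidence interval lies on one side of the threshold.
    """
    w = confidence_width(var, n, budget_term)
    return mean + w < threshold or mean - w > threshold
```

Because low-variance arms get narrow intervals quickly, the remaining budget concentrates on arms whose means sit near the threshold, which is where the variance-blind tests waste samples.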

EWRL Workshop 2015 Workshop Paper

A Reinforcement Learning Approach to Online Learning of Decision Trees

  • Abhinav Garlapati
  • Aditi Raghunathan
  • Vaishnavh Nagarajan
  • Balaraman Ravindran

Online decision tree learning algorithms typically examine all features of a new data point to update model parameters. We propose a novel alternative, Reinforcement Learning-based Decision Trees (RLDT), that uses Reinforcement Learning (RL) to actively examine a minimal number of features of a data point to classify it with high accuracy. Furthermore, RLDT optimizes a long term return, providing a better alternative to the traditional myopic greedy approach to growing decision trees. We demonstrate that this approach performs as well as batch learning algorithms and other online decision tree learning algorithms, while making significantly fewer queries about the features of the data points. We also show that RLDT can effectively handle concept drift.

IJCAI Conference 2015 Conference Paper

CEIL: A Scalable, Resolution Limit Free Approach for Detecting Communities in Large Networks

  • Vishnu Sankar
  • Balaraman Ravindran
  • Shivashankar S

Real world networks typically exhibit non-uniform edge densities with there being a higher concentration of edges within modules or communities. Various scoring functions have been proposed to quantify the quality of such communities. In this paper, we argue that the popular scoring functions suffer from certain limitations. We identify the necessary features that a scoring function should incorporate in order to characterize good community structure and propose a new scoring function, CEIL (Community detection using External and Internal scores in Large networks), which conforms closely with our characterization. We also demonstrate experimentally the superiority of our scoring function over the existing scoring functions. Modularity, a very popular scoring function, exhibits resolution limit, i.e., one cannot find communities that are much smaller in size compared to the size of the network. In many real world networks, community size does not grow in proportion to the network size. This implies that resolution limit is a serious problem in large networks. Modularity is still very popular since it offers many advantages such as fast algorithms for maximizing the score, and non-trivial community structures corresponding to the maxima. We show analytically that the CEIL score does not suffer from resolution limit. We also modify the Louvain method, one of the fastest greedy algorithms for maximizing modularity, to maximize the CEIL score. We show that our algorithm gives the expected communities in synthetic networks as opposed to maximizing modularity. We also show that the community labels given by our algorithm matches closely with the ground truth community labels in real world networks. Our algorithm is on par with Louvain method in computation time and hence scales well to large networks.

IJCAI Conference 2015 Conference Paper

Extended Discriminative Random Walk: A Hypergraph Approach to Multi-View Multi-Relational Transductive Learning

  • Sai Nageswar Satchidanand
  • Harini Ananthapadmanaban
  • Balaraman Ravindran

Transductive inference on graphs has been garnering increasing attention due to the connected nature of many real-life data sources, such as online social media and biological data (protein-protein interaction network, gene networks, etc.). Typically relational information in the data is encoded as edges in a graph but often it is important to model multi-way interactions, such as in collaboration networks and reaction networks. In this work we model multiway relations as hypergraphs and extend the discriminative random walk (DRW) framework, originally proposed for transductive inference on single graphs, to the case of multiple hypergraphs. We use the extended DRW framework for inference on multi-view, multi-relational data in a natural way, by representing attribute descriptions of the data also as hypergraphs. We further exploit the structure of hypergraphs to modify the random walk operator to take into account class imbalance in the data. This work is among very few approaches to explicitly address class imbalance in the in-network classification setting, using random walks. We compare our approach to methods proposed for inference on hypergraphs, and to methods proposed for multi-view data and show that empirically we achieve better performance. We also compare to methods specifically tailored for class-imbalanced data and show that our approach achieves comparable performance even on non-network data.

NeurIPS Conference 2014 Conference Paper

An Autoencoder Approach to Learning Bilingual Word Representations

  • Sarath Chandar A P
  • Stanislas Lauly
  • Hugo Larochelle
  • Mitesh Khapra
  • Balaraman Ravindran
  • Vikas Raykar
  • Amrita Saha

Cross-language learning allows us to use training data from one language to build models for a different language. Many approaches to bilingual learning require that we have word-level alignment of sentences from parallel corpora. In this work we explore the use of autoencoder-based methods for cross-language learning of vectorial word representations that are aligned between two languages, while not relying on word-level alignments. We show that by simply learning to reconstruct the bag-of-words representations of aligned sentences, within and between languages, we can in fact learn high-quality representations and do without word alignments. We empirically investigate the success of our approach on the problem of cross-language text classification, where a classifier trained on a given language (e.g., English) must learn to generalize to a different language (e.g., German). In experiments on 3 language pairs, we show that our approach achieves state-of-the-art performance, outperforming a method exploiting word alignments and a strong machine translation baseline.

ICRA Conference 2014 Conference Paper

RRTPI: Policy iteration on continuous domains using rapidly-exploring random trees

  • Manimaran Sivasamy Sivamurugan
  • Balaraman Ravindran

Path planning in continuous spaces has been a central problem in robotics. In the case of systems with complex dynamics, the performance of sampling based techniques relies on identifying a good approximation to the cost-to-go distance metric. We propose a technique that uses reinforcement learning to learn this distance metric on the fly from samples and combine it with existing sampling based planners to produce near optimal solutions. The resulting algorithm - RRTPI can solve problems with complex dynamics in a sample efficient manner while preserving asymptotic guarantees. We provide experimental evaluation of this technique on domains with underactuated and underpowered dynamics.

AAMAS Conference 2012 Conference Paper

Learning in a Small World

  • Arun Tejasvi Chaganty
  • Prateek Gaur
  • Balaraman Ravindran

Understanding how we are able to perform a diverse set of complex tasks is a central question for the Artificial Intelligence community. A popular approach is to use temporal abstraction as a framework to capture the notion of subtasks. However, this transfers the problem to finding the right subtasks, which is still an open problem. Existing approaches for subtask generation require too much knowledge of the environment, and the abstractions they create can overwhelm the agent. We propose a simple algorithm inspired by small world networks to learn subtasks while solving a task that requires virtually no information of the environment. Additionally, we show that the subtasks we learn can be easily composed by the agent to solve any other task; more formally, we prove that any task can be solved using only a logarithmic combination of these subtasks and primitive actions. Experimental results show that the subtasks we generate outperform other popular subtask generation schemes on standard domains.

ICRA Conference 2012 Conference Paper

Where do I look now? Gaze allocation during visually guided manipulation

  • José I. Nuñez-Varela
  • Balaraman Ravindran
  • Jeremy L. Wyatt

In this work we present principled methods for the coordination of a robot's oculomotor system with the rest of its body motor systems. The problem is to decide which physical actions to perform next and where the robot's gaze should be directed in order to gain information that is relevant to the success of its physical actions. Previous work on this problem has shown that a reward-based coordination mechanism provides an efficient solution. However, that approach does not allow the robot to move its gaze to different parts of the scene, it considers the robot to have only one motor system, and assumes that the actions have the same duration. The main contributions of our work are to extend that previous reward-based approach by making decisions about where to fixate the robot's gaze, handling multiple motor systems, and handling actions of variable duration. We compare our approach against two common baselines: random and round robin gaze allocation. We show how our method provides a more effective strategy to allocate gaze where it is needed the most.

UAI Conference 2011 Conference Paper

Fractional Moments on Bandit Problems

  • Ananda Narayanan B.
  • Balaraman Ravindran

Reinforcement learning addresses the dilemma between exploration to find profitable actions and exploitation to act according to the best observations already made. Bandit problems are one such class of problems in stateless environments that represent this explore/exploit situation. We propose a learning algorithm for bandit problems based on fractional expectation of rewards acquired. The algorithm is theoretically shown to converge on an ε-optimal arm and achieve O(n) sample complexity. Experimental results show the algorithm incurs substantially lower regrets than parameter-optimized ε-greedy and SoftMax approaches and other low sample complexity state-of-the-art techniques.

EWRL Workshop 2011 Conference Paper

Options with Exceptions

  • Munu Sairamesh
  • Balaraman Ravindran

An option is a policy fragment that represents a solution to a frequent subproblem encountered in a domain. Options may be treated as temporally extended actions thus allowing us to reuse that solution in solving larger problems. Often, it is hard to find subproblems that are exactly the same. These differences, however small, need to be accounted for in the reused policy. In this paper, the notion of options with exceptions is introduced to address such scenarios. This is inspired by the Ripple Down Rules approach used in data mining and knowledge representation communities. The goal is to develop an option representation so that small changes in the subproblem solutions can be accommodated without losing the original solution. We empirically validate the proposed framework on a simulated game domain.

ICRA Conference 2010 Conference Paper

Accurate mobile robot localization in indoor environments using bluetooth

  • Aswin N. Raghavan
  • Harini Ananthapadmanaban
  • Manimaran Sivasamy Sivamurugan
  • Balaraman Ravindran

In this paper, we describe an accurate method for localization of a mobile robot using bluetooth. We introduce novel approaches for obtaining distance estimates and trilateration that overcome the hitherto known limitations of using Bluetooth for localization. Our approach is reliable and has the potential of being scaled to multi-agent scenarios. The proposed approach was tested on a mobile robot, and we present the experimental results. The error obtained was 0.427 ± 0.229 m, which proves the accuracy of our method.
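Trilateration from range estimates can be solved generically by linearizing the range equations and taking a least-squares fit. The sketch below shows that textbook approach, not the paper's own distance-estimation and trilateration refinements:

```python
import numpy as np

def trilaterate(anchors, dists):
    """Least-squares 2D position from >= 3 anchor/distance pairs.

    Subtracting the first range equation from the others linearizes
    (x - x_i)^2 + (y - y_i)^2 = d_i^2 into
    2(x_i - x_0)x + 2(y_i - y_0)y = d_0^2 - d_i^2 + |p_i|^2 - |p_0|^2.
    """
    anchors = np.asarray(anchors, dtype=float)
    dists = np.asarray(dists, dtype=float)
    A = 2.0 * (anchors[1:] - anchors[0])
    b = (dists[0] ** 2 - dists[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(anchors[0] ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

# With exact distances the true position is recovered:
anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
true = np.array([1.0, 1.0])
d = [np.linalg.norm(true - np.asarray(a)) for a in anchors]
print(trilaterate(anchors, d))   # ≈ [1. 1.]
```

With noisy Bluetooth-derived distances, the same least-squares system simply absorbs the errors; robustness then comes from the quality of the distance estimates, which is where the paper's contribution lies.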

AAMAS Conference 2010 Conference Paper

Game Theoretic Network Centrality: Exact Formulas and Efficient Algorithms

  • Karthik Aadithya
  • Balaraman Ravindran

The concept of centrality plays an important role in network analysis. Game theoretic centrality measures have been recently proposed, which are based on computing the Shapley Value (SV) of each node (agent) in a suitably constructed co-operative network game. However, the naive method of exact computation of SVs takes exponential time in the number of nodes. In this paper, we develop analytical formulas for computing SVs of nodes for various kinds of centrality-related co-operative games played on both weighted and unweighted networks. These formulas not only provide an efficient and error-free way of computing node centralities, but their surprisingly simple closed form expressions also offer intuition into why certain nodes are relatively more important to a network.
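For one game in this family, the coverage game v(S) = |S ∪ N(S)| on an unweighted graph, the closed form reported in this line of work is SV(v) = Σ over u in {v} ∪ N(v) of 1/(1 + deg(u)). The sketch below (illustrative code, not the paper's) checks that formula against brute-force permutation enumeration on a toy graph:

```python
import itertools
import math
from fractions import Fraction

def coverage_value(S, adj):
    """v(S) = number of nodes in S or adjacent to S."""
    covered = set(S)
    for v in S:
        covered.update(adj[v])
    return len(covered)

def closed_form_sv(adj):
    """Closed form for the coverage game: SV(v) = sum 1/(1+deg(u))."""
    return {v: sum(Fraction(1, 1 + len(adj[u])) for u in [v] + adj[v])
            for v in adj}

def brute_force_sv(adj):
    """Exact Shapley value by enumerating all permutations (exponential)."""
    nodes = list(adj)
    sv = {v: Fraction(0) for v in nodes}
    for perm in itertools.permutations(nodes):
        seen = []
        for v in perm:
            sv[v] += coverage_value(seen + [v], adj) - coverage_value(seen, adj)
            seen.append(v)
    n = math.factorial(len(nodes))
    return {v: x / n for v, x in sv.items()}

path = {0: [1], 1: [0, 2], 2: [1]}     # path graph 0-1-2
assert closed_form_sv(path) == brute_force_sv(path)
```

The closed form runs in time linear in the number of edges, versus the factorial cost of direct enumeration, which is the efficiency gap the abstract refers to.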

ECAI Conference 2010 Conference Paper

Multi Grain Sentiment Analysis using Collective Classification

  • S. Shivashankar
  • Balaraman Ravindran

Multi grain sentiment analysis is the task of simultaneously classifying sentiment expressed at different levels of granularity, as opposed to a single level at a time. Models built for multi grain sentiment analysis assume a fully labeled corpus at the fine grained level or coarse grained level or both. Huge amounts of online reviews are not fully labeled at any of the levels, but are partially labeled at both levels. We propose a multi grain collective classification framework to not only exploit the information available at all the levels but also use intra dependencies at each level and inter dependencies between the levels. We demonstrate empirically that the proposed framework enables better performance at both the levels compared to baseline approaches.

IROS Conference 2010 Conference Paper

Transfer learning across heterogeneous robots with action sequence mapping

  • Balaji Lakshmanan
  • Balaraman Ravindran

Transfer learning refers to reusing the knowledge gained while solving a task, to solve a related task more efficiently. Much of the prior work on transfer learning, assumes that identical robots were involved in both the tasks. In this work we focus on transfer learning across heterogeneous robots while solving the same task. The action capabilities of the robots are different and are unknown to each other. The actions of one robot cannot be mimicked by another even if known. Such situations arise in multi-robot systems. The objective then is to speed-up the learning of one robot, i.e., reduce its initial exploration, using very minimal knowledge from a different robot. We propose a framework in which the knowledge transfer is effected through a pseudo reward function generated from the trajectories followed by a different robot while solving the same task. The framework can effectively be used even with a single trajectory. We extend the framework to enable the robot to learn an equivalence between certain sequences of its actions and certain sequences of actions of the other robot. These are then used to learn faster on subsequent tasks. We empirically validate the framework in a rooms world domain.

IJCAI Conference 2007 Conference Paper

  • Pranjal Awasthi
  • Aakanksha Gagrani
  • Balaraman Ravindran

In this paper we present a discriminative framework based on conditional random fields for stochastic modeling of images in a hierarchical fashion. The main advantage of the proposed framework is its ability to incorporate a rich set of interactions among the image sites. We achieve this by inducing a hierarchy of hidden variables over the given label field. The proposed tree like structure of our model eliminates the need for a huge parameter space and at the same time permits the use of exact and efficient inference procedures based on belief propagation. We demonstrate the generality of our approach by applying it to two important computer vision tasks, namely image labeling and object detection. The model parameters are trained using the contrastive divergence algorithm. We report the performance on real world images and compare it with the existing approaches.

IJCAI Conference 2007 Conference Paper

  • Balaraman Ravindran
  • Andrew G. Barto
  • Vimal Mathew

Deictic representation is a representational paradigm, based on selective attention and pointers, that allows an agent to learn and reason about rich complex environments. In this article we present a hierarchical reinforcement learning framework that employs aspects of deictic representation. We also present a Bayesian algorithm for learning the correct representation for a given sub-problem and empirically validate it on a complex game environment.

IJCAI Conference 2003 Conference Paper

SMDP Homomorphisms: An Algebraic Approach to Abstraction in Semi-Markov Decision Processes

  • Balaraman Ravindran
  • Andrew G. Barto

To operate effectively in complex environments learning agents require the ability to selectively ignore irrelevant details and form useful abstractions. In this article we consider the question of what constitutes a useful abstraction in a stochastic sequential decision problem modeled as a semi-Markov Decision Process (SMDP). We introduce the notion of SMDP homomorphism and argue that it provides a useful tool for a rigorous study of abstraction for SMDPs. We present an SMDP minimization framework and an abstraction framework for factored MDPs based on SMDP homomorphisms. We also model different classes of abstractions that arise in hierarchical systems. Although we use the options framework for purposes of illustration, the ideas are more generally applicable. We also show that the conditions for abstraction we employ are a generalization of earlier work by Dietterich as applied to the options framework.
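For the MDP special case, the homomorphism conditions can be stated concretely; this is the standard commutativity formulation (the SMDP version additionally constrains the joint transition-time distributions, omitted here):

```latex
% An MDP homomorphism h = (f, \{g_s\}) from M = (S, A, P, R)
% to M' = (S', A', P', R') satisfies, for all s, s' \in S and a \in A:
\begin{align}
  R'\bigl(f(s),\, g_s(a)\bigr) &= R(s, a) \\
  P'\bigl(f(s),\, g_s(a),\, f(s')\bigr)
    &= \sum_{s'' \in f^{-1}(f(s'))} P(s, a, s'')
\end{align}
% Rewards are preserved exactly, and transition probabilities are
% aggregated over each block of states that f maps together, so a
% policy lifted from the abstract MDP M' is sound in M.
```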

NeurIPS Conference 1998 Conference Paper

Improved Switching among Temporally Abstract Actions

  • Richard Sutton
  • Satinder Singh
  • Doina Precup
  • Balaraman Ravindran

In robotics and other control applications it is commonplace to have a pre-existing set of controllers for solving subtasks, perhaps hand-crafted or previously learned or planned, and still face a difficult problem of how to choose and switch among the controllers to solve an overall task as well as possible. In this paper we present a framework based on Markov decision processes and semi-Markov decision processes for phrasing this problem, a basic theorem regarding the improvement in performance that can be obtained by switching flexibly between given controllers, and example applications of the theorem. In particular, we show how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning. In one of our examples, the complexity of the problem is reduced from 24 billion state-action pairs to less than a million state-controller pairs. In many applications, solutions to parts of a task are known, either because they were hand-crafted by people or because they were previously learned or planned. For example, in robotics applications, there may exist controllers for moving joints to positions, picking up objects, controlling eye movements, or navigating along hallways. More generally, an intelligent system may have available to it several temporally extended courses of action to choose from. In such cases, a key challenge is to take full advantage of the existing temporally extended actions, to choose or switch among them effectively, and to plan at their level rather than at the level of individual actions. Recently, several researchers have begun to address these challenges within the framework of reinforcement learning and Markov decision processes (e.g., Singh, 1992; Kaelbling, 1993; Dayan & Hinton, 1993; Thrun and Schwartz, 1995; Sutton, 1995; Dietterich, 1998; Parr & Russell, 1998; McGovern, Sutton & Fagg, 1997). Common to much of this recent work is the modeling of a temporally extended action as a policy (controller) and a condition for terminating, which we together refer to as an option (Sutton, Precup & Singh, 1998). In this paper we consider the problem of effectively combining given options into one overall policy, generalizing prior work by Kaelbling (1993). Sections 1-3 introduce the framework; our new results are in Sections 4 and 5.