Arrow Research search

Author name cluster

Rohan Chandra

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

19 papers
2 author rows

Possible papers

19

AAAI Conference 2026 Conference Paper

Towards Agents That Exhibit Human-Like Autonomy in Complex Environments

  • Rohan Chandra

Deploying intelligent, autonomous agents, e.g., autonomous vehicles and robots, in the real world has been a longstanding goal in robotics and artificial intelligence (AI). We have already begun to witness the emergence of vacuum robots in our homes, service robots in warehouses, and even self-driving cars on our way to work. These environments are often dense, constrained, and unstructured, with heterogeneous agents, each with their own unique behaviors and objectives. While agents today are designed to navigate these environments safely, their overly conservative nature often leads to slow and jerky motion (frequent stopping and freezing), lack of social compliance (not giving way to other people, blocking doorways and intersections), and poor adaptability across diverse complex environments (failure due to sudden accidents, e.g., liquid spills). In other words, these robots often fail to capture the essence of human-like autonomy, which involves the ability to take calculated risks, even in complex environments. In this talk, I will describe my vision for a paradigm shift in the way intelligent physical agents navigate highly dense, heterogeneous, constrained, and unstructured environments using human-like autonomy.

ICRA Conference 2025 Conference Paper

Decentralized Safe and Scalable Multi-Agent Control Under Limited Actuation

  • Vrushabh Zinage
  • Abhishek Jha
  • Rohan Chandra
  • Efstathios Bakolas

To deploy safe and agile robots in cluttered environments, there is a need to develop fully decentralized controllers that guarantee safety, respect actuation limits, prevent deadlocks, and scale to thousands of agents. Current approaches fall short of meeting all these goals: optimization-based methods ensure safety but lack scalability, while learning-based methods scale but do not guarantee safety. We propose a novel algorithm to achieve safe and scalable control for multiple agents under limited actuation. Specifically, our approach includes: (i) learning a decentralized neural Integral Control Barrier Function (neural ICBF) for scalable, input-constrained control, (ii) embedding a lightweight decentralized Model Predictive Control-based Integral Control Barrier Function (MPC-ICBF) into the neural network policy to ensure safety while maintaining scalability, and (iii) introducing a novel deadlock-minimization method that adapts gradient-based optimization techniques from machine learning to escape local minima. Our numerical simulations show that this approach outperforms state-of-the-art multi-agent control algorithms in terms of safety, input constraint satisfaction, and minimizing deadlocks. Additionally, we demonstrate strong generalization across scenarios with varying agent counts, scaling up to 1000 agents. Videos and code are available at https://maicbf.github.io/.
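As a rough illustration of the barrier-function safety filtering the abstract builds on, the sketch below projects a nominal velocity command onto the safe half-space for a pairwise distance barrier and clips it to an actuation limit. This is a hypothetical single-integrator simplification, not the paper's neural ICBF or MPC-ICBF; the function name and parameters are invented for illustration.

```python
import numpy as np

def cbf_safety_filter(x_i, x_j, u_nominal, d_safe=1.0, alpha=1.0, u_max=2.0):
    """Simplified pairwise safety filter for a single-integrator agent.

    Barrier: h = ||x_i - x_j||^2 - d_safe^2  (positive when safe).
    Enforce the linear CBF condition  grad_h . u + alpha * h >= 0
    by a closed-form projection, then clip to the actuation limit.
    Assumes the agents do not coincide (grad is nonzero).
    """
    diff = np.asarray(x_i, float) - np.asarray(x_j, float)
    h = diff @ diff - d_safe ** 2
    grad = 2.0 * diff                      # dh/dx_i for a single integrator
    u = np.asarray(u_nominal, float)
    violation = grad @ u + alpha * h
    if violation < 0:
        # Minimal correction: project u onto the constraint boundary.
        u = u - (violation / (grad @ grad)) * grad
    return np.clip(u, -u_max, u_max)       # respect actuation limits
```

For example, an agent at the origin commanding velocity (1, 0) toward a neighbor at (2, 0) gets its command scaled back just enough to sit on the constraint boundary.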

AAAI Conference 2025 Conference Paper

Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes

  • Chen Tang
  • Ben Abbatematteo
  • Jiaheng Hu
  • Rohan Chandra
  • Roberto Martín-Martín
  • Peter Stone

Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. These challenges notwithstanding, recent advances have enabled DRL to succeed at some real-world robotic tasks. However, state-of-the-art DRL solutions’ maturity varies significantly across robotic applications. In this talk, I will review the current progress of DRL in real-world robotic applications based on our recent survey paper (with Tang, Abbatematteo, Hu, Chandra, and Martín-Martín), with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies, including locomotion, navigation, stationary manipulation, mobile manipulation, human-robot interaction, and multi-robot interaction. The analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. I will also highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. The talk is designed to offer insights for RL practitioners and roboticists toward harnessing RL’s power to create generally capable real-world robotic systems.

NeurIPS Conference 2025 Conference Paper

Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features

  • Zixuan Xie
  • Xinyu Liu
  • Rohan Chandra
  • Shangtong Zhang

Linear TD($\lambda$) is one of the most fundamental reinforcement learning algorithms for policy evaluation. Previously, convergence rates were typically established under the assumption of linearly independent features, which does not hold in many practical scenarios. This paper instead establishes the first $L^2$ convergence rates for linear TD($\lambda$) operating under arbitrary features, without making any algorithmic modification or additional assumptions. Our results apply to both the discounted and average-reward settings. To address the potential non-uniqueness of solutions resulting from arbitrary features, we develop a novel stochastic approximation result featuring convergence rates to the solution set instead of a single point.
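For readers unfamiliar with the algorithm being analyzed, here is a minimal sketch of linear TD($\lambda$) with accumulating eligibility traces. The update itself is unchanged whether or not the features are linearly independent; only the convergence analysis differs. Function and argument names are illustrative, not from the paper.

```python
import numpy as np

def linear_td_lambda(features, rewards, next_features, alpha=0.1,
                     gamma=0.9, lam=0.8):
    """One pass of linear TD(lambda) over a batch of transitions.

    features[t], next_features[t]: feature vectors phi(s_t), phi(s_{t+1}).
    Note the features may be linearly dependent (e.g., duplicated columns);
    the update is well defined either way.
    """
    d = features.shape[1]
    w = np.zeros(d)       # weight vector
    z = np.zeros(d)       # eligibility trace
    for phi, r, phi_next in zip(features, rewards, next_features):
        z = gamma * lam * z + phi                    # decay and accumulate trace
        delta = r + gamma * phi_next @ w - phi @ w   # TD error
        w = w + alpha * delta * z
    return w
```

Running this with duplicated feature columns (a linearly dependent basis) still produces a well-defined iterate, illustrating the setting the paper's analysis covers.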

IROS Conference 2025 Conference Paper

Multi-Agent Inverse Reinforcement Learning in Real World Unstructured Pedestrian Crowds

  • Rohan Chandra
  • Haresh Karnan
  • Negar Mehr
  • Peter Stone 0001
  • Joydeep Biswas

Social robot navigation in crowded public spaces such as university campuses, restaurants, grocery stores, and hospitals, is an increasingly important area of research. One of the core strategies for achieving this goal is to understand humans’ intent–underlying psychological factors that govern their motion–by learning how humans assign rewards to their actions, typically via inverse reinforcement learning (IRL). Despite significant progress in IRL, learning reward functions of multiple agents simultaneously in dense unstructured pedestrian crowds has remained intractable due to the nature of the tightly coupled social interactions that occur in these scenarios, e.g., passing, intersections, swerving, weaving, etc. In this paper, we present a new multi-agent maximum entropy inverse reinforcement learning algorithm for real world unstructured pedestrian crowds. Key to our approach is a simple but effective mathematical trick, which we name the "tractability-rationality trade-off" trick, that achieves tractability at the cost of a slight reduction in accuracy. We compare our approach to the classical single-agent MaxEnt IRL as well as state-of-the-art trajectory prediction methods on several datasets including the ETH, UCY, SCAND, JRDB, and a new dataset, called Speedway, collected at a busy intersection on a University campus focusing on dense, complex agent interactions. Our key findings show that, on the dense Speedway dataset, our approach ranks 1st among the top 7 baselines with >2× improvement over single-agent IRL, and is competitive with state-of-the-art large transformer-based encoder-decoder models on sparser datasets such as ETH/UCY (ranking 3rd among the top 7 baselines).
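As background on the MaxEnt IRL machinery the abstract refers to, the core parameter update for a linear reward model is the gap between the expert's empirical feature expectations and those of the current policy. The sketch below is a generic single-step illustration under that assumption, not the paper's multi-agent algorithm or its trade-off trick; the names are hypothetical.

```python
import numpy as np

def maxent_irl_step(theta, expert_features, policy_features, lr=0.1):
    """One gradient-ascent step of maximum-entropy IRL with a linear
    reward r(s) = theta . phi(s).

    expert_features, policy_features: arrays of per-visit feature vectors
    from expert demonstrations and from rollouts of the current policy.
    The gradient of the MaxEnt log-likelihood is the difference of their
    empirical means.
    """
    grad = np.mean(expert_features, axis=0) - np.mean(policy_features, axis=0)
    return theta + lr * grad
```

Intuitively, features visited more by the expert than by the current policy get their reward weight raised, and vice versa.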

RLJ Journal 2025 Journal Article

Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models

  • Kefan Song
  • Jin Yao
  • Runnan Jiang
  • Rohan Chandra
  • Shangtong Zhang

As Large Language Models (LLMs) become increasingly powerful and accessible to human users, ensuring fairness across diverse demographic groups, i.e., group fairness, is a critical ethical concern. However, current fairness and bias research in LLMs is limited in two aspects. First, compared to traditional group fairness in machine learning classification, it requires that the non-sensitive attributes, in this case, the question in the user prompts, be the same across different groups. In many practical scenarios, different groups, however, may prefer different questions and this requirement becomes impractical. Second, it evaluates group fairness only for the LLM's final output without identifying the source of possible bias. Namely, the bias in LLM's output can result from both the pretraining and the finetuning. For finetuning, the bias can result from both the RLHF procedure and the learned reward model. Arguably, evaluating the group fairness of each component in the LLM pipeline could help develop better methods to mitigate the possible bias. Recognizing those two limitations, this work benchmarks the group fairness of learned reward models. By using expert-written text from arXiv, we are able to benchmark the group fairness of reward models without requiring the same question in the user prompts across different demographic groups. Surprisingly, our results demonstrate that all the evaluated reward models (e.g., Nemotron-4-340B-Reward, ArmoRM-Llama3-8B-v0.1, and GRM-llama3-8B-sftreg) exhibit statistically significant group unfairness. We also observed that top-performing reward models (w.r.t. canonical performance metrics) tend to demonstrate better group fairness.

RLC Conference 2025 Conference Paper

Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models

  • Kefan Song
  • Jin Yao
  • Runnan Jiang
  • Rohan Chandra
  • Shangtong Zhang

As Large Language Models (LLMs) become increasingly powerful and accessible to human users, ensuring fairness across diverse demographic groups, i.e., group fairness, is a critical ethical concern. However, current fairness and bias research in LLMs is limited in two aspects. First, compared to traditional group fairness in machine learning classification, it requires that the non-sensitive attributes, in this case, the question in the user prompts, be the same across different groups. In many practical scenarios, different groups, however, may prefer different questions and this requirement becomes impractical. Second, it evaluates group fairness only for the LLM's final output without identifying the source of possible bias. Namely, the bias in LLM's output can result from both the pretraining and the finetuning. For finetuning, the bias can result from both the RLHF procedure and the learned reward model. Arguably, evaluating the group fairness of each component in the LLM pipeline could help develop better methods to mitigate the possible bias. Recognizing those two limitations, this work benchmarks the group fairness of learned reward models. By using expert-written text from arXiv, we are able to benchmark the group fairness of reward models without requiring the same question in the user prompts across different demographic groups. Surprisingly, our results demonstrate that all the evaluated reward models (e.g., Nemotron-4-340B-Reward, ArmoRM-Llama3-8B-v0.1, and GRM-llama3-8B-sftreg) exhibit statistically significant group unfairness. We also observed that top-performing reward models (w.r.t. canonical performance metrics) tend to demonstrate better group fairness.

NeurIPS Conference 2025 Conference Paper

Towards Provable Emergence of In-Context Reinforcement Learning

  • Jiuqi Wang
  • Rohan Chandra
  • Shangtong Zhang

Typically, a modern reinforcement learning (RL) agent solves a task by updating its neural network parameters to adapt its policy to the task. Recently, it has been observed that some RL agents can solve a wide range of new out-of-distribution tasks without parameter updates after pretraining on some task distribution. When evaluated in a new task, instead of making parameter updates, the pretrained agent conditions its policy on additional input called the context, e.g., the agent's interaction history in the new task. The agent's performance increases as the information in the context increases, with the agent's parameters fixed. This phenomenon is typically called in-context RL (ICRL). The pretrained parameters of the agent network enable the remarkable ICRL phenomenon. However, many ICRL works perform the pretraining with standard RL algorithms. This raises the central question this paper aims to address: Why can the RL pretraining algorithm generate network parameters that enable ICRL? We hypothesize that the parameters capable of ICRL are minimizers of the pretraining loss. This work provides initial support for this hypothesis through a case study. In particular, we prove that when a Transformer is pretrained for policy evaluation, one of the global minimizers of the pretraining loss can enable in-context temporal difference learning.

ICRA Conference 2024 Conference Paper

Rethinking Social Robot Navigation: Leveraging the Best of Two Worlds

  • Amir Hossain Raj
  • Zichao Hu
  • Haresh Karnan
  • Rohan Chandra
  • Amirreza Payandeh
  • Luisa Mao
  • Peter Stone 0001
  • Joydeep Biswas

Empowering robots to navigate in a socially compliant manner is essential for the acceptance of robots moving in human-inhabited environments. Previously, roboticists have developed geometric navigation systems with decades of empirical validation to achieve safety and efficiency. However, the many complex factors of social compliance make geometric navigation systems hard to adapt to social situations, where no amount of tuning enables them to be both safe (people are too unpredictable) and efficient (the frozen robot problem). With recent advances in deep learning approaches, the common reaction has been to entirely discard these classical navigation systems and start from scratch, building a completely new learning-based social navigation planner. In this work, we find that this reaction is unnecessarily extreme: using a large-scale real-world social navigation dataset, SCAND, we find that geometric systems can produce trajectory plans that align with the human demonstrations in a large number of social situations. We, therefore, ask if we can rethink the social robot navigation problem by leveraging the advantages of both geometric and learning-based methods. We validate this hybrid paradigm through a proof-of-concept experiment, in which we develop a hybrid planner that switches between geometric and learning-based planning. Our experiments on both SCAND and two physical robots show that the hybrid planner can achieve better social compliance compared to using either the geometric or learning-based approach alone.

AAAI Conference 2024 System Paper

SOCIALGYM 2.0: Simulator for Multi-Robot Learning and Navigation in Shared Human Spaces

  • Rohan Chandra
  • Zayne Sprague
  • Joydeep Biswas

We present Social Gym 2.0, a simulator for multi-agent navigation research. Our simulator enables navigation for multiple autonomous agents, replicating real-world dynamics in complex indoor environments, including doorways, hallways, intersections, and roundabouts. Unlike current simulators that concentrate on single robots in open spaces, Social Gym 2.0 employs multi-agent reinforcement learning (MARL) to develop optimal navigation policies for multiple robots with diverse, dynamic constraints in complex environments. Social Gym 2.0 also departs from accepted software design standards by employing a configuration-over-convention paradigm, providing the capability to benchmark different MARL algorithms as well as customize observation and reward functions. Users can additionally create their own environments and evaluate various algorithms, based on both deep reinforcement learning as well as classical navigation, using a broad range of social navigation metrics.

ICRA Conference 2023 Conference Paper

METEOR: A Dense, Heterogeneous, and Unstructured Traffic Dataset with Rare Behaviors

  • Rohan Chandra
  • Xijun Wang 0002
  • Mridul Mahajan
  • Rahul Kala
  • Rishitha Palugulla
  • Chandrababu Naidu
  • Alok Jain
  • Dinesh Manocha

We present a new traffic dataset, Meteor, which captures traffic patterns and multi-agent driving behaviors in unstructured scenarios. Meteor consists of more than 1000 one-minute videos, over 2 million annotated frames with bounding boxes and GPS trajectories for 16 unique agent categories, and more than 13 million bounding boxes for traffic agents. Meteor is a dataset for rare and interesting multi-agent driving behaviors that are grouped into traffic violations, atypical interactions, and diverse scenarios. Every video in Meteor is tagged using a diverse range of factors corresponding to weather, time of the day, road conditions, and traffic density. We use Meteor to benchmark perception methods for object detection and multi-agent behavior prediction. Our key finding is that state-of-the-art models for object detection and behavior prediction, which otherwise succeed on existing datasets such as Waymo, fail on the Meteor dataset. Meteor is a step towards developing more sophisticated perception models for dense, heterogeneous, and unstructured scenarios.

ICRA Conference 2022 Conference Paper

Game-Theoretic Planning for Autonomous Driving among Risk-Aware Human Drivers

  • Rohan Chandra
  • Mingyu Wang 0002
  • Mac Schwager
  • Dinesh Manocha

We present a novel approach for risk-aware planning with human agents in multi-agent traffic scenarios. Our approach takes into account the wide range of human driver behaviors on the road, from aggressive maneuvers like speeding and overtaking, to conservative traits like driving slowly and conforming to the right-most lane. In our approach, we learn a mapping from a data-driven human driver behavior model called the CMetric to a driver's entropic risk preference. We then use the derived risk preference within a game-theoretic risk-sensitive planner to model risk-aware interactions among human drivers and an autonomous vehicle in various traffic scenarios. We demonstrate our method in a merging scenario, where our results show that the final trajectories obtained from the risk-aware planner generate desirable emergent behaviors. Particularly, our planner recognizes aggressive human drivers and yields to them while maintaining a greater distance from them. In a user study, participants were able to distinguish between aggressive and conservative simulated drivers based on trajectories generated from our risk-sensitive planner. We also observe that aggressive human driving results in more frequent lane-changing in the planner. Finally, we compare the performance of our modified risk-aware planner with existing methods and show that modeling human driver behavior leads to safer navigation.

IROS Conference 2020 Conference Paper

CMetric: A Driving Behavior Measure using Centrality Functions

  • Rohan Chandra
  • Uttaran Bhattacharya
  • Trisha Mittal
  • Aniket Bera
  • Dinesh Manocha

We present a new measure, CMetric, to classify driver behaviors using centrality functions. Our formulation combines concepts from computational graph theory and social traffic psychology to quantify and classify the behavior of human drivers. CMetric is used to compute the probability of a vehicle executing a driving style, as well as the intensity used to execute the style. Our approach is designed for realtime autonomous driving applications, where the trajectory of each vehicle or road-agent is extracted from a video. We compute a dynamic geometric graph (DGG) based on the positions and proximity of the road-agents and centrality functions corresponding to closeness and degree. These functions are used to compute the CMetric based on style likelihood and style intensity estimates. Our approach is general and makes no assumption about traffic density, heterogeneity, or how driving behaviors change over time. We present an algorithm to compute CMetric and demonstrate its performance on real-world traffic datasets. To test the accuracy of CMetric, we introduce a new evaluation protocol (called "Time Deviation Error") that measures the difference between human prediction and the prediction made by CMetric.
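To make the graph construction in the abstract concrete, the sketch below builds a dynamic geometric graph from agent positions by a proximity threshold and computes degree centrality on it. This is a minimal, hypothetical illustration of the two ingredients named (DGG plus a centrality function), not the CMetric computation itself.

```python
import math

def dynamic_geometric_graph(positions, radius):
    """Adjacency sets: two road-agents are connected when the Euclidean
    distance between their positions is at most `radius`."""
    n = len(positions)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(positions[i], positions[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def degree_centrality(adj):
    """Fraction of other agents each agent is connected to."""
    n = len(adj)
    return {i: len(nbrs) / (n - 1) for i, nbrs in adj.items()}
```

For three agents where two are close and one is far away, the isolated agent gets centrality 0 while the interacting pair each score 0.5; tracking such values over frames is the kind of signal a centrality-based behavior measure can build on.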

ICRA Conference 2020 Conference Paper

DenseCAvoid: Real-time Navigation in Dense Crowds using Anticipatory Behaviors

  • Adarsh Jagan Sathyamoorthy
  • Jing Liang 0006
  • Utsav Patel
  • Tianrui Guan
  • Rohan Chandra
  • Dinesh Manocha

We present DenseCAvoid, a novel algorithm for navigating a robot through dense crowds and avoiding collisions by anticipating pedestrian behaviors. Our formulation uses visual sensors and a pedestrian trajectory prediction algorithm to track pedestrians in a set of input frames and compute bounding boxes that extrapolate to the pedestrian positions in a future time. Our hybrid approach combines this trajectory prediction with a Deep Reinforcement Learning-based collision avoidance method to train a policy to generate smoother, safer, and more robust trajectories during run-time. We train our policy in realistic 3-D simulations of static and dynamic scenarios with multiple pedestrians. In practice, our hybrid approach generalizes well to unseen, real-world scenarios and can navigate a robot through dense crowds (~1-2 humans per square meter) in indoor scenarios, including narrow corridors and lobbies. As compared to cases where prediction was not used, we observe that our method reduces the occurrence of the robot freezing in a crowd by up to 48%, and performs comparably with respect to trajectory lengths and mean arrival times to goal.

ICRA Conference 2020 Conference Paper

GraphRQI: Classifying Driver Behaviors Using Graph Spectrums

  • Rohan Chandra
  • Uttaran Bhattacharya
  • Trisha Mittal
  • Xiaoyu Li
  • Aniket Bera
  • Dinesh Manocha

We present a novel algorithm (GraphRQI) to identify driver behaviors from road-agent trajectories. Our approach assumes that the road-agents exhibit a range of driving traits, such as aggressive or conservative driving. Moreover, these traits affect the trajectories of nearby road-agents as well as the interactions between road-agents. We represent these inter-agent interactions using unweighted and undirected traffic graphs. Our algorithm classifies the driver behavior using a supervised learning algorithm by reducing the computation to the spectral analysis of the traffic graph. Moreover, we present a novel eigenvalue algorithm to compute the spectrum efficiently. We provide theoretical guarantees for the running time complexity of our eigenvalue algorithm and show that it is faster than previous methods by 2 times. We evaluate the classification accuracy of our approach on traffic videos and autonomous driving datasets corresponding to urban traffic. In practice, GraphRQI achieves an accuracy improvement of up to 25% over prior driver behavior classification algorithms. We also use our classification algorithm to predict the future trajectories of road-agents.
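The spectral-analysis step the abstract describes can be pictured with a small sketch: form the Laplacian of an unweighted, undirected traffic graph and take its eigenvalues as a feature vector for a downstream classifier. This uses a dense stock eigensolver rather than the paper's fast eigenvalue algorithm, and the function name is illustrative.

```python
import numpy as np

def laplacian_spectrum(adj_matrix):
    """Sorted eigenvalues of the graph Laplacian L = D - A for an
    unweighted, undirected graph given as an adjacency matrix.

    The spectrum is a permutation-invariant summary of the interaction
    structure, suitable as input features to a supervised classifier.
    """
    A = np.asarray(adj_matrix, dtype=float)
    D = np.diag(A.sum(axis=1))          # degree matrix
    L = D - A                            # combinatorial Laplacian
    return np.sort(np.linalg.eigvalsh(L))  # symmetric eigensolver
```

For a three-agent path graph (agent 1 interacting with agents 0 and 2) the spectrum is {0, 1, 3}, and the multiplicity of the zero eigenvalue counts connected components.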

AAAI Conference 2020 Conference Paper

M3ER: Multiplicative Multimodal Emotion Recognition using Facial, Textual, and Speech Cues

  • Trisha Mittal
  • Uttaran Bhattacharya
  • Rohan Chandra
  • Aniket Bera
  • Dinesh Manocha

We present M3ER, a learning-based method for emotion recognition from multiple input modalities. Our approach combines cues from multiple co-occurring modalities (such as face, text, and speech) and also is more robust than other methods to sensor noise in any of the individual modalities. M3ER models a novel, data-driven multiplicative fusion method to combine the modalities, which learns to emphasize the more reliable cues and suppress others on a per-sample basis. By introducing a check step which uses Canonical Correlational Analysis to differentiate between ineffective and effective modalities, M3ER is robust to sensor noise. M3ER also generates proxy features in place of the ineffectual modalities. We demonstrate the efficiency of our network through experimentation on two benchmark datasets, IEMOCAP and CMU-MOSEI. We report a mean accuracy of 82.7% on IEMOCAP and 89.0% on CMU-MOSEI, which, collectively, is an improvement of about 5% over prior work.
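As a rough, hypothetical illustration of the multiplicative-fusion idea (combining per-modality class probabilities by a product rather than a sum, so no single confidently wrong modality can dominate), consider the sketch below. It is a generic simplification, not M3ER's learned fusion; the function name and the `beta` exponent are assumptions for illustration.

```python
import numpy as np

def multiplicative_fusion(modality_logits, beta=0.5):
    """Combine per-modality class logits by a tempered product of their
    softmax probabilities, then renormalize.

    beta < 1 softens each modality's vote, a common way to keep one
    near-zero probability from vetoing a class outright.
    """
    probs = []
    for logits in modality_logits:
        e = np.exp(logits - np.max(logits))  # numerically stable softmax
        probs.append(e / e.sum())
    fused = np.prod([p ** beta for p in probs], axis=0)
    return fused / fused.sum()
```

Two modalities that both lean toward the same class reinforce each other under the product, while a modality outputting near-uniform probabilities contributes little either way.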

ICRA Conference 2020 Conference Paper

RoadTrack: Realtime Tracking of Road Agents in Dense and Heterogeneous Environments

  • Rohan Chandra
  • Uttaran Bhattacharya
  • Tanmay Randhavane
  • Aniket Bera
  • Dinesh Manocha

We present a realtime tracking algorithm, RoadTrack, to track heterogeneous road-agents in dense traffic videos. Our approach is designed for dense traffic scenarios that consist of different road-agents such as pedestrians, two-wheelers, cars, buses, etc. sharing the road. We use the tracking-by-detection approach where we track a road-agent by matching the appearance or bounding box region in the current frame with the predicted bounding box region propagated from the previous frame. RoadTrack uses a novel motion model called the Simultaneous Collision Avoidance and Interaction (SimCAI) model to predict the motion of road-agents by modeling collision avoidance and interactions between the road-agents for the next frame. We demonstrate the advantage of RoadTrack on a dataset of dense traffic videos and observe an accuracy of 75.8% on this dataset, outperforming prior state-of-the-art tracking algorithms by at least 5.2%. RoadTrack operates in realtime at approximately 30 fps and is at least 4× faster than prior tracking algorithms on standard tracking datasets.
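The tracking-by-detection matching step described above can be sketched generically: score each predicted box against each current detection by intersection-over-union and associate greedily above a threshold. This is a textbook simplification (greedy IoU association, not the paper's appearance matching or SimCAI motion model); names and the threshold are illustrative.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_tracks(predicted, detected, threshold=0.3):
    """Greedily associate propagated track boxes with current detections.

    Returns a dict mapping track index -> detection index for every pair
    whose IoU exceeds the threshold; unmatched tracks/detections are
    candidates for track termination/initialization.
    """
    matches, used = {}, set()
    for i, p in enumerate(predicted):
        best, best_iou = None, threshold
        for j, d in enumerate(detected):
            if j in used:
                continue
            score = iou(p, d)
            if score > best_iou:
                best, best_iou = j, score
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches
```

A better motion model (like the SimCAI model above) improves the predicted boxes and therefore the IoU scores this association step relies on.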

AAAI Conference 2020 Conference Paper

STEP: Spatial Temporal Graph Convolutional Networks for Emotion Perception from Gaits

  • Uttaran Bhattacharya
  • Trisha Mittal
  • Rohan Chandra
  • Tanmay Randhavane
  • Aniket Bera
  • Dinesh Manocha

We present a novel classifier network called STEP, to classify perceived human emotion from gaits, based on a Spatial Temporal Graph Convolutional Network (ST-GCN) architecture. Given an RGB video of an individual walking, our formulation implicitly exploits the gait features to classify the perceived emotion of the human into one of four emotions: happy, sad, angry, or neutral. We train STEP on annotated real-world gait videos, augmented with annotated synthetic gaits generated using a novel generative network called STEP-Gen, built on an ST-GCN based Conditional Variational Autoencoder (CVAE). We incorporate a novel push-pull regularization loss in the CVAE formulation of STEP-Gen to generate realistic gaits and improve the classification accuracy of STEP. We also release a novel dataset (E-Gait), which consists of 4,227 human gaits annotated with perceived emotions along with thousands of synthetic gaits. In practice, STEP can learn the affective features and exhibits a classification accuracy of 88% on E-Gait, which is 14–30% more accurate than prior methods.

IROS Conference 2019 Conference Paper

DensePeds: Pedestrian Tracking in Dense Crowds Using Front-RVO and Sparse Features

  • Rohan Chandra
  • Uttaran Bhattacharya
  • Aniket Bera
  • Dinesh Manocha

We present a pedestrian tracking algorithm, DensePeds, that tracks individuals in highly dense crowds (>2 pedestrians per square meter). Our approach is designed for videos captured from front-facing or elevated cameras. We present a new motion model called Front-RVO (FRVO) for predicting pedestrian movements in dense situations using collision avoidance constraints and combine it with state-of-the-art Mask R-CNN to compute sparse feature vectors that reduce the loss of pedestrian tracks (false negatives). We evaluate DensePeds on the standard MOT benchmarks as well as a new dense crowd dataset. In practice, our approach is 4.5× faster than prior tracking algorithms on the MOT benchmark, and on dense crowd videos it outperforms the prior state of the art by over 2.6% absolute on average.