Arrow Research search

Author name cluster

Sandeep Chinchali

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers


AAAI Conference 2026 Conference Paper

NeuS-QA: Grounding Long-Form Video Understanding in Temporal Logic and Neuro-Symbolic Reasoning

  • Sahil Shah
  • S P Sharan
  • Harsh Goel
  • Minkyu Choi
  • Mustafa Munir
  • Manvik Pasula
  • Radu Marculescu
  • Sandeep Chinchali

While vision-language models (VLMs) excel at tasks involving single images or short videos, they still struggle with Long Video Question Answering (LVQA) due to its demand for complex multi-step temporal reasoning. Vanilla approaches, which simply sample frames uniformly and feed them to a VLM along with the question, incur significant token overhead. This forces aggressive downsampling of long videos, causing models to miss fine-grained visual structure, subtle event transitions, and key temporal cues. Recent works attempt to overcome these limitations through heuristic approaches; however, they lack explicit mechanisms for encoding temporal relationships and fail to provide any formal guarantees that the sampled context actually encodes the compositional or causal logic required by the question. To address these foundational gaps, we introduce NeuS-QA, a training-free, plug-and-play neuro-symbolic pipeline for LVQA. NeuS-QA first translates a natural language question into a logic specification that models the temporal relationship between frame-level events. Next, we construct a video automaton to model the video's frame-by-frame event progression, and finally employ model checking to compare the automaton against the specification to identify all video segments that satisfy the question's logical requirements. Only these logic-verified segments are submitted to the VLM, thus improving interpretability, reducing hallucinations, and enabling compositional reasoning without modifying or fine-tuning the model. Experiments on the LongVideoBench and CinePile benchmarks show that NeuS-QA significantly improves performance by over 10%, particularly on questions involving event ordering, causality, and multi-step reasoning.
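
The model-checking step described in the abstract can be sketched in miniature: treat the video as a sequence of frame-level event labels and return only the segments that satisfy a simple "A is later followed by B" requirement. This is a toy stand-in for the paper's temporal-logic pipeline, and the event names and frame annotations below are hypothetical.

```python
# Toy sketch (not the authors' code) of NeuS-QA's core idea: model the video
# as frame-by-frame event labels and keep only the segments that satisfy a
# simple temporal specification. Only these verified segments would then be
# submitted to the VLM.

def segments_satisfying_a_then_b(frame_events, a, b):
    """Return (start, end) frame-index pairs where event `a` occurs and
    event `b` occurs strictly afterwards -- a minimal 'A before B' spec."""
    a_frames = [i for i, ev in enumerate(frame_events) if a in ev]
    b_frames = [i for i, ev in enumerate(frame_events) if b in ev]
    segments = []
    for i in a_frames:
        later = [j for j in b_frames if j > i]
        if later:
            segments.append((i, later[0]))  # shortest witness segment
    return segments

# Hypothetical frame-level event annotations from a frame detector.
video = [{"car"}, {"car", "person"}, set(), {"crash"}, {"ambulance"}]
print(segments_satisfying_a_then_b(video, "person", "crash"))  # [(1, 3)]
```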

NeurIPS Conference 2025 Conference Paper

Constrained Posterior Sampling: Time Series Generation with Hard Constraints

  • Sai Shankar Narasimhan
  • Shubhankar Agarwal
  • Litu Rout
  • Sanjay Shakkottai
  • Sandeep Chinchali

Generating realistic time series samples is crucial for stress-testing models and protecting user privacy by using synthetic data. In engineering and safety-critical applications, these samples must meet certain hard constraints that are domain-specific or naturally imposed by physics or nature. Consider, for example, generating electricity demand patterns with constraints on peak demand times. This can be used to stress-test the functioning of power grids during adverse weather conditions. Existing approaches for generating constrained time series are either not scalable or degrade sample quality. To address these challenges, we introduce Constrained Posterior Sampling (CPS), a diffusion-based sampling algorithm that aims to project the posterior mean estimate into the constraint set after each denoising update. Notably, CPS scales to a large number of constraints ($\sim100$) without requiring additional training. We provide theoretical justifications highlighting the impact of our projection step on sampling. Empirically, CPS outperforms state-of-the-art methods in sample quality and similarity to real time series by around 70\% and 22\%, respectively, on real-world stocks, traffic, and air quality datasets.
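
The projection step can be illustrated with a heavily simplified sketch (this is an assumed toy, not the paper's sampler): the constraint set is a box, the projection is an elementwise clip, and the "denoiser" is a shrinkage toward a target series.

```python
import numpy as np

# Toy illustration of the CPS idea: after each denoising update, project the
# posterior-mean estimate back into the constraint set. Here the constraint
# set is a box (hard elementwise bounds) and the denoiser is a stand-in.

def project_box(x, lo, hi):
    return np.clip(x, lo, hi)

def cps_toy_sample(target, lo, hi, steps=50, rng=None):
    rng = np.random.default_rng(rng)
    x = rng.normal(size=target.shape)           # start from noise
    for t in range(steps):
        x0_hat = x + 0.5 * (target - x)         # toy posterior-mean estimate
        x0_hat = project_box(x0_hat, lo, hi)    # constraint projection (CPS step)
        x = x0_hat + 0.1 * rng.normal(size=x.shape) * (1 - t / steps)
    return project_box(x, lo, hi)

target = np.sin(np.linspace(0, 2 * np.pi, 24))  # e.g. a demand profile
sample = cps_toy_sample(target, lo=-0.8, hi=0.8, rng=0)
print(sample.min() >= -0.8 and sample.max() <= 0.8)  # hard constraint holds
```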

ICLR Conference 2025 Conference Paper

CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features

  • Po-han Li
  • Sandeep Chinchali
  • Ufuk Topcu

Multimodal encoders like CLIP excel in tasks such as zero-shot image classification and cross-modal retrieval. However, they require excessive training data. We propose canonical similarity analysis (CSA), which uses two unimodal encoders to replicate multimodal encoders using limited data. CSA maps unimodal features into a multimodal space, using a new similarity score to retain only the multimodal information. CSA only involves the inference of unimodal encoders and a cubic-complexity matrix decomposition, eliminating the need for extensive GPU-based model training. Experiments show that CSA outperforms CLIP while requiring $50,000\times$ fewer multimodal data pairs to bridge the modalities given pre-trained unimodal encoders on ImageNet classification and misinformative news caption detection. CSA surpasses the state-of-the-art method to map unimodal features to multimodal features. We also demonstrate the ability of CSA with modalities beyond image and text, paving the way for future modality pairs with limited paired multimodal data but abundant unpaired unimodal data, such as LiDAR and text.
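
The decomposition-based mapping can be sketched with plain canonical correlation analysis, a related cubic-cost matrix decomposition (not the paper's exact algorithm): two unimodal feature matrices are projected into a shared space where paired samples correlate, with no GPU training involved.

```python
import numpy as np

# Hedged sketch: map paired "image" and "text" features into a shared space
# via classical CCA (a stand-in for CSA's decomposition).

def cca_maps(X, Y, k, eps=1e-6):
    """Projection matrices (Wx, Wy) for n paired samples X (n,dx), Y (n,dy)."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    Cxx = X.T @ X / len(X) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / len(Y) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / len(X)
    isqrt = lambda C: np.linalg.inv(np.linalg.cholesky(C)).T  # whitening map
    Rx, Ry = isqrt(Cxx), isqrt(Cyy)
    U, _, Vt = np.linalg.svd(Rx.T @ Cxy @ Ry)
    return Rx @ U[:, :k], Ry @ Vt.T[:, :k]

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))                  # shared latent content
X = Z @ rng.normal(size=(4, 16))               # toy "image" features
Y = Z @ rng.normal(size=(4, 12))               # toy "text" features
Wx, Wy = cca_maps(X, Y, k=4)
# Paired samples should now be highly correlated in the shared space.
corr = np.corrcoef((X @ Wx)[:, 0], (Y @ Wy)[:, 0])[0, 1]
print(abs(corr) > 0.9)
```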

ICLR Conference 2025 Conference Paper

Exploiting Distribution Constraints for Scalable and Efficient Image Retrieval

  • Mohammad Omama
  • Po-han Li
  • Sandeep Chinchali

Image retrieval is crucial in robotics and computer vision, with downstream applications in robot place recognition and vision-based product recommendations. Modern retrieval systems face two key challenges: scalability and efficiency. State-of-the-art image retrieval systems train specific neural networks for each dataset, an approach that lacks scalability. Furthermore, since retrieval speed is directly proportional to embedding size, existing systems that use large embeddings lack efficiency. To tackle scalability, recent works propose using off-the-shelf foundation models. However, these models, though applicable across datasets, fall short in achieving performance comparable to that of dataset-specific models. Our key observation is that, while foundation models capture necessary subtleties for effective retrieval, the underlying distribution of their embedding space can negatively impact cosine similarity searches. We introduce Autoencoders with Strong Variance Constraints (AE-SVC), which, when used for projection, significantly improves the performance of foundation models. We provide an in-depth theoretical analysis of AE-SVC. Addressing efficiency, we introduce Single-Shot Similarity Space Distillation ((SS)2D), a novel approach to learn embeddings with adaptive sizes that offers a better trade-off between size and performance. We conducted extensive experiments on four retrieval datasets, including Stanford Online Products (SoP) and Pittsburgh30k, using four different off-the-shelf foundation models, including DinoV2 and CLIP. AE-SVC demonstrates up to a 16% improvement in retrieval performance, while (SS)2D shows a further 10% improvement for smaller embedding sizes.
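
The key observation about embedding distributions can be demonstrated numerically (a toy illustration, not AE-SVC itself): when embeddings share a large common offset, every pair looks similar under cosine similarity, and removing that distributional bias restores the contrast retrieval needs.

```python
import numpy as np

# Toy illustration: a shared offset in the embedding distribution inflates
# cosine similarity between unrelated items; simple mean centering (a crude
# stand-in for a learned projection like AE-SVC) restores discrimination.

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
offset = 10.0 * np.ones(64)                       # shared distributional bias
x, y = rng.normal(size=64), rng.normal(size=64)   # two unrelated items
raw_sim = cosine(x + offset, y + offset)          # inflated similarity
mu = offset                                       # estimated embedding mean
centered_sim = cosine(x + offset - mu, y + offset - mu)
print(raw_sim > 0.9, abs(centered_sim) < 0.5)
```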

AAMAS Conference 2025 Conference Paper

Human-Agent Coordination in Games under Incomplete Information via Multi-Step Intent

  • Shenghui Chen
  • Ruihan Zhao
  • Sandeep Chinchali
  • Ufuk Topcu

Strategic coordination between autonomous agents and human partners under incomplete information can be modeled as turn-based cooperative games. We extend a turn-based game under incomplete information, the shared-control game, to allow players to take multiple actions per turn rather than a single action. The extension enables the use of multi-step intent, which we hypothesize will improve performance in long-horizon tasks. To synthesize cooperative policies for the agent in this extended game, we propose an approach featuring a memory module for a running probabilistic belief of the environment dynamics and an online planning algorithm called IntentMCTS. This algorithm strategically selects the next action by leveraging any communicated multi-step intent via reward augmentation while considering the current belief. Agent-to-agent simulations in the Gnomes at Night testbed demonstrate that IntentMCTS requires fewer steps and control switches than baseline methods. A human-agent user study corroborates these findings, showing an 18.52% higher success rate compared to the heuristic baseline and a 5.56% improvement over the single-step prior work. Participants also report lower cognitive load, frustration, and higher satisfaction with the IntentMCTS agent partner.
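
The reward-augmentation mechanism can be sketched in a few lines (assumed mechanics, with illustrative action names): actions consistent with the partner's communicated multi-step intent receive a bonus during rollouts, biasing tree search toward cooperation.

```python
# Hedged sketch of IntentMCTS's reward augmentation: rollout rewards for
# actions that match the communicated multi-step intent are boosted, while
# the belief over environment dynamics is maintained separately.

def augmented_reward(base_reward, action, intent, bonus=0.5):
    return base_reward + (bonus if action in intent else 0.0)

intent = ["up", "up", "left"]            # partner's communicated plan
print(augmented_reward(1.0, "up", intent), augmented_reward(1.0, "down", intent))  # 1.5 1.0
```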

ICML Conference 2025 Conference Paper

R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning

  • Harsh Goel
  • Mohammad Omama
  • Behdad Chalaki
  • Vaishnav Tadiparthi
  • Ehsan Moradi-Pari
  • Sandeep Chinchali

Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems where roles naturally emerge to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods exclusively derive roles from an agent’s past experience during training, neglecting their influence on its future trajectories. This paper introduces a key insight: an agent’s role should shape its future behavior to enable effective coordination. Hence, we propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework that learns emergent roles by maximizing the mutual information between agents’ roles, observed trajectories, and expected future behaviors. R3DM optimizes the proposed objective through contrastive learning on past trajectories to first derive intermediate roles that shape intrinsic rewards to promote diversity in future behaviors across different roles through a learned dynamics model. Benchmarking on SMAC and SMACv2 environments demonstrates that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination to increase win rates by up to 20%. The code is available at https://github.com/UTAustin-SwarmLab/R3DM.

EAAI Journal 2025 Journal Article

Synthetic data-augmented explainable Vision Transformer for colorectal cancer diagnosis via surface tactile imaging

  • Siddhartha Kapuria
  • Naruhiko Ikoma
  • Sandeep Chinchali
  • Farshid Alambeigi

In this work, we present a synthetic data augmented explainable Vision Transformer (ViT) framework designed for the informed and intuitive early diagnosis of colorectal cancer (CRC) polyps. The framework uses textural images, generated by our recently developed vision-based tactile sensor (called HySenSe) and augmented with synthetically generated images from a diffusion model pipeline, to output class-based probabilities of potential CRC polyp types. Additionally, it provides local relevancy-based heatmaps to assist clinicians by highlighting key areas of interest in the tactile images representing CRC polyp textures. We benchmark each aspect of this framework through: (i) Inception Scores for the synthetic images generated by the diffusion pipeline, (ii) Performance evaluation and sensitivity analyses on the effects of synthetic data addition on model generalizability compared with other state-of-the-art architectures, (iii) Dimensionality reduction techniques to confirm the suitability of synthetically generated images, and (iv) Comparison of two independent approaches visualizing explainability.

NeurIPS Conference 2025 Conference Paper

VIBE: Annotation-Free Video-to-Text Information Bottleneck Evaluation for TL;DR

  • Shenghui Chen
  • Po-han Li
  • Sandeep Chinchali
  • Ufuk Topcu

Many decision-making tasks, where both accuracy and efficiency matter, still require human supervision. For example, tasks like traffic officers reviewing hour-long dashcam footage or researchers screening conference videos can benefit from concise summaries that reduce cognitive load and save time. Yet current vision-language models (VLMs) often produce verbose, redundant outputs that hinder task performance. Existing video caption evaluation depends on costly human annotations and overlooks the summaries' utility in downstream tasks. We address these gaps with $\underline{\textbf{V}}$ideo-to-text $\underline{\textbf{I}}$nformation $\underline{\textbf{B}}$ottleneck $\underline{\textbf{E}}$valuation (VIBE), an annotation-free method that scores VLM outputs using two metrics: $\textit{grounding}$ (how well the summary aligns with visual content) and $\textit{utility}$ (how informative it is for the task). VIBE selects from randomly sampled VLM outputs by ranking them according to the two scores to support effective human decision-making. Human studies on $\texttt{LearningPaper24}$, $\texttt{SUTD-TrafficQA}$, and $\texttt{LongVideoBench}$ show that summaries selected by VIBE consistently improve performance, boosting task accuracy by up to $61.23$% and reducing response time by $75.77$% compared to naive VLM summaries or raw video.
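
The selection step can be sketched as follows; the scoring functions here are illustrative stand-ins, since the paper derives grounding and utility from an information-bottleneck objective rather than keyword overlap.

```python
# Hedged sketch of VIBE's selection mechanism: given several sampled VLM
# summaries, rank them by grounding + utility and keep the best one for the
# human decision-maker. The toy scores below are hypothetical.

def select_summary(candidates, grounding_fn, utility_fn):
    return max(candidates, key=lambda s: grounding_fn(s) + utility_fn(s))

# Toy grounding = overlap with detected visual entities;
# toy utility = mention of the task-relevant phrase, minus a verbosity penalty.
visual_entities = {"truck", "intersection", "red light"}
def grounding(s):  return sum(e in s for e in visual_entities)
def utility(s):    return 2 * ("red light" in s) - 0.01 * len(s)

candidates = [
    "A long rambling description of the weather and the road surface.",
    "A truck runs a red light at the intersection.",
    "A truck drives by.",
]
print(select_summary(candidates, grounding, utility))
```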

AIJ Journal 2024 Journal Article

Joint learning of reward machines and policies in environments with partially known semantics

  • Christos K. Verginis
  • Cevahir Koprulu
  • Sandeep Chinchali
  • Ufuk Topcu

We study the problem of reinforcement learning for a task encoded by a reward machine. The task is defined over a set of properties in the environment, called atomic propositions, and represented by Boolean variables. One unrealistic assumption commonly used in the literature is that the truth values of these propositions are accurately known. In real situations, however, these truth values are uncertain since they come from sensors that suffer from imperfections. At the same time, reward machines can be difficult to model explicitly, especially when they encode complicated tasks. We develop a reinforcement-learning algorithm that infers a reward machine that encodes the underlying task while learning how to execute it, despite the uncertainties of the propositions' truth values. In order to address such uncertainties, the algorithm maintains a probabilistic estimate about the truth value of the atomic propositions; it updates this estimate according to new sensory measurements that arrive from exploration of the environment. Additionally, the algorithm maintains a hypothesis reward machine, which acts as an estimate of the reward machine that encodes the task to be learned. As the agent explores the environment, the algorithm updates the hypothesis reward machine according to the obtained rewards and the estimate of the atomic propositions' truth value. Finally, the algorithm uses a Q-learning procedure for the states of the hypothesis reward machine to determine an optimal policy that accomplishes the task. We prove that the algorithm successfully infers the reward machine and asymptotically learns a policy that accomplishes the respective task.
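
The belief-maintenance step described above can be sketched with a standard Bayes update (an assumed formulation; the sensor rates are illustrative): the agent tracks the probability that an atomic proposition is true and refines it with each noisy reading.

```python
# Sketch of the probabilistic estimate over an atomic proposition p, updated
# by Bayes' rule from a sensor with known true/false-positive rates.

def update_belief(prior, observed_true, tpr=0.9, fpr=0.2):
    """Posterior P(p | obs), with P(obs=True | p) = tpr, P(obs=True | not p) = fpr."""
    if observed_true:
        num, den = tpr * prior, tpr * prior + fpr * (1 - prior)
    else:
        num, den = (1 - tpr) * prior, (1 - tpr) * prior + (1 - fpr) * (1 - prior)
    return num / den

belief = 0.5
for obs in [True, True, False, True]:   # noisy readings of proposition p
    belief = update_belief(belief, obs)
print(round(belief, 3))  # 0.919
```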

IROS Conference 2024 Conference Paper

PEERNet: An End-to-End Profiling Tool for Real-Time Networked Robotic Systems

  • Aditya Narayanan
  • Pranav Kasibhatla
  • Minkyu Choi 0001
  • Po-han Li
  • Ruihan Zhao 0001
  • Sandeep Chinchali

Networked robotic systems balance compute, power, and latency constraints in applications such as self-driving vehicles, drone swarms, and teleoperated surgery. A core problem in this domain is deciding when to offload a computationally expensive task to the cloud, a remote server, at the cost of communication latency. Task offloading algorithms often rely on precise knowledge of system-specific performance metrics, such as sensor data rates, network bandwidth, and machine learning model latency. While these metrics can be modeled during system design, uncertainties in connection quality, server load, and hardware conditions introduce real-time performance variations, hindering overall performance. We introduce PEERNet, an end-to-end and real-time profiling tool for cloud robotics. PEERNet enables performance monitoring on heterogeneous hardware through targeted yet adaptive profiling of system components such as sensors, networks, deep-learning pipelines, and devices. We showcase PEERNet’s capabilities through networked robotics tasks, such as image-based teleoperation of a Franka Emika Panda arm and querying vision language models using an Nvidia Jetson Orin. PEERNet reveals non-intuitive behavior in robotic systems, such as asymmetric network transmission and bimodal language model output. Our evaluation underscores the effectiveness and importance of benchmarking in networked robotics, demonstrating PEERNet’s adaptability. Our code is open-source and available at github.com/UTAustin-SwarmLab/PEERNet.
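
The kind of per-stage timing such a profiler needs can be sketched with a simple decorator (a hypothetical API, not PEERNet's actual interface): wrap each pipeline stage and record its latency so per-stage asymmetries become visible.

```python
import time

# Hypothetical profiler sketch: record latency per named pipeline stage
# (sensing, network transfer, inference) via a decorator.

class StageTimer:
    def __init__(self):
        self.records = {}

    def profile(self, stage):
        def wrap(fn):
            def inner(*args, **kwargs):
                t0 = time.perf_counter()
                out = fn(*args, **kwargs)
                self.records.setdefault(stage, []).append(time.perf_counter() - t0)
                return out
            return inner
        return wrap

timer = StageTimer()

@timer.profile("inference")
def run_model(x):
    time.sleep(0.01)      # stand-in for a DNN forward pass
    return x * 2

result = run_model(21)
print(result, timer.records["inference"][0] >= 0.009)
```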

ECAI Conference 2024 Conference Paper

Reduce, Reuse, Recycle: Categories for Compositional Reinforcement Learning

  • Georgios Bakirtzis
  • Michail Savvas
  • Ruihan Zhao 0001
  • Sandeep Chinchali
  • Ufuk Topcu

In reinforcement learning, conducting task composition by forming cohesive, executable sequences from multiple tasks remains challenging. However, the ability to (de)compose tasks is a linchpin in developing robotic systems capable of learning complex behaviors. Yet, compositional reinforcement learning is beset with difficulties, including the high dimensionality of the problem space, scarcity of rewards, and absence of system robustness after task composition. To surmount these challenges, we view task composition through the prism of category theory—a mathematical discipline exploring structures and their compositional relationships. The categorical properties of Markov decision processes untangle complex tasks into manageable sub-tasks, allowing for strategical reduction of dimensionality, facilitating more tractable reward structures, and bolstering system robustness. Experimental results support the categorical theory of reinforcement learning by enabling skill reduction, reuse, and recycling when learning complex robotic arm tasks.

IROS Conference 2024 Conference Paper

Robot-Enabled Machine Learning-Based Diagnosis of Gastric Cancer Polyps Using Partial Surface Tactile Imaging

  • Siddhartha Kapuria
  • Jeff Bonyun
  • Yash Kulkarni
  • Naruhiko Ikoma
  • Sandeep Chinchali
  • Farshid Alambeigi

In this paper, to collectively address the existing limitations on endoscopic diagnosis of Advanced Gastric Cancer (AGC) Tumors, for the first time, we propose (i) utilization and evaluation of our recently developed Vision-based Tactile Sensor (VTS), and (ii) a complementary Machine Learning (ML) algorithm for classifying tumors using their textural features. Leveraging a seven DoF robotic manipulator and unique custom-designed and additively-manufactured realistic AGC tumor phantoms, we demonstrated the advantages of automated data collection using the VTS addressing the problem of data scarcity and biases encountered in traditional ML-based approaches. Our synthetic-data-trained ML model was successfully evaluated and compared with traditional ML models utilizing various statistical metrics even under mixed morphological characteristics and partial sensor contact.

ICML Conference 2024 Conference Paper

Time Weaver: A Conditional Time Series Generation Model

  • Sai Shankar Narasimhan
  • Shubhankar Agarwal
  • Oguzhan Akcin
  • Sujay Sanghavi
  • Sandeep Chinchali

Imagine generating a city’s electricity demand pattern based on weather, the presence of an electric vehicle, and location, which could be used for capacity planning during a winter freeze. Such real-world time series are often enriched with paired heterogeneous contextual metadata (e.g., weather and location). Current approaches to time series generation often ignore this paired metadata. Additionally, the heterogeneity in metadata poses several practical challenges in adapting existing conditional generation approaches from the image, audio, and video domains to the time series domain. To address this gap, we introduce TIME WEAVER, a novel diffusion-based model that leverages the heterogeneous metadata in the form of categorical, continuous, and even time-variant variables to significantly improve time series generation. Additionally, we show that naive extensions of standard evaluation metrics from the image to the time series domain are insufficient. These metrics do not penalize conditional generation approaches for their poor specificity in reproducing the metadata-specific features in the generated time series. Thus, we innovate a novel evaluation metric that accurately captures the specificity of conditional generation and the realism of the generated time series. We show that TIME WEAVER outperforms state-of-the-art benchmarks, such as Generative Adversarial Networks (GANs), by up to 30% in downstream classification tasks on real-world energy, medical, air quality, and traffic datasets.

ICRA Conference 2023 Conference Paper

Robust Forecasting for Robotic Control: A Game-Theoretic Approach

  • Shubhankar Agarwal
  • David Fridovich-Keil
  • Sandeep Chinchali

Modern robots require accurate forecasts to make optimal decisions in the real world. For example, self-driving cars need an accurate forecast of other agents' future actions to plan safe trajectories. Current methods rely heavily on historical time series to accurately predict the future. However, relying entirely on the observed history is problematic since it could be corrupted by noise, have outliers, or not completely represent all possible outcomes. To solve this problem, we propose a novel framework for generating robust forecasts for robotic control. In order to model real-world factors affecting future forecasts, we introduce the notion of an adversary, which perturbs observed historical time series to increase a robot's ultimate control cost. Specifically, we model this interaction as a zero-sum two-player game between a robot's forecaster and this hypothetical adversary. We show that our proposed game may be solved to a local Nash equilibrium using gradient-based optimization techniques. Furthermore, we show that a forecaster trained with our method performs 30.14% better on out-of-distribution real-world lane change data than baselines.
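
The gradient-based solution concept can be sketched on a toy zero-sum game (an assumed scalar form, not the paper's model): simultaneous gradient descent-ascent between a forecaster parameter and an adversarial perturbation of the history drives the pair toward a local Nash equilibrium.

```python
# Toy sketch: forecaster `theta` descends the cost while adversary `delta`
# (a perturbation of the observed history, penalized by lam) ascends it.

def gda(history, lam=1.0, lr=0.05, steps=2000):
    theta, delta = 0.0, 1.0
    for _ in range(steps):
        # cost = (theta - (history + delta))**2 - lam * delta**2
        g_theta = 2 * (theta - history - delta)                      # descend
        g_delta = -2 * (theta - history - delta) - 2 * lam * delta   # ascend
        theta -= lr * g_theta
        delta += lr * g_delta
    return theta, delta

theta, delta = gda(history=3.0)
# At the local Nash equilibrium the perturbation vanishes and the
# forecaster matches the history: theta ~ 3, delta ~ 0.
print(abs(theta - 3.0) < 1e-3, abs(delta) < 1e-3)
```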

NeurIPS Conference 2023 Conference Paper

Task-aware Distributed Source Coding under Dynamic Bandwidth

  • Po-han Li
  • Sravan Kumar Ankireddy
  • Ruihan (Philip) Zhao
  • Hossein Nourkhiz Mahjoub
  • Ehsan Moradi Pari
  • Ufuk Topcu
  • Sandeep Chinchali
  • Hyeji Kim

Efficient compression of correlated data is essential to minimize communication overload in multi-sensor networks. In such networks, each sensor independently compresses the data and transmits them to a central node. A decoder at the central node decompresses and passes the data to a pre-trained machine learning-based task model to generate the final output. Due to limited communication bandwidth, it is important for the compressor to learn only the features that are relevant to the task. Additionally, the final performance depends heavily on the total available bandwidth. In practice, it is common to encounter varying availability in bandwidth. Since higher bandwidth results in better performance, it is essential for the compressor to dynamically take advantage of the maximum available bandwidth at any instant. In this work, we propose a novel distributed compression framework composed of independent encoders and a joint decoder, which we call neural distributed principal component analysis (NDPCA). NDPCA flexibly compresses data from multiple sources to any available bandwidth with a single model, reducing compute and storage overhead. NDPCA achieves this by learning low-rank task representations and efficiently distributing bandwidth among sensors, thus providing a graceful trade-off between performance and bandwidth. Experiments show that NDPCA improves the success rate of multi-view robotic arm manipulation by 9% and the accuracy of object detection tasks on satellite imagery by 14% compared to an autoencoder with uniform bandwidth allocation.
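
The bandwidth-flexible idea can be sketched with plain PCA (a linear stand-in for the learned neural version): fit principal components once, then serve any bandwidth budget by transmitting only the top-k coefficients, giving a graceful quality/bandwidth trade-off from a single model.

```python
import numpy as np

# Sketch: one PCA model, variable bandwidth k = number of coefficients sent.

def fit_pca(X):
    mu = X.mean(0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt

def compress(x, mu, Vt, k):      # k = available bandwidth
    return (x - mu) @ Vt[:k].T

def decompress(z, mu, Vt):
    return mu + z @ Vt[: len(z)]

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10))  # rank-3 sensor data
mu, Vt = fit_pca(X)
x = X[0]
errs = [np.linalg.norm(x - decompress(compress(x, mu, Vt, k), mu, Vt))
        for k in (1, 2, 3)]
print(errs[0] >= errs[1] >= errs[2], errs[2] < 1e-8)  # graceful trade-off
```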

NeurIPS Conference 2022 Conference Paper

Class-Aware Adversarial Transformers for Medical Image Segmentation

  • Chenyu You
  • Ruihan Zhao
  • Fenglin Liu
  • Siyuan Dong
  • Sandeep Chinchali
  • Ufuk Topcu
  • Lawrence Staib
  • James Duncan

Transformers have made remarkable progress towards modeling long-range dependencies within the medical image analysis domain. However, current transformer-based models suffer from several disadvantages: (1) existing methods fail to capture the important features of the images due to the naive tokenization scheme; (2) the models suffer from information loss because they only consider single-scale feature representations; and (3) the segmentation label maps generated by the models are not accurate enough without considering rich semantic contexts and anatomical textures. In this work, we present CASTformer, a novel type of adversarial transformers, for 2D medical image segmentation. First, we take advantage of the pyramid structure to construct multi-scale representations and handle multi-scale variations. We then design a novel class-aware transformer module to better learn the discriminative regions of objects with semantic structures. Lastly, we utilize an adversarial training strategy that boosts segmentation accuracy and correspondingly allows a transformer-based discriminator to capture high-level semantically correlated contents and low-level anatomical features. Our experiments demonstrate that CASTformer dramatically outperforms previous state-of-the-art transformer-based approaches on three benchmarks, obtaining 2.54%-5.88% absolute improvements in Dice over previous models. Further qualitative experiments provide a more detailed picture of the model’s inner workings, shed light on the challenges in improved transparency, and demonstrate that transfer learning can greatly improve performance and reduce the size of medical image datasets in training, making CASTformer a strong starting point for downstream medical image analysis tasks.

IROS Conference 2022 Conference Paper

Drift Reduced Navigation with Deep Explainable Features

  • Mohammad Omama
  • Sundar Sripada V. S.
  • Sandeep Chinchali
  • Arun Kumar Singh 0001
  • K. Madhava Krishna

Modern autonomous vehicles (AVs) often rely on vision, LIDAR, and even radar-based simultaneous localization and mapping (SLAM) frameworks for precise localization and navigation. However, modern SLAM frameworks often lead to unacceptably high levels of drift (i.e., localization error) when AVs observe few visually distinct features or encounter occlusions due to dynamic obstacles. This paper argues that minimizing drift must be a key desiderata in AV motion planning, which requires an AV to take active control decisions to move towards feature-rich regions while also minimizing conventional control cost. To do so, we first introduce a novel data-driven perception module that observes LIDAR point clouds and estimates which features/regions an AV must navigate towards for drift minimization. Then, we introduce an interpretable model predictive controller (MPC) that moves an AV toward such feature-rich regions while avoiding visual occlusions and gracefully trading off drift and control cost. Our experiments on challenging, dynamic scenarios in the state-of-the-art CARLA simulator indicate our method reduces drift up to 76.76% compared to benchmark approaches.

ICML Conference 2022 Conference Paper

Task-aware Privacy Preservation for Multi-dimensional Data

  • Jiangnan Cheng
  • Ao Tang
  • Sandeep Chinchali

Local differential privacy (LDP) can be adopted to anonymize richer user data attributes that will be input to sophisticated machine learning (ML) tasks. However, today’s LDP approaches are largely task-agnostic and often lead to severe performance loss – they simply inject noise to all data attributes according to a given privacy budget, regardless of what features are most relevant for the ultimate task. In this paper, we address how to significantly improve the ultimate task performance with multi-dimensional user data by considering a task-aware privacy preservation problem. The key idea is to use an encoder-decoder framework to learn (and anonymize) a task-relevant latent representation of user data. We obtain an analytical near-optimal solution for the linear setting with mean-squared error (MSE) task loss. We also provide an approximate solution through a gradient-based learning algorithm for general nonlinear cases. Extensive experiments demonstrate that our task-aware approach significantly improves ultimate task accuracy compared to standard benchmark LDP approaches with the same level of privacy guarantee.
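
The benefit of noising a task-relevant representation rather than every raw attribute can be shown numerically (an illustrative toy, not the paper's mechanism or its privacy accounting): for a linear task, perturbing a one-dimensional latent adds far less task error than perturbing all attributes at the same per-scalar noise scale.

```python
import numpy as np

# Toy comparison: task-agnostic noise on every attribute vs. task-aware noise
# on a 1-D task-relevant latent, for the linear task y = w @ x.

rng = np.random.default_rng(0)
d, n, scale = 20, 5000, 1.0
w = rng.normal(size=d)                    # task: y = w @ x
X = rng.normal(size=(n, d))
y = X @ w

noisy_attrs = X + rng.laplace(scale=scale, size=X.shape)   # task-agnostic LDP
agnostic_mse = np.mean((noisy_attrs @ w - y) ** 2)

latent = X @ w                                             # task-aware encoder
noisy_latent = latent + rng.laplace(scale=scale, size=n)   # noise one scalar
aware_mse = np.mean((noisy_latent - y) ** 2)

print(aware_mse < agnostic_mse)
```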

NeurIPS Conference 2021 Conference Paper

Data Sharing and Compression for Cooperative Networked Control

  • Jiangnan Cheng
  • Marco Pavone
  • Sachin Katti
  • Sandeep Chinchali
  • Ao Tang

Sharing forecasts of network timeseries data, such as cellular or electricity load patterns, can improve independent control applications ranging from traffic scheduling to power generation. Typically, forecasts are designed without knowledge of a downstream controller's task objective, and thus simply optimize for mean prediction error. However, such task-agnostic representations are often too large to stream over a communication network and do not emphasize salient temporal features for cooperative control. This paper presents a solution to learn succinct, highly-compressed forecasts that are co-designed with a modular controller's task objective. Our simulations with real cellular, Internet-of-Things (IoT), and electricity load data show we can improve a model predictive controller's performance by at least 25% while transmitting 80% less data than the competing method. Further, we present theoretical compression results for a networked variant of the classical linear quadratic regulator (LQR) control problem.

IROS Conference 2021 Conference Paper

Interpretable Trade-offs Between Robot Task Accuracy and Compute Efficiency

  • Bineet Ghosh
  • Sandeep Chinchali
  • Parasara Sridhar Duggirala

A robot can invoke heterogeneous computation resources such as CPUs, cloud GPU servers, or even human computation for achieving a high-level goal. The problem of invoking an appropriate computation model so that it will successfully complete a task while keeping its compute and energy costs within a budget is called a model selection problem. In this paper, we present an optimal solution to the model selection problem with two compute models, the first being fast but less accurate, and the second being slow but more accurate. The main insight behind our solution is that a robot should invoke the slower compute model only when the benefits from the gain in accuracy outweigh the computational costs. We show that such cost-benefit analysis can be performed by leveraging the statistical correlation between the accuracy of fast and slow compute models. We demonstrate the broad applicability of our approach to diverse problems such as perception using neural networks and safe navigation of a simulated Mars rover.
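
The cost-benefit rule at the heart of this abstract can be sketched as a lookup-based decision (names and numbers below are illustrative, not the paper's): invoke the slow model only when the expected accuracy gain, estimated offline from how the two models' accuracies correlate, outweighs its extra compute cost.

```python
# Hedged sketch of the model-selection rule. The gain table is a hypothetical
# offline estimate of P(slow model corrects the fast model), keyed on the
# fast model's confidence bucket.

GAIN_TABLE = {"low": 0.35, "medium": 0.15, "high": 0.02}

def choose_model(fast_confidence_bucket, slow_cost, error_penalty=1.0):
    expected_gain = error_penalty * GAIN_TABLE[fast_confidence_bucket]
    return "slow" if expected_gain > slow_cost else "fast"

print(choose_model("low", slow_cost=0.1), choose_model("high", slow_cost=0.1))  # slow fast
```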

ICLR Conference 2020 Conference Paper

Multi-agent Reinforcement Learning for Networked System Control

  • Tianshu Chu
  • Sandeep Chinchali
  • Sachin Katti

This paper considers multi-agent reinforcement learning (MARL) in networked system control. Specifically, each agent learns a decentralized control policy based on local observations and messages from connected neighbors. We formulate such a networked MARL (NMARL) problem as a spatiotemporal Markov decision process and introduce a spatial discount factor to stabilize the training of each local agent. Further, we propose a new differentiable communication protocol, called NeurComm, to reduce information loss and non-stationarity in NMARL. Experiments in realistic NMARL scenarios of adaptive traffic signal control and cooperative adaptive cruise control show that an appropriate spatial discount factor effectively enhances the learning curves of non-communicative MARL algorithms, while NeurComm outperforms existing communication protocols in both learning efficiency and control performance.
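
One plausible form of the spatial discount factor can be sketched in a line (an illustrative form, not necessarily the paper's exact definition): a neighbor's reward is attenuated geometrically in its hop distance, so far-away agents perturb local learning less.

```python
# Hedged sketch: attenuate each neighbor's reward by alpha ** distance.

def spatially_discounted_reward(rewards_by_distance, alpha=0.8):
    return sum(alpha ** d * r for d, r in rewards_by_distance)

# (hop distance, reward) for the agent itself and two neighbors.
print(round(spatially_discounted_reward([(0, 1.0), (1, 1.0), (2, 1.0)]), 2))  # 2.44
```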

RLDM Conference 2019 Conference Abstract

Reinforcement Learning for Network Offloading in Cloud Robotics

  • Sandeep Chinchali
  • Apoorva Sharma
  • Amine Elhafsi
  • Daniel Kang
  • Evgenya Pergament
  • Eyal Cidon
  • Sachin Katti

We apply deep reinforcement learning to a central decision-making problem in robotics - when should a robot use its on-board compute model or, in cases of local uncertainty, query a compute-intensive model in “the cloud”? Today’s robotic systems are increasingly turning to computationally expensive models such as deep neural networks (DNNs) for tasks like object detection, perception and planning. However, resource-constrained robots, like low-power drones, often have insufficient on-board compute resources or power reserves to scalably run the most accurate, state-of-the-art neural network compute models. Cloud robotics allows mobile robots to offload compute to centralized servers if they are uncertain locally or want to run more accurate, compute-intensive models. However, cloud robotics comes with a key, often understated cost: communicating with the cloud over congested wireless networks may result in latency and increase network congestion. In fact, sending high data-rate video or LIDAR from multiple robots over congested networks can lead to prohibitive delay for real-time applications. We formulate a novel Robot Offloading Problem — how and when should robots offload sensing tasks, especially if they are uncertain, to improve accuracy while minimizing the cost of cloud communication? We formulate offloading as a sequential decision-making problem for robots, and propose a solution using deep reinforcement learning. In both simulations and practical hardware experiments using state-of-the-art vision DNNs, our offloading strategy improves vision task performance by 1.3-2.6x over benchmark offloading strategies. We conclude by showing how cloud offloading has an inherent exploration vs. exploitation trade-off since a robot must balance use of a known local model (exploitation) with learning context-dependent utility of the cloud (exploration). Accordingly, we discuss how our model is widely applicable beyond cloud robotics.
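The trade-off at the heart of the Robot Offloading Problem can be captured in a one-step reward: task accuracy credited, communication penalized. This is a toy illustration with made-up names and weights, not the reward the paper actually learns against:

```python
# Hedged sketch of a one-step reward for an offloading decision
# (illustrative; the paper formulates a full sequential MDP).

def offload_reward(accuracy: float, offloaded: bool,
                   comm_cost: float = 0.1) -> float:
    """Reward = task accuracy, minus a fixed communication penalty
    whenever the robot queried the cloud model."""
    return accuracy - (comm_cost if offloaded else 0.0)
```

Offloading is worthwhile only when the cloud model's accuracy gain exceeds `comm_cost`, which is exactly the exploration/exploitation tension the abstract closes on.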

AAAI Conference 2018 Conference Paper

Cellular Network Traffic Scheduling With Deep Reinforcement Learning

  • Sandeep Chinchali
  • Pan Hu
  • Tianshu Chu
  • Manu Sharma
  • Manu Bansal
  • Rakesh Misra
  • Marco Pavone
  • Sachin Katti

Modern mobile networks are facing unprecedented growth in demand due to a new class of traffic from Internet of Things (IoT) devices such as smart wearables and autonomous cars. Future networks must schedule delay-tolerant software updates, data backup, and other transfers from IoT devices while maintaining strict service guarantees for conventional real-time applications such as voice-calling and video. This problem is extremely challenging because conventional traffic is highly dynamic across space and time, so its performance is significantly impacted if all IoT traffic is scheduled immediately when it originates. In this paper, we present a reinforcement learning (RL) based scheduler that can dynamically adapt to traffic variation, and to various reward functions set by network operators, to optimally schedule IoT traffic. Using 4 weeks of real network data from downtown Melbourne, Australia spanning diverse traffic patterns, we demonstrate that our RL scheduler can enable mobile networks to carry 14.7% more data with minimal impact on existing traffic, and outperforms heuristic schedulers by more than 2×. Our work is a valuable step towards designing autonomous, “self-driving” networks that learn to manage themselves from past data.
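The operator-set reward functions mentioned here can be pictured as a weighted trade-off between IoT data delivered and harm to conventional traffic. The names and the penalty weight below are invented for illustration; the paper's actual reward design may differ:

```python
# Hedged sketch of an operator-configurable scheduling reward
# (weights are made up for this example).

def scheduler_reward(iot_throughput: float,
                     conventional_delay: float,
                     penalty: float = 2.0) -> float:
    """Credit delay-tolerant IoT data delivered, penalize any added
    delay inflicted on conventional real-time traffic; `penalty`
    encodes the operator's priority for protecting existing users."""
    return iot_throughput - penalty * conventional_delay
```

Raising `penalty` makes the learned scheduler more conservative about injecting IoT traffic during peak hours.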

ICRA Conference 2016 Conference Paper

Simultaneous model identification and task satisfaction in the presence of temporal logic constraints

  • Sandeep Chinchali
  • Scott C. Livingston
  • Marco Pavone
  • Joel W. Burdick

Recent proliferation of cyber-physical systems, ranging from autonomous cars to nuclear hazard inspection robots, has exposed several challenging research problems on automated fault detection and recovery. This paper considers how recently developed formal synthesis and model verification techniques may be used to automatically generate information-seeking trajectories for anomaly detection. In particular, we consider the problem of how a robot could select its actions so as to maximally disambiguate between different model hypotheses that govern the environment it operates in or its interaction with other agents whose prime motivation is a priori unknown. The identification problem is posed as selection of the most likely model from a set of candidates, where each candidate is an adversarial Markov decision process (MDP) together with a linear temporal logic (LTL) formula that constrains robot-environment interaction. An adversarial MDP is an MDP in which transitions depend on both a (controlled) robot action and an (uncontrolled) adversary action. States are labeled, thus allowing interpretation of satisfaction of LTL formulae, which have a special form admitting satisfaction decisions in bounded time. An example where a robotic car must discern whether neighboring vehicles are following its trajectory for a surveillance operation is used to demonstrate our approach.
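Selecting "the most likely model from a set of candidates" is, in its simplest form, a Bayesian posterior update over model hypotheses given observed transitions. The following is a generic sketch of that idea under assumed names, not the paper's LTL-constrained algorithm:

```python
# Hedged sketch: Bayesian update over candidate environment models
# (the paper additionally constrains robot actions with LTL formulae).

def update_model_posterior(prior, likelihoods):
    """Given a prior over candidate models and the likelihood of an
    observed transition under each model, return the posterior.
    The most likely model is the argmax of the result."""
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]
```

The paper's contribution is choosing *actions* that drive these posteriors apart as fast as possible while still satisfying the task specification.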

ICRA Conference 2012 Conference Paper

Towards formal synthesis of reactive controllers for dexterous robotic manipulation

  • Sandeep Chinchali
  • Scott C. Livingston
  • Ufuk Topcu
  • Joel W. Burdick
  • Richard M. Murray

In robotic finger gaiting, fingers continuously manipulate an object until joint limitations or mechanical limitations periodically force a switch of grasp. Current approaches to gait planning and control are slow, lack formal guarantees on correctness, and are generally not reactive to changes in object geometry. To address these issues, we apply advances in formal methods to model a gait subject to external perturbations as a two-player game between a finger controller and its adversarial environment. High-level specifications are expressed in linear temporal logic (LTL) and low-level control primitives are designed for continuous kinematics. Simulations of planar manipulation with our synthesized correct-by-construction gait controller demonstrate the benefits of this approach.

ICRA Conference 2010 Conference Paper

Axel rover paddle wheel design, efficiency, and sinkage on deformable terrain

  • Pablo Abad-Manterola
  • Joel W. Burdick
  • Issa A. D. Nesnas
  • Sandeep Chinchali
  • Christine Fuller
  • Xuecheng Zhou

This paper presents the Axel robotic rover which has been designed to provide robust and flexible access to extreme extra-planetary terrains. Axel is a lightweight 2-wheeled vehicle that can access steep slopes and negotiate relatively large obstacles due to its actively managed tether and novel wheel design. This paper reviews the Axel system and focuses on its novel paddle wheel characteristics. We show that the paddle design has superior rock climbing ability. We also adapt basic terramechanics principles to estimate the sinkage of paddle wheels on loose sand. Experimental comparisons between the transport efficiency of mountain bike wheels and paddle wheels are summarized. Finally, we present an unfolding wheel prototype which allows Axel to be compacted for efficient transport.
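The "basic terramechanics principles" adapted here for sinkage estimation are typified by Bekker's classical pressure-sinkage relation p = (k_c/b + k_phi) z^n. The function below simply inverts that textbook relation for z; it is a sketch of the underlying principle, not the paper's paddle-wheel-specific model:

```python
# Hedged sketch: static sinkage from Bekker's pressure-sinkage
# relation p = (k_c / b + k_phi) * z**n (classical terramechanics).

def bekker_sinkage(pressure: float, k_c: float, k_phi: float,
                   b: float, n: float) -> float:
    """Solve Bekker's relation for sinkage z, given contact pressure,
    soil moduli k_c and k_phi, contact width b, and exponent n."""
    return (pressure / (k_c / b + k_phi)) ** (1.0 / n)
```

Adapting this to paddle wheels, as the abstract describes, chiefly changes how the effective contact pressure and width are computed for the paddle geometry.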