Arrow Research search

Author name cluster

Vishnu Dutt Sharma

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

RLJ Journal 2025 Journal Article

Hybrid Classical/RL Local Planner for Ground Robot Navigation

  • Vishnu Dutt Sharma
  • Jeongran Lee
  • Matthew Andrews
  • Ilija Hadžić

Local planning is an optimization process within a mobile robot navigation stack that searches for the best velocity vector, given the robot and environment state. Depending on how the optimization criteria and constraints are defined, some planners may be better than others in specific situations. We consider two conceptually different planners. The first planner explores the velocity space in real-time and has superior path-tracking and motion smoothness performance. The second planner was trained using reinforcement learning methods to produce the best velocity based on its training "experience". It is better at avoiding dynamic obstacles, but at the expense of motion smoothness. We propose a simple yet effective meta-reasoning approach that takes advantage of both approaches by switching between planners based on the surroundings. We demonstrate the superiority of our hybrid planner, both qualitatively and quantitatively, over the individual planners on a live robot in different scenarios, achieving an improvement of 26% in navigation time.
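The switching idea in this abstract can be sketched as a simple meta-reasoner. This is a minimal illustration with hypothetical planner stubs and a hypothetical distance-threshold criterion; the paper's actual switching rule and planner interfaces are not specified here.

```python
# Sketch of a meta-reasoning switch between two local planners.
# `classical_plan`, `rl_plan`, and the threshold are illustrative stand-ins.

def classical_plan(state):
    # Placeholder: smooth velocity from a classical velocity-space search.
    return (0.5, 0.0)  # (linear, angular)

def rl_plan(state):
    # Placeholder: velocity from an RL policy, better near moving obstacles.
    return (0.3, 0.4)

def hybrid_plan(state, obstacle_distance, threshold=2.0):
    """Switch planners based on the surroundings: use the RL planner
    when a dynamic obstacle is within `threshold` meters, otherwise
    keep the smoother classical planner."""
    if obstacle_distance < threshold:
        return rl_plan(state)
    return classical_plan(state)
```

The point of the sketch is that the meta-reasoner only selects which planner's output to execute each cycle; neither planner needs to be modified.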

RLC Conference 2025 Conference Paper

Hybrid Classical/RL Local Planner for Ground Robot Navigation

  • Vishnu Dutt Sharma
  • Jeongran Lee
  • Matthew Andrews
  • Ilija Hadžić

Local planning is an optimization process within a mobile robot navigation stack that searches for the best velocity vector, given the robot and environment state. Depending on how the optimization criteria and constraints are defined, some planners may be better than others in specific situations. We consider two conceptually different planners. The first planner explores the velocity space in real-time and has superior path-tracking and motion smoothness performance. The second planner was trained using reinforcement learning methods to produce the best velocity based on its training "experience". It is better at avoiding dynamic obstacles, but at the expense of motion smoothness. We propose a simple yet effective meta-reasoning approach that takes advantage of both approaches by switching between planners based on the surroundings. We demonstrate the superiority of our hybrid planner, both qualitatively and quantitatively, over the individual planners on a live robot in different scenarios, achieving an improvement of 26% in navigation time.

ICRA Conference 2025 Conference Paper

Improving Zero-Shot ObjectNav with Generative Communication

  • Vishnu Sashank Dorbala
  • Vishnu Dutt Sharma
  • Pratap Tokekar
  • Dinesh Manocha

We propose a new method for improving zero-shot ObjectNav that aims to utilize potentially available environmental percepts for navigational assistance. Our approach takes into account that the ground agent may have limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent with a global view containing the target object and the ground agent with an obfuscated view; both equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view with the target, we note a drop in performance (−13% in OSR and −13% in SPL) of a fully cooperative assistance scheme over an unassisted baseline. In contrast, a selective assistance scheme where the ground agent retains its independent exploratory behaviour shows a 10% OSR and 7.65% SPL improvement. To explain navigation performance, we analyze the GC for unique traits, quantifying the presence of hallucination and cooperation. Specifically, we identify the novel linguistic trait of preemptive hallucination in our embodied setting, where the overhead agent assumes that the ground agent has executed an action in the dialogue when it is yet to move, and note its strong correlation with navigation performance. We conduct real-world experiments and present some qualitative examples where we mitigate hallucinations via prompt finetuning to improve ObjectNav performance.

IROS Conference 2024 Conference Paper

LAVA: Long-horizon Visual Action based Food Acquisition

  • Amisha Bhaskar
  • Rui Liu 0040
  • Vishnu Dutt Sharma
  • Guangyao Shi
  • Pratap Tokekar

Robotic Assisted Feeding (RAF) addresses the fundamental need for individuals with mobility impairments to regain autonomy in feeding themselves. The goal of RAF is to use a robot arm to acquire and transfer food to individuals from the table. Existing RAF methods primarily focus on solid foods, leaving a gap in manipulation strategies for semisolid and deformable foods. We present Long-horizon Visual Action-based (LAVA) food acquisition of liquid, semisolid, and deformable foods. Long-horizon refers to the goal of "clearing the bowl" by sequentially acquiring the food from the bowl. LAVA is hierarchical: (1) At the highest level, we determine primitives using ScoopNet. (2) At the mid-level, LAVA finds parameters for the low-level primitives. (3) At the lowest level, LAVA carries out action execution using behavior cloning. We validate LAVA on real-world acquisition trials involving granular, liquid, semisolid, and deformable foods along with fruit chunks and soup. Across 46 bowls, LAVA acquires much more efficiently than baselines with a success rate of 89±4%, and generalizes across realistic plate variations such as varying positions, varieties, and amount of food in the bowl. Datasets and supplementary materials can be found on our website.

IROS Conference 2024 Conference Paper

MAP-NBV: Multi-agent Prediction-guided Next-Best-View Planning for Active 3D Object Reconstruction

  • Harnaik Dhami
  • Vishnu Dutt Sharma
  • Pratap Tokekar

Next-Best View (NBV) planning is a long-standing problem of determining where to obtain the next best view of an object from, by a robot that is viewing the object. There are a number of methods for choosing NBV based on the observed part of the object. In this paper, we investigate how predicting the unobserved part helps with the efficiency of reconstructing the object. We present, Multi-Agent Prediction-Guided NBV (MAP-NBV), a decentralized coordination algorithm for active 3D reconstruction with multi-agent systems. Prediction-based approaches have shown great improvement in active perception tasks by learning the cues about structures in the environment from data. However, these methods primarily focus on single-agent systems. We design a decentralized next-best-view approach that utilizes geometric measures over the predictions and jointly optimizes the information gain and control effort for efficient collaborative 3D reconstruction of the object. Our method achieves 19% improvement over the non-predictive multi-agent approach in simulations using AirSim and ShapeNet. We make our code publicly available through our project website: http://raaslab.org/projects/MAPNBV/.
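The joint objective described here, information gain traded against control effort, can be sketched as a simple candidate-view scoring rule. The scoring function and weight below are hypothetical; MAP-NBV's actual geometric measures over the predicted shape are not reproduced here.

```python
# Sketch of next-best-view selection that jointly weighs predicted
# information gain against control effort (illustrative scoring only).

def next_best_view(candidates, alpha=0.5):
    """candidates: list of (view_id, info_gain, control_effort) tuples.

    Returns the id of the view maximizing
        score = info_gain - alpha * control_effort,
    so a slightly less informative view can win if it is much cheaper
    for the robot to reach."""
    return max(candidates, key=lambda c: c[1] - alpha * c[2])[0]
```

Raising `alpha` shifts the planner toward cheaper motions; lowering it prioritizes coverage of the predicted, as-yet-unobserved geometry.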

ICRA Conference 2024 Conference Paper

Pre-Trained Masked Image Model for Mobile Robot Navigation

  • Vishnu Dutt Sharma
  • Anukriti Singh
  • Pratap Tokekar

2D top-down maps are commonly used for the navigation and exploration of mobile robots through unknown areas. Typically, the robot builds the navigation maps incrementally from local observations using onboard sensors. Recent works have shown that predicting the structural patterns in the environment through learning-based approaches can greatly enhance task efficiency. While many such works build task-specific networks using limited datasets, we show that the existing foundational vision networks can accomplish the same without any fine-tuning. Specifically, we use Masked Autoencoders, pre-trained on street images, to present novel applications for field-of-view expansion, single-agent topological exploration, and multi-agent exploration for indoor mapping, across different input modalities. Our work motivates the use of foundational vision models for generalized structure prediction-driven applications, especially in the dearth of training data. We share more qualitative results at https://raaslab.org/projects/MIM4Robots.

ICRA Conference 2023 Conference Paper

D2CoPlan: A Differentiable Decentralized Planner for Multi-Robot Coverage

  • Vishnu Dutt Sharma
  • Lifeng Zhou 0001
  • Pratap Tokekar

Centralized approaches for multi-robot coverage planning problems suffer from the lack of scalability. Learning-based distributed algorithms provide a scalable avenue in addition to bringing data-oriented feature generation capabilities to the table, allowing integration with other learning-based approaches. To this end, we present a learning-based, differentiable distributed coverage planner (D2CoPlan) which scales efficiently in runtime and number of agents compared to the expert algorithm, and performs on par with the classical distributed algorithm. In addition, we show that D2CoPlan can be seamlessly combined with other learning methods to learn end-to-end, resulting in a better solution than the individually trained modules, opening doors to further research for tasks that remain elusive with classical methods.

IROS Conference 2023 Conference Paper

Pred-NBV: Prediction-Guided Next-Best-View Planning for 3D Object Reconstruction

  • Harnaik Dhami
  • Vishnu Dutt Sharma
  • Pratap Tokekar

Prediction-based active perception has shown the potential to improve the navigation efficiency and safety of the robot by anticipating the uncertainty in the unknown environment. The existing works for 3D shape prediction make an implicit assumption about the partial observations and therefore cannot be used for real-world planning and do not consider the control effort for next-best-view planning. We present Pred-NBV, a realistic object shape reconstruction method consisting of PoinTr-C, an enhanced 3D prediction model trained on the ShapeNet dataset, and an information and control effort-based next-best-view method to address these issues. Pred-NBV shows an improvement of 25.46% in object coverage over the traditional methods in the AirSim simulator, and performs better shape completion than PoinTr, the state-of-the-art shape completion model, even on real data obtained from a Velodyne 3D LiDAR mounted on a DJI M600 Pro.

IROS Conference 2023 Conference Paper

ProxMaP: Proximal Occupancy Map Prediction for Efficient Indoor Robot Navigation

  • Vishnu Dutt Sharma
  • Jingxi Chen
  • Pratap Tokekar

Planning a path for a mobile robot typically requires building a map (e.g., an occupancy grid) of the environment as the robot moves around. While navigating in an unknown environment, the map built by the robot online may have many as-yet-unknown regions. A conservative planner may avoid such regions, taking a longer time to navigate to the goal. Instead, if a robot is able to correctly predict the occupancy in the occluded regions, the robot may navigate efficiently. We present a self-supervised occupancy prediction technique, ProxMaP, to predict the occupancy within the proximity of the robot to enable faster navigation. We show that ProxMaP generalizes well across realistic and real domains, and improves the robot navigation efficiency in simulation by 12.40% against a traditional navigation method. We share our findings and code at https://raaslab.org/projects/ProxMaP.
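The benefit described above, prediction making the planner less conservative, can be illustrated with a toy cost map. The cell encoding, costs, and function below are hypothetical and are not ProxMaP's actual model or planner; they only show how predicted-free unknown cells can become cheap to traverse.

```python
import numpy as np

# Illustrative cost-map construction: 0 = free, 1 = occupied, -1 = unknown.
# A conservative planner prices unknown cells high; cells the model
# predicts as free are priced like observed free space instead.

def planning_costs(grid, predicted_free, unknown_cost=10.0):
    """Return per-cell traversal costs for a grid planner.

    grid:           2D int array of cell states (0 / 1 / -1).
    predicted_free: 2D bool array, True where the model predicts free space.
    """
    cost = np.where(grid == 1, np.inf, 1.0)   # occupied cells are blocked
    unknown = grid == -1
    cost[unknown] = np.where(predicted_free[unknown], 1.0, unknown_cost)
    return cost
```

A shortest-path search over this cost map will cut through predicted-free occlusions that a purely conservative map would route around.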

IROS Conference 2020 Conference Paper

Risk-Aware Planning and Assignment for Ground Vehicles using Uncertain Perception from Aerial Vehicles

  • Vishnu Dutt Sharma
  • Maymoonah Toubeh
  • Lifeng Zhou 0001
  • Pratap Tokekar

We propose a risk-aware framework for multi-robot, multi-demand assignment and planning in unknown environments. Our motivation is disaster response and search-and-rescue scenarios where ground vehicles must reach demand locations as soon as possible. We consider a setting where the terrain information is available only in the form of an aerial, georeferenced image. Deep learning techniques can be used for semantic segmentation of the aerial image to create a cost map for safe ground robot navigation. Such segmentation may still be noisy. Hence, we present a joint planning and perception framework that accounts for the risk introduced due to noisy perception. Our contributions are two-fold: (i) we show how to use Bayesian deep learning techniques to extract risk at the perception level; and (ii) use a risk-theoretical measure, CVaR, for risk-aware planning and assignment. The pipeline is theoretically established, then empirically analyzed through two datasets. We find that accounting for risk at both levels produces quantifiably safer paths and assignments.
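The risk measure named in this abstract, CVaR, has a standard empirical form: the mean of the worst alpha-fraction of sampled costs. The sketch below shows that estimator and a minimal risk-aware selection step; the function names and the sample-based setup are illustrative, not the paper's exact assignment pipeline.

```python
import numpy as np

def cvar(costs, alpha=0.1):
    """Empirical Conditional Value-at-Risk: the mean of the worst
    alpha-fraction of sampled path costs (higher cost = worse)."""
    costs = np.sort(np.asarray(costs, dtype=float))
    k = max(1, int(np.ceil(alpha * len(costs))))  # number of tail samples
    return costs[-k:].mean()

def risk_aware_choice(assignments, alpha=0.1):
    """assignments: list of (name, cost_samples) pairs, where samples come
    from e.g. Bayesian deep-learning uncertainty over the segmentation.
    Picks the assignment with the lowest tail risk rather than the lowest
    mean cost."""
    return min(assignments, key=lambda kv: cvar(kv[1], alpha))[0]
```

Unlike an expected-cost criterion, this choice penalizes an assignment whose cost distribution has a heavy tail, which is exactly the behavior wanted when noisy perception can hide dangerous terrain.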