Author name cluster

Soshi Iba

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers

1 author row

IROS Conference 2025 Conference Paper

A Probabilistic Programming Approach to Intention Estimation in Human-Robot Teleoperated Assembly Tasks

Aolin Xu 0002
Songpo Li
Prakash Baskaran
Karankumar Patel
Soshi Iba
Behzad Dariush

We propose a new approach to solving the problem of intention estimation in human-robot teleoperation for assembly tasks, which includes task estimation and action prediction. Our approach uses probabilistic graphical models to represent the joint distribution of the task and the actions to be taken to complete the task. Both model learning and inference are implemented with Pyro, a state-of-the-art probabilistic programming language. The distinctive feature relative to traditional hidden Markov model-type probabilistic methods is that our model takes time information into account and explicitly models the individual distributions of all the variables under consideration. By doing this, we fully utilize the power of probabilistic programming and achieve accurate distribution, and hence uncertainty, estimates. Working with a pretrained action recognition module, the proposed model can be trained solely on a tiny instruction manual of the assembly tasks and can be retrained with minimal overhead whenever the manual is changed or augmented, avoiding the costly data reannotation and retraining required by end-to-end learning-based methods. We also compare our method with a transformer-based model trained directly on the instruction manual, and our method shows superior accuracy in both intention estimation and the associated distribution estimates. We additionally identify failure cases of both our method and the transformer-based method, and envision methods for improvement.

ICRA Conference 2025 Conference Paper

Diffusion-Informed Probabilistic Contact Search for Multi-Finger Manipulation

Abhinav Kumar
Thomas Power
Fan Yang 0144
Sergio Aguilera Marinovic
Soshi Iba
Rana Soltani-Zarrin
Dmitry Berenson

Planning contact-rich interactions for multi-finger manipulation is challenging due to the high-dimensionality and hybrid nature of dynamics. Recent advances in data-driven methods have shown promise, but are sensitive to the quality of training data. Combining learning with classical methods like trajectory optimization and search adds additional structure to the problem and domain knowledge in the form of constraints, which can lead to outperforming the data on which models are trained. We present Diffusion-Informed Probabilistic Contact Search (DIPS), which uses an A* search to plan a sequence of contact modes informed by a diffusion model. We train the diffusion model on a dataset of demonstrations consisting of contact modes and trajectories generated by a trajectory optimizer given those modes. In addition, we use a particle filter-inspired method to reason about variability in diffusion sampling arising from model error, estimating likelihoods of trajectories using a learned discriminator. We show that our method outperforms ablations that do not reason about variability and can plan contact sequences that outperform those found in training data across multiple tasks. We evaluate on simulated tabletop card sliding and screwdriver turning tasks, as well as the screwdriver task in hardware to show that our combined learning and planning approach transfers to the real world.

IROS Conference 2025 Conference Paper

Edit Distance Based Intention Estimation for Teleoperated Assembly

Aolin Xu 0002
Songpo Li
Prakash Baskaran
Soshi Iba
Behzad Dariush

We address the problem of intention estimation in human-robot teleoperation, which involves identifying the task being completed and predicting the next actions. Our approach sequentially quantifies the similarity between the observed action sequence and nominal action sequences representing possible tasks using the edit distance metric. Task estimation and action prediction are then performed using a nearest-neighbor rule. A key advantage of our approach is its robustness to deviations in operator actions and action recognition errors, commonly encountered in real-world teleoperation settings. Through extensive experiments on both real and simulated data, we demonstrate that our method largely outperforms alternative approaches, including probabilistic graphical models and transformer-based methods, particularly in scenarios with significant action deviations or action recognition errors. Additionally, we construct task distance matrices to analyze task similarities and potential confusion points, providing insights into when and where estimation errors are likely to occur. This analysis can guide the design of more distinctive task sequences and further improve the reliability of teleoperated robotic systems.
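
The edit-distance and nearest-neighbor idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the unit edit costs, the comparison against every prefix of each nominal sequence, the tie-breaking, and all task and action names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def prefix_distances(observed: Sequence[str], nominal: Sequence[str]) -> List[int]:
    """Levenshtein distances from `observed` to every prefix of `nominal`.

    Entry j of the result is the distance between the whole observed
    sequence and nominal[:j], using unit insert/delete/substitute costs.
    """
    prev = list(range(len(nominal) + 1))  # distances from the empty observation
    for i, obs_action in enumerate(observed, start=1):
        curr = [i]
        for j, nom_action in enumerate(nominal, start=1):
            curr.append(min(
                prev[j] + 1,                               # drop an observed action
                curr[j - 1] + 1,                           # skip a nominal action
                prev[j - 1] + (obs_action != nom_action),  # match or substitute
            ))
        prev = curr
    return prev


def estimate_intention(
    observed: Sequence[str],
    nominal_tasks: Dict[str, Sequence[str]],
) -> Tuple[Optional[str], Optional[str], Optional[int]]:
    """Nearest-neighbor rule: return (task, predicted next action, distance).

    Assumption: the operator may be partway through a task, so the
    observation is matched against the closest prefix of each nominal
    sequence. Ties keep the first task and the shortest prefix found.
    """
    best: Tuple[Optional[str], Optional[str], Optional[int]] = (None, None, None)
    for task, nominal in nominal_tasks.items():
        row = prefix_distances(observed, nominal)
        j = min(range(len(row)), key=row.__getitem__)
        if best[2] is None or row[j] < best[2]:
            next_action = nominal[j] if j < len(nominal) else None
            best = (task, next_action, row[j])
    return best


# Hypothetical instruction manual with two tasks.
manual = {
    "chair": ["pick_leg", "attach_leg", "pick_seat", "attach_seat"],
    "table": ["pick_leg", "attach_leg", "pick_top", "attach_top"],
}
result = estimate_intention(["pick_leg", "attach_leg", "pick_top"], manual)
```

Because the distance tolerates insertions, deletions, and substitutions, a skipped or misrecognized action raises the distance to the correct task by a small amount rather than ruling that task out, which is the robustness property the abstract emphasizes.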

IROS Conference 2025 Conference Paper

eXplainable Intention Estimation in Teleoperated Manipulation Using Deep Dynamic Graph Neural Networks

Prakash Baskaran
Xiao Liu
Songpo Li
Soshi Iba

Shared autonomy can improve teleoperating robotic systems in complex manufacturing and assembly tasks by combining human decision-making and robotic capabilities. A key aspect of seamless collaboration and trust in shared autonomy is the robot’s ability to interpret human intentions in a consistent and explainable manner. To achieve this, a graph neural network-based intention estimation framework is introduced, which generates dynamic graphs that capture spatial relationships evolving over time. The framework predicts human intentions at two hierarchical levels: low-level actions and high-level tasks. Furthermore, we empirically and anecdotally verify the correctness and consistency of the predictions using explainability metrics. The algorithm is demonstrated by teleoperating a bi-manual robot to assemble various block structures in a virtual reality simulation environment.

ICRA Conference 2024 Conference Paper

Hierarchical Deep Learning for Intention Estimation of Teleoperation Manipulation in Assembly Tasks

Mingyu Cai
Karankumar Patel
Soshi Iba
Songpo Li

In human-robot collaboration, shared control presents an opportunity to teleoperate robotic manipulation to improve the efficiency of manufacturing and assembly processes. Robots are expected to assist in executing the user’s intentions. To this end, robust and prompt intention estimation is needed, relying on behavioral observations. The framework presents an intention estimation technique at hierarchical levels, i.e., low-level actions and high-level tasks, by incorporating multi-scale hierarchical information in neural networks. Technically, we employ hierarchical dependency loss to boost overall accuracy. Furthermore, we propose a multi-window method that assigns proper hierarchical prediction windows of input data. An analysis of the predictive power with various inputs demonstrates the predominance of the deep hierarchical model in the sense of prediction accuracy and early intention identification. We implement the algorithm on a virtual reality (VR) setup to teleoperate robotic hands in a simulation with various assembly tasks to show the effectiveness of online estimation. Video demonstration is available at: https://youtu.be/CMYDgcI4j1g.

IROS Conference 2024 Conference Paper

HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning

Hongyu Li 0003
Snehal Dikhale
Jinda Cui
Soshi Iba
Nawid Jamali

To achieve dexterity comparable to that of humans, robots must intelligently process tactile sensor data. Taxel-based tactile signals often have low spatial-resolution, with non-standardized representations. In this paper, we propose a novel framework, HyperTaxel, for learning a geometrically-informed representation of taxel-based tactile signals to address challenges associated with their spatial resolution. We use this representation and a contrastive learning objective to encode and map sparse low-resolution taxel signals to high-resolution contact surfaces. To address the uncertainty inherent in these signals, we leverage joint probability distributions across multiple simultaneous contacts to improve taxel hyper-resolution. We evaluate our representation by comparing it with two baselines and present results that suggest our representation outperforms the baselines. Furthermore, we present qualitative results that demonstrate the learned representation captures the geometric features of the contact surface, such as flatness, curvature, and edges, and generalizes across different objects and sensor configurations. Moreover, we present results that suggest our representation improves the performance of various downstream tasks, such as surface classification, 6D in-hand pose estimation, and sim-to-real transfer.

ICRA Conference 2023 Conference Paper

Hierarchical Graph Neural Networks for Proprioceptive 6D Pose Estimation of In-hand Objects

Alireza Rezazadeh
Snehal Dikhale
Soshi Iba
Nawid Jamali

Robotic manipulation, in particular in-hand object manipulation, often requires an accurate estimate of the object's 6D pose. To improve the accuracy of the estimated pose, state-of-the-art approaches in 6D object pose estimation use observational data from one or more modalities, e.g., RGB images, depth, and tactile readings. However, existing approaches make limited use of the underlying geometric structure of the object captured by these modalities, thereby increasing their reliance on visual features. This results in poor performance when presented with objects that lack such visual features or when visual features are simply occluded. Furthermore, current approaches do not take advantage of the proprioceptive information embedded in the position of the fingers. To address these limitations, in this paper: (1) we introduce a hierarchical graph neural network architecture for combining multimodal (vision and touch) data that allows for a geometrically informed 6D object pose estimation, (2) we introduce a hierarchical message passing operation that flows the information within and across modalities to learn a graph-based object representation, and (3) we introduce a method that accounts for the proprioceptive information for in-hand object representation. We evaluate our model on a diverse subset of objects from the YCB Object and Model Set, and show that our method substantially outperforms existing state-of-the-art work in accuracy and robustness to occlusion. We also deploy our proposed framework on a real robot and qualitatively demonstrate successful transfer to real settings.

ICRA Conference 2021 Conference Paper

Learning Dense Visual Correspondences in Simulation to Smooth and Fold Real Fabrics

Aditya Ganapathi
Priya Sundaresan
Brijen Thananjeyan
Ashwin Balakrishna
Daniel Seita
Jennifer Grannen
Minho Hwang
Ryan Hoque

Robotic fabric manipulation is challenging due to the infinite dimensional configuration space, self-occlusion, and complex dynamics of fabrics. There has been significant prior work on learning policies for specific fabric manipulation tasks, but comparatively less focus on algorithms which can perform many different tasks. We take a step towards this goal by learning point-pair correspondences across different fabric configurations in simulation. Then, given a single demonstration of a new task from an initial fabric configuration, these correspondences can be used to compute geometrically equivalent actions in a new fabric configuration. This makes it possible to define policies to robustly imitate a broad set of multi-step fabric smoothing and folding tasks. The resulting policies achieve 80.3% average task success rate across 10 fabric manipulation tasks on two different physical robotic systems. Results also suggest robustness to fabrics of various colors, sizes, and shapes. See https://tinyurl.com/fabric-descriptors for supplementary material and videos.

IROS Conference 2020 Conference Paper

Deep Imitation Learning of Sequential Fabric Smoothing From an Algorithmic Supervisor

Daniel Seita
Aditya Ganapathi
Ryan Hoque
Minho Hwang
Edward Cen
Ajay Kumar Tanwani
Ashwin Balakrishna
Brijen Thananjeyan

Sequential pulling policies to flatten and smooth fabrics have applications from surgery to manufacturing to home tasks such as bed making and folding clothes. Due to the complexity of fabric states and dynamics, we apply deep imitation learning to learn policies that, given color (RGB), depth (D), or combined color-depth (RGBD) images of a rectangular fabric sample, estimate pick points and pull vectors to spread the fabric to maximize coverage. To generate data, we develop a fabric simulator and an algorithmic supervisor that has access to complete state information. We train policies in simulation using domain randomization and dataset aggregation (DAgger) on three tiers of difficulty in the initial randomized configuration. We present results comparing five baseline policies to learned policies and report systematic comparisons of RGB vs D vs RGBD images as inputs. In simulation, learned policies achieve comparable or superior performance to analytic baselines. In 180 physical experiments with the da Vinci Research Kit (dVRK) surgical robot, RGBD policies trained in simulation attain coverage of 83% to 95% depending on difficulty tier, suggesting that effective fabric smoothing policies can be learned from an algorithmic supervisor and that depth sensing is a valuable addition to color alone. Supplementary material is available at https://sites.google.com/view/fabric-smoothing.

IROS Conference 2020 Conference Paper

Deep Tactile Experience: Estimating Tactile Sensor Output from Depth Sensor Data

Karankumar Patel
Soshi Iba
Nawid Jamali

Tactile sensing is inherently contact based. To use tactile data, robots need to make contact with the surface of an object. This is inefficient in applications where an agent needs to make a decision between multiple alternatives that depend on the physical properties of the contact location. We propose a method to get tactile data in a non-invasive manner. The proposed method estimates the output of a tactile sensor from the depth data of the surface of the object based on past experiences. An experience dataset is built by allowing the robot to interact with various objects, collecting tactile data and the corresponding object surface depth data. We use the experience dataset to train a neural network to estimate the tactile output from depth data alone. We use GelSight tactile sensors, an image-based sensor, to generate images that capture detailed surface features at the contact location. We train a network with a dataset containing 578 tactile-image to depth-map correspondences. Given a depth-map of the surface of an object, the network outputs an estimate of the response of the tactile sensor, should it make contact with the object. We evaluate the method with the structural similarity index measure (SSIM), a similarity metric between two images commonly used in the image processing community. We present experimental results that show the proposed method outperforms a baseline that uses random images with statistical significance, with SSIM scores of 0.84 ± 0.0056 and 0.80 ± 0.0036, respectively.
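
For readers unfamiliar with SSIM, the following is a minimal sketch of the score computed over a whole image as a single window. It is a simplification: SSIM is usually averaged over local sliding windows, and the abstract does not state the window size or constants used in the paper. The constants 0.01 and 0.03 are the conventional defaults, and the function name and flat-list image representation are assumptions for illustration.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], dynamic_range: float = 1.0) -> float:
    """Single-window SSIM between two equally sized images given as flat pixel lists.

    SSIM = ((2*mu_x*mu_y + c1) * (2*cov_xy + c2))
           / ((mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")

    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term

    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 2x2 images with pixel values in [0, 1], flattened row by row.
image = [0.1, 0.5, 0.9, 0.3]
inverted = [1.0 - p for p in image]
same_score = global_ssim(image, image)
inverted_score = global_ssim(image, inverted)
```

An image compared with itself scores 1, and structurally dissimilar images score lower, so the reported gap between 0.84 and 0.80 reflects how much closer the network's estimates are to the true tactile images than random images are.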

ICRA Conference 2019 Conference Paper

Dynamic Channel: A Planning Framework for Crowd Navigation

Chao Cao
Peter Trautman
Soshi Iba

Real-time navigation in dense human environments is a challenging problem in robotics. Most existing path planners fail to account for the dynamics of pedestrians because introducing time as an additional dimension in search space is computationally prohibitive. Alternatively, most local motion planners only address imminent collision avoidance and fail to offer long-term optimality. In this work, we present an approach, called Dynamic Channels, to solve this global to local quandary. Our method combines the high-level topological path planning with low-level motion planning into a complete pipeline. By formulating the path planning problem as graph-searching in the triangulation space, our planner is able to explicitly reason about the obstacle dynamics and capture the environmental change efficiently. We evaluate efficiency and performance of our approach on public pedestrian datasets and compare it to a state-of-the-art planning algorithm for dynamic obstacle avoidance. Completeness proofs are provided in the supplement at http://caochao.me/files/proof.pdf. An extended version of the paper is available on arXiv.

IROS Conference 2014 Conference Paper

Receding horizon optimization of robot motions generated by hierarchical movement primitives

Manuel Mühlig
Akinobu Hayashi
Michael Gienger
Soshi Iba
Takahide Yoshiike

This paper introduces a motion generation framework that integrates a hierarchical movement primitive (MP) layer with optimal control in the form of receding horizon optimization. In order to benefit from fast reactions on the MP-layer, the optimal control layer can be overridden in risky situations to generate quick, though non-optimal solutions. By this, the system fulfills four desirable properties. It continuously adapts the robot's motion without noticeable delay (1) by optimizing for collision and joint limit avoidance based on a future time horizon instead of the current state only (2). It accounts for the full robot motion that may result from multiple active MPs at the same time (3) and despite a possibly slow optimization still provides the robustness and quick reaction capabilities of MPs (4). The framework has been validated in an experiment in which a humanoid robot performed a task, optimized with respect to collisions and joint limit avoidance, but still could react within 50 ms after detection of a potential risk.

IROS Conference 2003 Conference Paper

Intention aware interactive multi-modal robot programming

Soshi Iba
Christiaan J. J. Paredis
Pradeep K. Khosla

As robots enter the human environment, there are increasing needs for novice users to be able to program robots with ease. A successful robot programming system should be intuitive, interactive, and intention aware. Intuitiveness refers to the use of intuitive user interfaces such as speech and hand gestures. Interactivity refers to the system's ability to let the user interact preemptively with the robot to take its control at any given time. Intention awareness refers to the system's ability to recognize and adapt to user intent. This paper focuses on the intention awareness problem for an interactive multi-modal robot programming system. In our framework, user intent takes on the form of a robot program, which in our context is a sequential set of commands with parameters. To solve the intention recognition and adaptation problem, the system converts robot programs into a set of Markov chains. The system can then deduce the most likely program the user intends to execute based on a given observation sequence. It then adapts this program based on additional interaction. The system is implemented on a mobile vacuum cleaning robot with a user who is wearing sensor gloves, inductive position sensors, and a microphone.
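
The step of deducing the most likely program from an observation sequence can be illustrated with a minimal sketch. It is not the paper's formulation: here each program is turned into a first-order Markov chain by counting its command-to-command transitions with additive smoothing, initial-state probabilities are ignored, and the program and command names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Chain = Dict[str, Dict[str, float]]


def build_chain(program: Sequence[str], vocab: Sequence[str], smoothing: float = 0.1) -> Chain:
    """Estimate transition probabilities P(next | current) from one program.

    Additive smoothing (an assumption of this sketch) keeps unseen
    transitions at a small nonzero probability.
    """
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for current, nxt in zip(program, program[1:]):
        counts[current][nxt] += 1.0

    chain: Chain = {}
    for current in vocab:
        total = sum(counts[current].values()) + smoothing * len(vocab)
        chain[current] = {nxt: (counts[current][nxt] + smoothing) / total for nxt in vocab}
    return chain


def log_likelihood(chain: Chain, observed: Sequence[str]) -> float:
    """Log-probability of the observed transitions under one chain."""
    return sum(math.log(chain[a][b]) for a, b in zip(observed, observed[1:]))


def most_likely_program(
    programs: Dict[str, Sequence[str]],
    observed: Sequence[str],
) -> Tuple[str, Dict[str, float]]:
    """Score every program's chain and return the best name plus all scores."""
    vocab: List[str] = sorted({c for p in programs.values() for c in p} | set(observed))
    scores = {
        name: log_likelihood(build_chain(program, vocab), observed)
        for name, program in programs.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical programs for a vacuum-cleaning robot.
programs = {
    "clean_room": ["go_to", "vacuum", "go_to", "dock"],
    "patrol": ["go_to", "turn", "go_to", "turn"],
}
best, scores = most_likely_program(programs, ["go_to", "vacuum", "go_to"])
```

Scoring in log space avoids numerical underflow on long observation sequences, and the full score dictionary makes it possible to see how close competing programs are before committing to one.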

ICRA Conference 2002 Conference Paper

Interactive Multi-Modal Robot Programming

Soshi Iba
Christiaan J. J. Paredis
Pradeep K. Khosla

This paper introduces a novel approach to program a robot interactively through a multi-modal interface. The key characteristic of this approach is that the user can provide feedback interactively at any time - during both the programming and the execution phase. The framework takes a three-step approach to the problem: multi-modal recognition, intention interpretation, and prioritized task execution. The multi-modal recognition module translates hand gestures and spontaneous speech into a structured symbolic data stream without abstracting away the user's intent. The intention interpretation module selects the appropriate primitives to generate a task based on the user's input, the system's current state, and robot sensor data. Finally, the prioritized task execution module selects and executes skill primitives based on the system's current state, sensor inputs, and prior tasks. The framework is demonstrated by interactively controlling and programming a vacuum-cleaning robot.

IROS Conference 2000 Conference Paper

Recognition of human task by attention point analysis

Koichi Ogawara
Soshi Iba
Tomikazu Tanuki
Hiroshi Kimura
Katsushi Ikeuchi

This paper presents a novel method of constructing a human task model by attention point (AP) analysis. The AP analysis consists of two steps: in the first step, it broadly observes the human task, constructs a rough human task model, and finds APs that require detailed analysis; in the second step, by applying time-consuming analysis to the APs in the same human task, it enhances the human task model. This human task model is highly abstracted and can change its degree of abstraction to adapt to the environment, so that it is applicable in a different environment. We describe this method and its implementation using data gloves and a stereo vision system. We also show an experimental result in which a real robot observed a human task and performed the same human task successfully in a different environment using this model.

IROS Conference 1999 Conference Paper

An architecture for gesture-based control of mobile robots

Soshi Iba
Michael Vande Weghe
Christiaan J. J. Paredis
Pradeep K. Khosla

Gestures provide a rich and intuitive form of interaction for controlling robots. This paper presents an approach for controlling a mobile robot with hand gestures. The system uses hidden Markov models (HMMs) to spot and recognize gestures captured with a data glove. To spot gestures from a sequence of hand positions that may include nongestures, we have introduced a "wait state" in the HMM. The system is currently capable of spotting six gestures reliably. These gestures are mapped to robot commands under two different modes of operation: local and global control. In the local control mode, the gestures are interpreted in the robot's local frame of reference, allowing the user to accelerate, decelerate, and turn. In the global control mode, the gestures are interpreted in the world frame, allowing the robot to move to the location at which the user is pointing.