Arrow Research search

Author name cluster

Gregory D. Hager

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

82 papers
2 author rows

Possible papers

82

ICRA Conference 2024 Conference Paper

Domain Adaptation of Visual Policies with a Single Demonstration

  • Weiyao Wang 0002
  • Gregory D. Hager

Deploying machine learning algorithms for robot tasks in real-world applications presents a core challenge: overcoming the domain gap between the training and the deployment environment. This is particularly difficult for visuomotor policies that utilize high-dimensional images as input, especially when those images are generated via simulation. A common method to tackle this issue is domain randomization, which aims to broaden the span of the training distribution to cover the test-time distribution. However, this approach is only effective when the domain randomization encompasses the actual shifts in the test-time distribution. We take a different approach, using a single demonstration (a prompt) to learn a policy that adapts to the target environment at test time. Our proposed framework, PromptAdapt, leverages the Transformer architecture’s capacity to model sequential data to learn demonstration-conditioned visual policies, allowing for in-context adaptation to a target domain that is distinct from training. Our experiments in both simulation and real-world settings show that PromptAdapt is a strong domain-adapting policy that outperforms baseline methods by a large margin under a range of domain shifts, including variations in lighting, color, texture, and camera pose. Videos and more information can be viewed at the project webpage: https://sites.google.com/view/promptadapt.

IROS Conference 2024 Conference Paper

VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation

  • Weiyao Wang 0002
  • Yutian Lei
  • Shiyu Jin
  • Gregory D. Hager
  • Liangjun Zhang

In this work, we introduce the Virtual In-Hand Eye Transformer (VIHE), a novel method designed to enhance 3D manipulation capabilities through action-aware view rendering. VIHE autoregressively refines actions in multiple stages by conditioning on rendered views posed from action predictions in the earlier stages. These virtual in-hand views provide a strong inductive bias for effectively recognizing the correct pose for the hand, especially for challenging high-precision tasks such as peg insertion. On 18 manipulation tasks in RLBench simulated environments, VIHE achieves a new state-of-the-art, with a 12% absolute improvement, increasing from 65% to 77% over the existing state-of-the-art model using 100 demonstrations per task. In real-world scenarios, VIHE can learn manipulation tasks with just a handful of demonstrations, highlighting its practical utility. Videos and code implementation can be found at our project site: https://vihe-3d.github.io.

ICRA Conference 2022 Conference Paper

SAGE: SLAM with Appearance and Geometry Prior for Endoscopy

  • Xingtong Liu
  • Zhaoshuo Li
  • Masaru Ishii
  • Gregory D. Hager
  • Russell H. Taylor
  • Mathias Unberath

In endoscopy, many applications (e.g., surgical navigation) would benefit from a real-time method that can simultaneously track the endoscope and reconstruct the dense 3D geometry of the observed anatomy from a monocular endoscopic video. To this end, we develop a Simultaneous Localization and Mapping system that combines learning-based appearance and optimizable geometry priors with factor graph optimization. The appearance and geometry priors are explicitly learned in an end-to-end differentiable training pipeline to master the task of pair-wise image alignment, one of the core components of the SLAM system. In our experiments, the proposed SLAM system is shown to robustly handle the challenges of texture scarcity and illumination variation that are commonly seen in endoscopy. The system generalizes well to unseen endoscopes and subjects and performs favorably compared with a state-of-the-art feature-based SLAM system. The code repository is available at https://github.com/lppllpp1920/SAGE-SLAM.git.

AAAI Conference 2021 Conference Paper

DASZL: Dynamic Action Signatures for Zero-shot Learning

  • Tae Soo Kim
  • Jonathan Jones
  • Michael Peven
  • Zihao Xiao
  • Jin Bai
  • Yi Zhang
  • Weichao Qiu
  • Alan Yuille

There are many realistic applications of activity recognition where the set of potential activity descriptions is combinatorially large. This makes end-to-end supervised training of a recognition system impractical, as no practical training set can encompass the entire label set. In this paper, we present an approach to fine-grained recognition that models activities as compositions of dynamic action signatures. This compositional approach allows us to reframe fine-grained recognition as zero-shot activity recognition, where a detector is composed “on the fly” from simple first-principles state machines supported by deep-learned components. We evaluate our method on the Olympic Sports and UCF101 datasets, where our model establishes a new state of the art under multiple experimental paradigms. We also extend this method to form a unique framework for zero-shot joint segmentation and classification of activities in video and demonstrate the first results in zero-shot decoding of complex action sequences on a widely-used surgical dataset. Lastly, we show that we can use off-the-shelf object detectors to recognize activities in completely de novo settings with no additional training.

IROS Conference 2021 Conference Paper

Localization and Control of Magnetic Suture Needles in Cluttered Surgical Site with Blood and Tissue

  • Will Pryor
  • Yotam Barnoy
  • Suraj Raval
  • Xiaolong Liu 0002
  • Lamar O. Mair
  • Daniel Lerner
  • Onder Erin
  • Gregory D. Hager

Real-time visual localization of needles is necessary for various surgical applications, including surgical automation and visual feedback. In this study we investigate localization and autonomous robotic control of needles in the context of our magneto-suturing system. Our system holds the potential for surgical manipulation with the benefit of minimal invasiveness and reduced patient side effects. However, the nonlinear magnetic fields produce unintuitive forces and demand delicate position-based control that exceeds the capabilities of direct human manipulation. This makes automatic needle localization a necessity. Our localization method combines neural network-based segmentation and classical techniques, and we are able to consistently locate our needle with 0.73 mm RMS error in clean environments and 2.72 mm RMS error in challenging environments with blood and occlusion. The average localization RMS error is 2.16 mm for all environments we used in the experiments. We combine this localization method with our closed-loop feedback control system to demonstrate the further applicability of localization to autonomous control. Our needle is able to follow a running suture path in (1) no blood, no tissue; (2) heavy blood, no tissue; (3) no blood, with tissue; and (4) heavy blood, with tissue environments. The tip position tracking error ranges from 2.6 mm to 3.7 mm RMS, opening the door towards autonomous suturing tasks.

ICRA Conference 2021 Conference Paper

Out-of-Distribution Robustness with Deep Recursive Filters

  • Kapil D. Katyal
  • I-Jeng Wang
  • Gregory D. Hager

Accurate state and uncertainty estimation is imperative for mobile robots and self-driving vehicles to achieve safe navigation in pedestrian-rich environments. A critical component of state and uncertainty estimation for robot navigation is to perform robustly under out-of-distribution noise. Traditional methods of state estimation decouple perception and state estimation, making it difficult to operate on noisy, high-dimensional data. Here, we describe an approach that combines the expressiveness of deep neural networks with principled approaches to uncertainty estimation found in recursive filters. We particularly focus on techniques that provide better robustness to out-of-distribution noise and demonstrate the applicability of our approach on two scenarios: a simple noisy pendulum state estimation problem and real-world pedestrian localization using the nuScenes dataset [1]. We show that our approach improves state and uncertainty estimation compared to baselines while achieving approximately 3× improvement in computational efficiency.

IROS Conference 2021 Conference Paper

Robust Policy Search for an Agile Ground Vehicle Under Perception Uncertainty

  • Shahriar Sefati
  • Subhransu Mishra
  • Matthew Sheckells
  • Kapil D. Katyal
  • Jin Bai 0001
  • Gregory D. Hager
  • Marin Kobilarov

Learning robust policies for robotic systems operating in the presence of uncertainty is a challenging task. For safe navigation, in addition to the natural stochasticity of the environment and vehicle dynamics, the perception uncertainty associated with dynamic entities, e.g., pedestrians, must be accounted for during motion planning. To this end, we construct an algorithm with built-in robustness to uncertainty by directly minimizing an upper confidence bound on the expected cost of trajectories instead of employing a standard approach based on minimizing the expected cost itself. Perception uncertainty is incorporated into the policy search framework by predicting each pedestrian’s intent belief and propagating their state distribution in time using closed-loop goal-directed dynamics. We train the policy in simulation and show that it can be transferred to an agile ground vehicle for successful autonomous robot navigation in the presence of pedestrians with perception uncertainty. We further show the superior performance of this policy over a policy that does not consider pedestrian intent and perception uncertainty.
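The upper-confidence-bound idea in this abstract can be illustrated with a toy sketch (a generic illustration, not the paper's actual algorithm; the policy names and cost numbers are hypothetical): instead of ranking candidate policies by their mean Monte Carlo rollout cost, rank them by the mean plus a multiple of the standard deviation, which penalizes high-variance behavior.

```python
import numpy as np

def ucb_cost(costs, kappa=2.0):
    """One simple upper confidence bound on expected cost:
    sample mean plus kappa sample standard deviations."""
    costs = np.asarray(costs, dtype=float)
    return costs.mean() + kappa * costs.std(ddof=1)

rng = np.random.default_rng(0)
# Hypothetical rollout costs for two candidate policies.
rollouts = {
    "cautious":   rng.normal(1.0, 0.1, size=200),  # higher mean, low variance
    "aggressive": rng.normal(0.8, 0.9, size=200),  # lower mean, high variance
}
best = min(rollouts, key=lambda name: ucb_cost(rollouts[name]))
# Minimizing the UCB prefers "cautious" even though "aggressive"
# has the lower *expected* cost.
```

The robustness comes from the kappa term: a policy with a slightly worse average but far fewer catastrophic rollouts wins the comparison.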

ICRA Conference 2020 Conference Paper

Autonomously Navigating a Surgical Tool Inside the Eye by Learning from Demonstration

  • Ji Woong Kim
  • Changyan He
  • Muller Urias
  • Peter Gehlbach
  • Gregory D. Hager
  • Iulian I. Iordachita
  • Marin Kobilarov

A fundamental challenge in retinal surgery is safely navigating a surgical tool to a desired goal position on the retinal surface while avoiding damage to surrounding tissues, a procedure that typically requires tens-of-microns accuracy. In practice, the surgeon relies on depth-estimation skills to localize the tool-tip with respect to the retina in order to perform the tool-navigation task, which can be prone to human error. To alleviate such uncertainty, prior work has introduced ways to assist the surgeon by estimating the tooltip distance to the retina and providing haptic or auditory feedback. However, automating the tool-navigation task itself remains unsolved and largely unexplored. Such a capability, if reliably automated, could serve as a building block to streamline complex procedures and reduce the chance for tissue damage. Towards this end, we propose to automate the tool-navigation task by learning to mimic expert demonstrations of the task. Specifically, a deep network is trained to imitate expert trajectories toward various locations on the retina based on recorded visual servoing to a given goal specified by the user. The proposed autonomous navigation system is evaluated in simulation and in physical experiments using a silicone eye phantom. We show that the network can reliably navigate a needle surgical tool to various desired locations within 137 μm accuracy in physical experiments and 94 μm in simulation on average, and generalizes well to unseen situations such as in the presence of auxiliary surgical tools, variable eye backgrounds, and brightness conditions.

ICRA Conference 2020 Conference Paper

Intent-Aware Pedestrian Prediction for Adaptive Crowd Navigation

  • Kapil D. Katyal
  • Gregory D. Hager
  • Chien-Ming Huang 0001

Mobile robots capable of navigating seamlessly and safely in pedestrian-rich environments promise to bring robotic assistance closer to our daily lives. In this paper we draw on insights into how humans move in crowded spaces to explore how to recognize pedestrian navigation intent, how to predict pedestrian motion, and how a robot may adapt its navigation policy dynamically when facing unexpected human movements. Our approach is to develop algorithms that replicate this behavior. We experimentally demonstrate the effectiveness of our prediction algorithm using real-world pedestrian datasets and achieve comparable or better prediction accuracy compared to several state-of-the-art approaches. Moreover, we show that the confidence of pedestrian prediction can be used to adjust the risk of a navigation policy adaptively, affording a more comfortable experience as measured by the frequency of personal-space violations in comparison with baselines. Furthermore, our adaptive navigation policy is able to reduce the number of collisions by 43% in the presence of novel pedestrian motion not seen during training.

IROS Conference 2019 Conference Paper

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

  • Andrew Hundt
  • Varun Jain
  • Chia-Hung Lin
  • Chris Paxton 0001
  • Gregory D. Hager

A robot can now grasp an object more effectively than ever before, but once it has the object what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at sites.google.com/site/costardataset.

ICRA Conference 2019 Conference Paper

Uncertainty-Aware Occupancy Map Prediction Using Generative Networks for Robot Navigation

  • Kapil D. Katyal
  • Katie M. Popek
  • Chris Paxton 0001
  • Philippe Burlina
  • Gregory D. Hager

Efficient exploration through unknown environments remains a challenging problem for robotic systems. In these situations, the robot's ability to reason about its future motion is often severely limited by sensor field of view (FOV). By contrast, biological systems routinely make decisions by taking into consideration what might exist beyond their FOV based on prior experience. We present an approach for predicting occupancy map representations of sensor data for future robot motions using deep neural networks. We develop a custom loss function used to make accurate prediction while emphasizing physical boundaries. We further study extensions to our neural network architecture to account for uncertainty and ambiguity inherent in mapping and exploration. Finally, we demonstrate a combined map prediction and information-theoretic exploration strategy using the variance of the generated hypotheses as the heuristic for efficient exploration of unknown environments.

ICRA Conference 2019 Conference Paper

Visual Robot Task Planning

  • Chris Paxton 0001
  • Yotam Barnoy
  • Kapil D. Katyal
  • Raman Arora
  • Gregory D. Hager

Prospection is key to solving challenging problems in new environments, but it has not been deeply explored as applied to task planning for perception-driven robotics. We propose visual robot task planning, where we take in an input image and must generate a sequence of high-level actions and associated observations that achieve some task. In this paper, we describe a neural network architecture and associated planning algorithm that (1) learns a representation of the world that can generate prospective futures, (2) uses this generative model to simulate the result of sequences of high-level actions in a variety of environments, and (3) evaluates these actions via a variant of Monte Carlo Tree Search to find a viable solution to a particular problem. Our approach allows us to visualize intermediate motion goals and learn to plan complex activity from visual information; we use it to generate and visualize task plans on held-out examples of a block-stacking simulation.

IROS Conference 2018 Conference Paper

Evaluating Methods for End-User Creation of Robot Task Plans

  • Chris Paxton 0001
  • Felix Jonathan
  • Andrew Hundt
  • Bilge Mutlu
  • Gregory D. Hager

How can we enable users to create effective, perception-driven task plans for collaborative robots? We conducted a 35-person user study with the Behavior Tree-based CoSTAR system to determine which strategies for end-user creation of generalizable robot task plans are most usable and effective. CoSTAR allows domain experts to author complex, perceptually grounded task plans for collaborative robots. As a part of CoSTAR's wide range of capabilities, it allows users to specify SmartMoves: abstract goals such as “pick up component A from the right side of the table.” Users were asked to perform pick-and-place assembly tasks with either SmartMoves or one of three simpler baseline versions of CoSTAR. Overall, participants found CoSTAR to be highly usable, with an average System Usability Scale score of 73.4 out of 100. SmartMove also helped users perform tasks faster and more effectively; all SmartMove users completed the first two tasks, while not all users completed the tasks using the other strategies. SmartMove users showed better performance for incorporating perception across all three tasks.

RLDM Conference 2017 Conference Abstract

Combining Neural Networks and Tree Search for Task and Motion Planning in Challenging Environments

  • Chris Paxton
  • Vasumathi Raman
  • Gregory D. Hager

Robots in the real world have to deal with complex scenes involving multiple actors and complex, changing environments. In particular, self-driving cars are faced with a uniquely challenging task and motion planning problem that incorporates logical constraints with multiple interacting actors in a scene that includes other cars, pedestrians, and bicyclists. A major challenge in this setting, both for neural network approaches and classical planning, is the need to explore future worlds of a complex and interactive environment. To this end, we integrate Monte Carlo Tree Search with hierarchical neural net control policies trained on expressive Linear Temporal Logic (LTL) specifications. We propose a methodology that incorporates deep neural networks to learn low-level control policies as well as high-level “option” policies. We thus investigate the ability of neural networks to learn both LTL constraints and continuous control policies in order to generate task plans. We demonstrate our approach in a simulated autonomous driving setting, where a vehicle must drive down a road shared with multiple other vehicles, avoid collisions, and navigate an intersection, all while obeying given rules of the road.

IROS Conference 2017 Conference Paper

Combining neural networks and tree search for task and motion planning in challenging environments

  • Chris Paxton 0001
  • Vasumathi Raman
  • Gregory D. Hager
  • Marin Kobilarov

Task and motion planning subject to Linear Temporal Logic (LTL) specifications in complex, dynamic environments requires efficient exploration of many possible future worlds. Model-free reinforcement learning has proven successful in a number of challenging tasks, but shows poor performance on tasks that require long-term planning. In this work, we integrate Monte Carlo Tree Search with hierarchical neural net policies trained on expressive LTL specifications. We use reinforcement learning to find deep neural networks representing both low-level control policies and task-level “option policies” that achieve high-level goals. Our combined architecture generates safe and responsive motion plans that respect the LTL constraints. We demonstrate our approach in a simulated autonomous driving setting, where a vehicle must drive down a road in traffic, avoid collisions, and navigate an intersection, all while obeying rules of the road.

ICRA Conference 2017 Conference Paper

CoSTAR: Instructing collaborative robots with behavior trees and vision

  • Chris Paxton 0001
  • Andrew Hundt
  • Felix Jonathan
  • Kelleher Guerin
  • Gregory D. Hager

For collaborative robots to become useful, end users who are not robotics experts must be able to instruct them to perform a variety of tasks. With this goal in mind, we developed a system for end-user creation of robust task plans with a broad range of capabilities. CoSTAR: the Collaborative System for Task Automation and Recognition is our winning entry in the 2016 KUKA Innovation Award competition at the Hannover Messe trade show, which this year focused on Flexible Manufacturing. CoSTAR is unique in how it creates natural abstractions that use perception to represent the world in a way users can both understand and utilize to author capable and robust task plans. Our Behavior Tree-based task editor integrates high-level information from known object segmentation and pose estimation with spatial reasoning and robot actions to create robust task plans. We describe the cross-platform design and implementation of this system on multiple industrial robots and evaluate its suitability for a wide variety of use cases.
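The Behavior Tree abstraction that CoSTAR's task editor builds on can be summarized with a minimal sketch (a generic illustration of standard Behavior Tree semantics, not CoSTAR's actual implementation; the example actions are hypothetical):

```python
from enum import Enum

class Status(Enum):
    SUCCESS = 1
    FAILURE = 2

class Action:
    """Leaf node wrapping a condition or skill; ticks to SUCCESS or FAILURE."""
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return Status.SUCCESS if self.fn() else Status.FAILURE

class Sequence:
    """Ticks children in order; fails as soon as any child fails."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            if child.tick() is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS

class Selector:
    """Ticks children in order; succeeds as soon as any child succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            if child.tick() is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE

# Hypothetical pick-and-place plan: detect the part, then either grasp it
# directly or fall back to repositioning and grasping again.
tree = Sequence(
    Action(lambda: True),            # "part detected"
    Selector(
        Action(lambda: False),       # "direct grasp" (fails here)
        Action(lambda: True),        # "reposition, then grasp"
    ),
)
result = tree.tick()  # the fallback branch rescues the plan
```

The appeal for end-user instruction is that these composites read as task structure: a Sequence is "do A then B", a Selector is "try A, otherwise B".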

IROS Conference 2016 Conference Paper

Do what I want, not what I did: Imitation of skills by planning sequences of actions

  • Chris Paxton 0001
  • Felix Jonathan
  • Marin Kobilarov
  • Gregory D. Hager

We propose a learning-from-demonstration approach for grounding actions from expert data and an algorithm for using these actions to perform a task in new environments. Our approach is based on an application of sampling-based motion planning to search through the tree of discrete, high-level actions constructed from a symbolic representation of a task. Recursive sampling-based planning is used to explore the space of possible continuous-space instantiations of these actions. We demonstrate the utility of our approach with a magnetic structure assembly task, showing that the robot can intelligently select a sequence of actions in different parts of the workspace and in the presence of obstacles. This approach can better adapt to new environments by selecting the correct high-level actions for the particular environment while taking human preferences into account.

ICRA Conference 2016 Conference Paper

Hierarchical semantic parsing for object pose estimation in densely cluttered scenes

  • Chi Li
  • Jonathan Bohren
  • Eric Carlson
  • Gregory D. Hager

Densely cluttered scenes are composed of multiple objects which are in close contact and heavily occlude each other. Few existing 3D object recognition systems are capable of accurately predicting object poses in such scenarios. This is mainly due to the presence of objects with textureless surfaces, similar appearances and the difficulty of object instance segmentation. In this paper, we present a hierarchical semantic segmentation algorithm which partitions a densely cluttered scene into different object regions. A RANSAC-based registration method is subsequently applied to estimate 6-DoF object poses within each object class. Part of this algorithm includes a generalized pooling scheme used to construct robust and discriminative object representations from a convolutional architecture with multiple pooling domains. We also provide a new RGB-D dataset which serves as a benchmark for object pose estimation in densely cluttered scenes. This dataset contains five thousand scene frames and over twenty thousand labeled poses of ten common hand tools. We show that our method demonstrates improved performance of pose estimation on this new dataset compared with other state-of-the-art methods.

IROS Conference 2016 Conference Paper

Incremental scene understanding on dense SLAM

  • Chi Li
  • Han Xiao
  • Keisuke Tateno
  • Federico Tombari
  • Nassir Navab
  • Gregory D. Hager

We present an architecture for online, incremental scene modeling which combines a SLAM-based scene understanding framework with semantic segmentation and object pose estimation. The core of this approach comprises a probabilistic inference scheme that predicts semantic labels for object hypotheses at each new frame. From these hypotheses, recognized scene structures are incrementally constructed and tracked. Semantic labels are inferred using a multi-domain convolutional architecture which operates on the image time series and which enables efficient propagation of features as well as robust model registration. To evaluate this architecture, we introduce a large-scale RGB-D dataset, JHUSEQ-25, as a new benchmark for sequence-based scene understanding in complex and densely cluttered scenes. This dataset contains 25 RGB-D video sequences with 100,000 labeled frames in total. We validate our method on this dataset and demonstrate improved performance of semantic segmentation and 6-DoF object pose estimation compared with single-view methods.

ICRA Conference 2016 Conference Paper

Learning convolutional action primitives for fine-grained action recognition

  • Colin Lea
  • René Vidal
  • Gregory D. Hager

Fine-grained action recognition is important for many applications of human-robot interaction, automated skill assessment, and surveillance. The goal is to segment and classify all actions occurring in a time series sequence. While recent recognition methods have shown strong performance in robotics applications, they often require hand-crafted features, use large amounts of domain knowledge, or employ overly simplistic representations of how objects change throughout an action. In this paper we present the Latent Convolutional Skip Chain Conditional Random Field (LC-SC-CRF). This time series model learns a set of interpretable and composable action primitives from sensor data. We apply our model to cooking tasks using accelerometer data from the University of Dundee 50 Salads dataset and to robotic surgery training tasks using robot kinematic data from the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our performance on 50 Salads and JIGSAWS is 18.0% and 5.3% higher than the state of the art, respectively. This model performs well without requiring hand-crafted features or intricate domain knowledge. The code and features have been made public.

IROS Conference 2016 Conference Paper

Sensor substitution for video-based action recognition

  • Christian Rupprecht 0001
  • Colin Lea
  • Federico Tombari
  • Nassir Navab
  • Gregory D. Hager

There are many applications where domain-specific sensing, such as accelerometers, kinematics, or force sensing, provide unique and important information for control or for analysis of motion. However, it is not always the case that these sensors can be deployed or accessed beyond laboratory environments. For example, it is possible to instrument humans or robots to measure motion in the laboratory in ways that it is not possible to replicate in the wild. An alternative, which we explore in this paper, is to address situations where accurate sensing is available while training an algorithm, but for which only video is available for deployment. We present two examples of this sensory substitution methodology. The first variation trains a convolutional neural network to regress real-valued signals, including robot end-effector pose, from video. The second example regresses binary signals derived from accelerometer data which signifies when specific objects are in motion. We evaluate these on the JIGSAWS dataset for robotic surgery training assessment and the 50 Salads dataset for modeling complex structured cooking tasks. We evaluate the trained models for video-based action recognition and show that the trained models provide information that is comparable to the sensory signals they replace.

ICRA Conference 2016 Conference Paper

Unsupervised surgical data alignment with application to automatic activity annotation

  • Yixin Gao
  • S. Swaroop Vedula
  • Gyusung I. Lee
  • Mija R. Lee
  • Sanjeev Khudanpur
  • Gregory D. Hager

Robotic surgery and other minimally-invasive surgical techniques are an integral part of patient care, and readily yield large amounts of data. Surgical tool motion (kinematic data) contains information that is useful for assessment and education. Typically, assessment and education tools that rely upon the kinematic data require substantial manual processing such as activity annotations. The goal of this paper was to develop an automated method to align surgical recordings and assign activity annotations. We developed an approach based on unsupervised alignment to efficiently annotate kinematic data for its constituent activity segments. Our method includes extracting non-linear features from the kinematic data using a stacked de-noising autoencoder, and using modified dynamic time warping to align the kinematic data from different trials of the study task. We combined alignment between a test trial and one or a small set of template trials (with prior manual annotations) with voting based on kernel density estimation to transfer labels from the template to the test trial. Our experiments on two datasets captured in the training laboratory demonstrate an accuracy of 72% to 94% for annotating activity segments within a surgical training task. Our findings are robust to data captured from several surgeons, and to deviations in activity from a canonical activity sequence.
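The dynamic-time-warping step this abstract relies on can be illustrated with the classic textbook recurrence (a generic DTW sketch on 1-D sequences, not the authors' modified variant):

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook dynamic time warping between two 1-D sequences:
    O(len(a) * len(b)) time, absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: match, insertion, deletion.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Time-warped copies of the same motion align at zero cost, which is why
# DTW suits trials performed at different speeds.
d = dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])  # → 0.0
```

Once two trials are aligned this way, the warping path gives a frame-to-frame correspondence along which annotations can be transferred.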

ICRA Conference 2015 Conference Paper

A framework for end-user instruction of a robot assistant for manufacturing

  • Kelleher Guerin
  • Colin Lea
  • Chris Paxton 0001
  • Gregory D. Hager

Small Manufacturing Entities (SMEs) have not incorporated robotic automation as readily as large companies due to rapidly changing product lines, complex and dexterous tasks, and the high cost of start-up. While recent low-cost robots such as the Universal Robots UR5 and Rethink Robotics Baxter are more economical and feature improved programming interfaces, based on our discussions with manufacturers, further incorporation of robots into the manufacturing work flow is limited by the ability of these systems to generalize across tasks and handle environmental variation. Our goal is to create a system designed for small manufacturers that contains a set of capabilities useful for a wide range of tasks, is both powerful and easy to use, allows for perceptually grounded actions, and is able to accumulate, abstract, and reuse plans that have been taught. We present an extension to Behavior Trees that allows for representing the system capabilities of a robot as a set of generalizable operations that are exposed to an end-user for creating task plans. We implement this framework in CoSTAR, the Collaborative System for Task Automation and Recognition, and demonstrate its effectiveness with two case studies. We first perform a complex tool-based object manipulation task in a laboratory setting. We then show the deployment of our system in an SME where we automate a machine tending task that was not possible with current off-the-shelf robots.

ICRA Conference 2015 Conference Paper

An incremental approach to learning generalizable robot tasks from human demonstration

  • Amir M. Ghalamzan E.
  • Chris Paxton 0001
  • Gregory D. Hager
  • Luca Bascetta

Dynamic Movement Primitives (DMPs) are a common method for learning a control policy for a task from demonstration. This control policy consists of differential equations that can create a smooth trajectory to a new goal point. However, DMPs only have a limited ability to generalize the demonstration to new environments and solve problems such as obstacle avoidance. Moreover, standard DMP learning does not cope with the noise inherent to human demonstrations. Here, we propose an approach for robot learning from demonstration that can generalize noisy task demonstrations to a new goal point and to an environment with obstacles. This strategy for robot learning from demonstration results in a control policy that incorporates different types of learning from demonstration, which correspond to different types of observational learning as outlined in developmental psychology.
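A minimal 1-D discrete DMP illustrates the core idea the abstract describes: a control policy expressed as differential equations whose learned forcing term generalizes a demonstration to a new goal. This is a sketch of one common DMP formulation, not the paper's incremental method; the gains and canonical system are illustrative:

```python
import numpy as np

def learn_dmp(demo, dt, alpha=25.0, beta=6.25, alpha_x=3.0):
    """Fit the forcing term of a 1-D discrete DMP from one demonstration.
    Transformation system (one common form; details vary in the literature):
        dv = alpha*(beta*(g - y) - v) + f(x)*(g - y0)
    """
    y = np.asarray(demo, float)
    g, y0 = y[-1], y[0]
    v = np.gradient(y, dt)               # demonstrated velocity
    a = np.gradient(v, dt)               # demonstrated acceleration
    x = np.exp(-alpha_x * dt * np.arange(len(y)))  # canonical phase, 1 -> 0
    scale = (g - y0) if abs(g - y0) > 1e-8 else 1.0
    # Invert the transformation system to recover the forcing values.
    f = (a - alpha * (beta * (g - y) - v)) / scale
    return x, f, (alpha, beta, alpha_x)

def rollout(x_demo, f_demo, params, y0, g, dt, n_steps):
    """Euler-integrate the DMP toward a (possibly new) goal g."""
    alpha, beta, alpha_x = params
    y, v, x = y0, 0.0, 1.0
    traj = [y]
    scale = (g - y0) if abs(g - y0) > 1e-8 else 1.0
    for _ in range(n_steps - 1):
        # Phase decays, so reverse the samples for increasing-x interpolation.
        f = np.interp(x, x_demo[::-1], f_demo[::-1])
        a = alpha * (beta * (g - y) - v) + f * scale
        v += a * dt
        y += v * dt
        x += -alpha_x * x * dt
        traj.append(y)
    return np.array(traj)
```

As the forcing term vanishes with the phase variable, the spring-damper term guarantees convergence to the new goal, which is the generalization property (and, per the abstract, its limitation around obstacles).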

IROS Conference 2014 Conference Paper

Adjutant: A framework for flexible human-machine collaborative systems

  • Kelleher Guerin
  • Sebastian Riedel 0002
  • Jonathan Bohren
  • Gregory D. Hager

Flexible interaction and instruction is a key enabling technology for expanding robotics into small to medium scale manufacturing, in-home assistance for physically disabled individuals, and robotic surgery. In these cases, performing a task manually is neither practical nor scalable, yet complete automation is cost-prohibitive or impossible. Thus, our interest is in collaborative systems that can be easily trained to work with an operator. This collaborative robotic system should be instructable in a generalizable way for a wide range of tasks, and should generalize to new tasks gracefully with minimal retraining. At the same time, for a given task, the system should take advantage of user interaction modalities needed to accomplish the task, subject to the constraints of the available interfaces. These ideas motivate the Adjutant framework. Adjutant supports human-robot collaborative operations across a range of user roles and robot capabilities. Adjutant models human-robot systems via sets of robot capabilities, composable high-level functions that can be specialized to specific tasks, and collaborative behaviors which relate these capabilities to specific user interfaces or interaction paradigms. Adjutant also incorporates several methods for encapsulating reusable task information into capabilities, thus specializing them, including tool affordances, perceptual grounding templates, and tool movement primitives. We have implemented Adjutant as a software framework in ROS and, in this paper, explore the utility of Adjutant for performing several real-world collaborative manufacturing tasks on an industrial robot test-bed.

ICRA Conference 2013 Conference Paper

A pilot study in vision-based augmented telemanipulation for remote assembly over high-latency networks

  • Jonathan Bohren
  • Chavdar Papazov
  • Darius Burschka
  • Kai Krieger
  • Sven Parusel
  • Sami Haddadin
  • William L. Shepherdson
  • Gregory D. Hager

In this paper we present an approach to extending the capabilities of telemanipulation systems by intelligently augmenting a human operator's motion commands based on quantitative three-dimensional scene perception at the remote telemanipulation site. This framework is the first prototype of the Augmented Shared-Control for Efficient, Natural Telemanipulation (ASCENT) System. ASCENT aims to enable new robotic applications in environments where task complexity precludes autonomous execution or where low-bandwidth and/or high-latency communication channels exist between the nearest human operator and the application site. These constraints can limit the domain of telemanipulation to simple or static environments, reduce the effectiveness of telemanipulation, and even preclude remote intervention entirely. ASCENT is a semi-autonomous framework that increases the speed and accuracy of a human operator's actions via seamless transitions between one-to-one teleoperation and autonomous interventions. We report the promising results of a pilot study validating ASCENT in a transatlantic telemanipulation experiment between The Johns Hopkins University in Baltimore, MD, USA and the German Aerospace Center (DLR) in Oberpfaffenhofen, Germany. In these experiments, we observed average telemetry delays of 200ms, and average video delays of 2s with peaks of up to 6s for all data. We also observed 75% frame loss for video streams due to bandwidth limits, yielding 4fps video.

ICRA Conference 2012 Conference Paper

Sequential scene parsing using range and intensity information

  • Manuel Brucker
  • Simon Léonard
  • Tim Bodenmüller
  • Gregory D. Hager

This paper describes an extension of the sequential scene analysis system presented by Hager and Wegbreit [12]. In contrast to the original system, which was limited to scenes consisting of geometric primitives, such as spheres, cuboids, and cylinders computed from range data, the extended system is capable of dealing with arbitrarily shaped objects computed from range and intensity images. An object model composed of a triangulated geometry and intensity-based SURF features is introduced. The integration of prior object models into the sequential scene parsing framework is described. The extended system is evaluated with respect to pose estimation and its ability to handle complex scene sequences. It is shown that the new object models enable accurate pose estimation and reliable recognition even in highly cluttered scenes.

IROS Conference 2011 Conference Paper

3D thread tracking for robotic assistance in tele-surgery

  • Nicolas Padoy
  • Gregory D. Hager

Remote tele-manipulation tasks can be both long and exhausting. The operative workload can however be reduced through contextual systems, in which routine or dexterous actions are performed automatically. In this paper, we investigate this idea in tele-surgery by proposing automatic scissors, namely the possibility for a surgeon to invoke a third robotic arm to come and automatically cut the thread that he/she is holding. In particular, we address the problem of tracking deformable 3-dimensional (3D) curvilinear objects from stereo images. We propose an approach based on discrete Markov random field (MRF) optimization to track, in 3D, a thread modeled by a non-uniform rational B-spline (NURBS). We evaluate its accuracy off-line on synthetic and real data and illustrate its use for an automatic scissors command within an assistance system based on the da Vinci tele-surgical robot.

ICRA Conference 2011 Conference Paper

Handheld micromanipulation with vision-based virtual fixtures

  • Brian C. Becker
  • Robert A. MacLachlan
  • Gregory D. Hager
  • Cameron N. Riviere

Precise movement during micromanipulation becomes difficult in submillimeter workspaces, largely due to the destabilizing influence of tremor. Robotic aid combined with filtering techniques that suppress tremor frequency bands increases performance; however, if knowledge of the operator's goals is available, virtual fixtures have been shown to greatly improve micromanipulator precision. In this paper, we derive a control law for position-based virtual fixtures within the framework of an active handheld micromanipulator, where the fixtures are generated in real-time from microscope video. Additionally, we develop motion scaling behavior centered on virtual fixtures as a simple and direct extension to our formulation. We demonstrate that hard and soft (motion scaled) virtual fixtures outperform state-of-the-art tremor cancellation performance on a set of artificial but medically relevant tasks: holding, move-and-hold, curve tracing, and volume restriction.

ICRA Conference 2011 Conference Paper

Human-Machine Collaborative surgery using learned models

  • Nicolas Padoy
  • Gregory D. Hager

In the future of surgery, tele-operated robotic assistants will offer the possibility of performing certain commonly occurring tasks autonomously. Using a natural division of tasks into subtasks, we propose a novel surgical Human-Machine Collaborative (HMC) system in which portions of a surgical task are performed autonomously under the surgeon's complete control, and other portions are performed manually. Our system automatically identifies the completion of a manual subtask, seamlessly executes the next automated task, and then returns control back to the surgeon. Our approach is based on learning from demonstration. It uses Hidden Markov Models for the recognition of task completion and temporal curve averaging for learning the executed motions. We demonstrate our approach using a da Vinci tele-surgical robot. On two illustrative tasks where such human-machine collaboration is intuitive, we show that automated control improves usage of the master manipulator workspace. Because such a system does not limit the traditional use of the robot, but merely enhances its capabilities while leaving full control to the surgeon, it provides a safe and acceptable solution for surgical performance enhancement.

ICRA Conference 2011 Conference Paper

Object mapping, recognition, and localization from tactile geometry

  • Zachary A. Pezzementi
  • Caitlin Reyda
  • Gregory D. Hager

We present a method for performing object recognition using multiple images acquired from a tactile sensor. The method relies on using the tactile sensor as an imaging device, and builds an object representation based on mosaics of tactile measurements. We then describe an algorithm that is able to recognize an object using a small number of tactile sensor readings. Our approach makes extensive use of sequential state estimation techniques from the mobile robotics literature, whereby we view the object recognition problem as one of estimating a consistent location within a set of object maps. We examine and test approaches based on both traditional particle filtering and histogram filtering. We demonstrate both the mapping and recognition / localization techniques on a set of raised letter shapes using real tactile sensor data.
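The histogram-filter variant can be sketched as a discrete Bayes filter over candidate poses on each object map, with recognition by accumulated measurement evidence. This is an illustrative simplification (1-D maps, a fixed one-cell-per-reading motion model, Gaussian measurement noise), not the paper's implementation:

```python
import numpy as np

def histogram_filter_update(belief, likelihoods):
    """One Bayes measurement update of a discrete belief over map poses."""
    posterior = belief * likelihoods
    s = posterior.sum()
    return posterior / s if s > 0 else np.full_like(belief, 1.0 / len(belief))

def recognize(maps, readings, noise=0.1):
    """Pick the object map whose cells best explain a sequence of readings.
    Each map is a 1-D array of expected sensor values at discrete poses;
    the object score is the accumulated log-evidence across updates."""
    scores = {}
    for name, m in maps.items():
        belief = np.full(len(m), 1.0 / len(m))
        log_evidence = 0.0
        for z in readings:
            belief = np.roll(belief, 1)  # toy motion model: one cell per step
            lik = np.exp(-0.5 * ((m - z) / noise) ** 2)
            log_evidence += np.log((belief * lik).sum() + 1e-300)
            belief = histogram_filter_update(belief, lik)
        scores[name] = log_evidence
    return max(scores, key=scores.get)
```

Jointly tracking the belief over poses and the evidence per map is what lets a small number of readings both localize the sensor on a map and identify which object the map belongs to.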

ICRA Conference 2011 Conference Paper

Towards integrating task information in skills assessment for dexterous tasks in surgery and simulation

  • Amod Jog
  • Brandon Itkowitz
  • May Liu
  • Simon P. DiMaio
  • Gregory D. Hager
  • Myriam Curet
  • Rajesh Kumar 0001

With the increasing popularity of robotic surgery, several studies in the literature have investigated automatically assessing skill measures based on motion and video data captured from these systems. A range of simulation environments for robotic surgery are now in development. Skill assessment in these environments has so far only focused on evaluating the utility and validity of statistics such as task completion time and instrument distance measured during a simulated task. We present the first work using motion data from a robotic surgery simulation environment in development to classify users of varying skill and to detect task completion. Given the standardized environment of the simulator and the availability of ground truth, skill measurements and feedback based on task motion hold the promise of effective automated objective assessment. Based on motion data of a simulated manipulation task from 17 users of varying skills, we demonstrate binary classification (proficient vs. trainee) of user skill with 87.5% accuracy. Alternate measures based on instrument pose that are more relevant in the simulated environment, including a new measure of motion efficiency, are also presented and evaluated.

IROS Conference 2011 Conference Paper

Towards validation of robotic surgery training assessment across training platforms

  • Yixin Gao
  • Mert Sedef
  • Amod Jog
  • Peter Peng
  • Michael A. Choti
  • Gregory D. Hager
  • Jeff Berkley
  • Rajesh Kumar 0001

Robotic surgery is increasingly popular for a wide range of complex minimally invasive surgery procedures. To improve robotic surgery training, a skills trainer simulator called dV-Trainer has recently been introduced, and a da Vinci Skills Simulator is in advanced evaluation. These platforms report a range of time and motion based task metrics and literature has investigated the validity of these metrics in training studies. However, the lack of a cross-platform data collection system has so far prevented a cross-platform investigation. Using a new architecture for collecting cross-platform motion data, we present the first study investigating whether metrics previously validated in simulation environments also hold in training exercises with a real robotic system. Preliminary experiments for an anastomosis needle throwing task in both simulated and real robotic environments are presented, and corresponding performance metrics for both proficient and trainee users are reported.

IROS Conference 2011 Conference Paper

Visual tracking using the sum of conditional variance

  • Rogério Richa
  • Raphael Sznitman
  • Russell H. Taylor
  • Gregory D. Hager

The goal of this paper is to introduce a direct visual tracking method based on an image similarity measure called the sum of conditional variance (SCV). The SCV was originally proposed in the medical imaging domain for registering multi-modal images. In the context of visual tracking, the SCV is invariant to non-linear illumination variations, suited to multi-modal settings, and computationally inexpensive. Compared to information-theoretic tracking methods, it requires fewer iterations to converge and has a significantly larger convergence radius. The novelty in this paper is a generalization of the efficient second-order minimization formulation for tracking using the SCV, allowing us to combine the efficient second-order approximation of the Hessian with a similarity metric invariant to non-linear illumination variations. The result is a visual tracking method that copes with non-linear illumination variations without requiring the estimation of photometric correction parameters at every iteration. We demonstrate the superior performance of the proposed method through comparative studies and tracking experiments under challenging illumination conditions and rapid motions.
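The SCV itself is simple to compute: replace each reference-intensity bin by the conditional mean of the current image and sum the squared residuals. A minimal sketch (the bin count and the assumed [0, 1] image range are illustrative choices):

```python
import numpy as np

def scv(reference, current, n_bins=32):
    """Sum of conditional variance between two grayscale images in [0, 1].
    Each reference-intensity bin is replaced by the conditional mean of the
    current image; the result is invariant to any per-bin (hence non-linear)
    intensity remapping of the current image."""
    r = np.clip((reference * n_bins).astype(int), 0, n_bins - 1).ravel()
    c = current.ravel().astype(float)
    # Conditional expectation of current intensity given the reference bin.
    sums = np.bincount(r, weights=c, minlength=n_bins)
    counts = np.bincount(r, minlength=n_bins)
    cond_mean = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    expected = cond_mean[r]        # "expected image" under the joint histogram
    return float(((c - expected) ** 2).sum())
```

If the current image is any deterministic remapping of the reference bins, the SCV is (numerically) zero, which is exactly the illumination invariance exploited for tracking.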

ICRA Conference 2010 Conference Paper

Sampling-Based Motion and Symbolic Action Planning with geometric and differential constraints

  • Erion Plaku
  • Gregory D. Hager

To compute collision-free and dynamically-feasible trajectories that satisfy high-level specifications given in a planning-domain definition language, this paper proposes to combine sampling-based motion planning with symbolic action planning. The proposed approach, Sampling-based Motion and Symbolic Action Planner (SMAP), leverages from sampling-based motion planning the underlying idea of searching for a solution trajectory by selectively sampling and exploring the continuous space of collision-free and dynamically-feasible motions. Drawing from AI, SMAP uses symbolic action planning to identify actions and regions of the continuous space that sampling-based motion planning can further explore to significantly advance the search. The planning layers interact with each other through estimates on the utility of each action, which are computed based on information gathered during the search. Simulation experiments with dynamical models of vehicles carrying out tasks given by high-level STRIPS specifications provide promising initial validation, showing that SMAP efficiently solves challenging problems.

ICRA Conference 2009 Conference Paper

Active guidance of a handheld micromanipulator using visual servoing

  • Brian C. Becker
  • Sandrine Voros
  • Robert A. MacLachlan
  • Gregory D. Hager
  • Cameron N. Riviere

In microsurgery, a surgeon often deals with anatomical structures of sizes that are close to the limit of the human hand accuracy. Robotic assistants can help to push beyond the current state of practice by integrating imaging and robot-assisted tools. This paper demonstrates control of a handheld tremor reduction micromanipulator with visual servo techniques, aiding the operator by providing three behaviors: snap-to, motion-scaling, and standoff-regulation. A stereo camera setup viewing the workspace under high magnification tracks the tip of the micromanipulator and the desired target object being manipulated. Individual behaviors activate in task-specific situations when the micromanipulator tip is in the vicinity of the target. We show that the snap-to behavior can reach and maintain a position at a target with an accuracy of 17.5 ± 0.4 µm Root Mean Squared Error (RMSE) distance between the tip and target. Scaling the operator's motions and preventing unwanted contact with non-target objects also provides a larger margin of safety.

ICRA Conference 2009 Conference Paper

Analysis of Crohn's disease lesions in capsule endoscopy images

  • Srdan Bejakovic
  • Rajesh Kumar 0001
  • Themistocles Dassopoulos
  • Gerard Mullin
  • Gregory D. Hager

Capsule endoscopy (CE) is aimed at diagnosing disease in areas of the gastrointestinal (GI) tract beyond the reach of conventional endoscopy. Recent work has addressed various methods for reducing the complexity of CE diagnosis and the time needed for analyzing the data. This includes detection of lumen and its contractions, fluids such as blood and intestinal juices, as well as extraneous matter such as food and bubbles. This paper outlines our ongoing work to segment lesions (in particular Crohn's disease) and other abnormalities in CE images. In particular, here we describe the data collection and clinical analysis for our project and preliminary results for segmenting abnormal and extraneous images from a set of 10 CE studies.

ICRA Conference 2009 Conference Paper

Articulated object tracking by rendering consistent appearance parts

  • Zachary A. Pezzementi
  • Sandrine Voros
  • Gregory D. Hager

We describe a general methodology for tracking 3-dimensional objects in monocular and stereo video that makes use of GPU-accelerated filtering and rendering in combination with machine learning techniques. The method operates on targets consisting of kinematic chains with known geometry. The tracked target is divided into one or more areas of consistent appearance. The appearance of each area is represented by a classifier trained to assign a class-conditional probability to image feature vectors. A search is then performed on the configuration space of the target to find the maximum likelihood configuration. In the search, candidate hypotheses are evaluated by rendering a 3D model of the target object and measuring its consistency with the class probability map. The method is demonstrated for tool tracking on videos from two surgical domains, as well as in a human hand-tracking task.

IROS Conference 2008 Conference Paper

Control methods for guidance virtual fixtures in compliant human-machine interfaces

  • Panadda Marayong
  • Gregory D. Hager
  • Allison M. Okamura

This work focuses on the implementation of a vision-based motion guidance method, called virtual fixtures, on admittance-controlled human-machine cooperative robots with compliance. The robot compliance here refers to the structural elastic deformation of the device. The high mechanical stiffness and non-backdrivability of a typical admittance-controlled robot allow for slow and precise motions, making it highly suitable for tasks that require accuracy near human physical limits, such as microsurgery. However, previous experiments have shown that even small robot compliance degraded virtual fixture performance, especially at the micro scale. In this work, control methods to minimize the effect of robot compliance on virtual fixture performance were developed for admittance-controlled cooperative systems. Based on a linear model of the robot dynamics, we applied a Kalman filter to integrate the measurements obtained from the camera and encoders to estimate the robot end-effector position. A partitioned control law was used to make the end-effector track the desired velocity commanded by the admittance and virtual fixture control laws. The effectiveness of the Kalman filter and the controller was validated on a one degree-of-freedom admittance-controlled cooperative testbed.
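The sensor-fusion idea, predicting from encoder increments and correcting with camera measurements of the actual tip position, can be sketched as a scalar Kalman filter. The noise parameters, the 1-D simplification, and the simulated compliance loss are illustrative, not the paper's identified robot model:

```python
import numpy as np

def kalman_fuse(encoder_deltas, camera_meas, q=1e-4, r=1e-2):
    """1-D Kalman filter: predict tip motion from encoder increments
    (which miss compliant deflection), correct with noisier but unbiased
    camera observations of the true tip position."""
    x, p = 0.0, 1.0  # state estimate and its variance
    estimates = []
    for u, z in zip(encoder_deltas, camera_meas):
        x, p = x + u, p + q          # predict from commanded joint motion
        k = p / (p + r)              # Kalman gain
        x = x + k * (z - x)          # correct with camera measurement
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)
```

The fused estimate tracks the true tip far better than dead-reckoning the encoders alone, which is the point of combining the two sensors when compliance makes encoders systematically optimistic.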

ICRA Conference 2007 Conference Paper

Development and Application of a New Steady-Hand Manipulator for Retinal Surgery

  • Ben Mitchell
  • John Koo
  • Iulian I. Iordachita
  • Peter Kazanzides
  • Ankur Kapoor
  • James Handa
  • Gregory D. Hager
  • Russell H. Taylor

This paper describes the development and initial testing of a new and optimized version of a steady-hand manipulator for retinal microsurgery. In the steady-hand paradigm, the surgeon and the robot share control of a tool attached to the robot through a force sensor. The robot controller senses forces exerted by the operator on the tool and uses this information in various control modes to provide smooth, tremor-free, precise positional control and force scaling. The steady-hand manipulator reported here has been specifically designed with the unique constraints of retinal microsurgery in mind. In particular, the system makes use of a compact wrist design that places the bulk of the robot away from the operating field. The resulting system has high efficacy, flexibility and ergonomics while meeting the accuracy and safety requirements of microsurgery. We have now tested this robot on a biological model system and we report a protocol for reliably cannulating ~80 μm OD veins (the size of veins in the human retina) using the system.

ICRA Conference 2007 Conference Paper

Dynamic Guidance with Pseudoadmittance Virtual Fixtures

  • Zachary A. Pezzementi
  • Allison M. Okamura
  • Gregory D. Hager

Human-machine collaborative systems (HMCS) have been developed to enhance sensation and suppress extraneous motions or forces during surgical tasks requiring precise motion. However, to date such systems have enforced constraints on the position or path of a tool, but have not considered the dynamics of motion. Also, the focus has been on the effect of guidance of motion during a task, rather than on the learning of motion skills through repetition. We present a pseudo-admittance framework for HMCS design to guide the user's velocity in such tasks. Two different fixture design approaches are analyzed, implemented and compared. Three tests are then conducted, showing the fixtures' promise for both guiding and learning motions with dynamics.

ICRA Conference 2007 Conference Paper

Full Motion Tracking in Ultrasound Using Image Speckle Information and Visual Servoing

  • Alexandre Krupa
  • Gabor Fichtinger
  • Gregory D. Hager

This paper presents a new visual servoing method that is able to stabilize a moving area of soft tissue within an ultrasound B-mode imaging plane. The approach consists of moving the probe to minimize the relative displacement between a target imaging plane attached to the moving tissue and the ultrasound plane observed by the probe. The problem is decoupled into out-of-plane and in-plane motion. For the former, a novel method based on the speckle information contained in the images is developed. For the latter, an image region tracker is used to provide the in-plane motion. A visual servoing control scheme is then developed to perform the tracking robotic task. The method is validated on simulated motions of a probe on a static ultrasound volume acquired from a phantom.

IROS Conference 2007 Conference Paper

Kernel-based visual servoing

  • Vinutha Kallem
  • Maneesh Dewan
  • John P. Swensen
  • Gregory D. Hager
  • Noah J. Cowan

Traditionally, visual servoing is separated into tracking and control subsystems. This separation, though convenient, is not necessarily well justified. When tracking and control strategies are designed independently, it is not clear how to optimize them to achieve a certain task. In this work, we propose a framework in which spatial sampling kernels - borrowed from the tracking and registration literature - are used to design feedback controllers for visual servoing. The use of spatial sampling kernels provides natural hooks for Lyapunov theory, thus unifying tracking and control and providing a framework for optimizing a particular servoing task. As a first step, we develop kernel-based visual servos for a subset of relative motions between camera and target scene. The subset of motions we consider are 2D translation, scale, and roll of the target relative to the camera. Our approach provides formal guarantees on the convergence/stability of visual servoing algorithms under putatively generic conditions.

ICRA Conference 2006 Conference Paper

Portability and Applicability of Virtual Fixtures across Medical and Manufacturing Tasks

  • Henry C. Lin 0001
  • Keith Mills
  • Peter Kazanzides
  • Gregory D. Hager
  • Panadda Marayong
  • Allison M. Okamura
  • Ray Karam

Virtual fixtures are virtual constraints that enhance human performance in motion tasks. They can either confine and/or guide a user's motion. In this paper, we use a commercially available motion platform to explore the portability and applicability of virtual fixtures and document how people interact with them. Two micromanipulation tasks are analyzed and the effects of similarly designed virtual fixtures are discussed. One task simulates a medical task, retinal vein cannulation, and the other simulates a manufacturing task, fine leads soldering. Preliminary experimental results show that the virtual fixtures increase the accuracy of both medical and manufacturing tasks, lending support to their portability and applicability across unrelated tasks.

ICRA Conference 2005 Conference Paper

Vision-Based 3D Scene Analysis for Driver Assistance

  • Darius Burschka
  • Gregory D. Hager

We present a vision-based system for traffic sign detection and ego-motion estimation in road scenarios. The system is capable of autonomous scene reconstruction and classification. It is used to pre-select candidate surfaces in the vicinity of the road that should be inspected more closely by a sign recognition system. We compare two approaches based on a binocular and a monocular camera system, respectively. We discuss their advantages and disadvantages for applications in driver assistance systems.

IROS Conference 2004 Conference Paper

Scale-invariant registration of monocular stereo images to 3D surface models

  • Darius Burschka
  • Ming Li 0052
  • Russell H. Taylor
  • Gregory D. Hager

We present an approach for scale recovery from monocular stereo images of an endoscopic camera with simultaneous registration to dense 3D surface models. We assume the camera motion to be unknown or at least uncertain. An example application is the registration of endoscope images to pre-operative CT scans that allows instrument navigation during surgical procedures. The application field is not restricted to the medical field. It can be extended to registration of monocular video images to laser-based surface reconstructions in, e.g., mobile navigation, or to autonomous aircraft navigation from topological surveys. A novel way for depth estimation from arbitrary camera motion is presented. In this paper, we focus on the robust initialization of the system and on the scale recovery for the reconstructed 3D point clouds with accurate registration to the candidate surfaces extracted from the CT data. We provide experimental validation of the algorithm with data obtained from our experiments with a phantom skull.

ICRA Conference 2004 Conference Paper

V-GPS(SLAM): Vision-based Inertial System for Mobile Robots

  • Darius Burschka
  • Gregory D. Hager

We present a novel vision-based approach to simultaneous localization and mapping (SLAM). We discuss it in the context of estimating the 6 DoF pose of a mobile robot from the perception of a monocular camera using a minimum set of three natural landmarks. In contrast to our previously presented V-GPS system, which navigates based on a set of known landmarks, the current approach allows estimating the required information about the landmarks on-the-fly during the exploration of an unknown environment. The method is applicable to indoor and outdoor environments. The calculation is done from the image position of a set of natural landmarks that are tracked in a continuous video stream at frame-rate. An automatic hand-off process allows an update of the set to compensate for occlusions and decreasing reconstruction accuracies with the distance to an imaged landmark. A generic sensor model allows a system configuration with a variety of physical sensors including: monocular perspective cameras, omni-directional cameras and laser range finders.

ICRA Conference 2003 Conference Paper

Direct plane tracking in stereo images for mobile navigation

  • Jason J. Corso
  • Darius Burschka
  • Gregory D. Hager

We present a novel plane tracking algorithm based on the direct update of surface parameters from two stereo images. The plane tracking algorithm is posed as an optimization problem, and maintains an iteratively re-weighted least squares approximation of the plane's orientation using direct pixel measurements. To facilitate autonomous operation, we include an algorithm for robust detection of significant planes in the environment. The algorithms have been implemented in a robot navigation system.

ICRA Conference 2003 Conference Paper

Functional reactive programming as a hybrid system framework

  • Izzet Pembeci
  • Gregory D. Hager

In previous work we presented functional reactive programming (FRP), a general framework for designing hybrid systems and developing domain-specific languages for related domains. FRP's synchronous dataflow features, like event-driven switching, supported by the higher-order lazy functional abstractions of Haskell, allow rapid development of modular and reusable specifications. In this paper, we look more closely at the relation between arrowized FRP (AFRP), our FRP implementation, and the formal specification of hybrid systems. We show how a formally specified hybrid system can be expressed in FRP and present a constructive proof showing that, for a subset of AFRP programs, there is a corresponding formal hybrid system specification.

IROS Conference 2003 Conference Paper

Handling discontinuities in stereovisual alignment tasks

  • Zachary Dodds
  • Gregory D. Hager

For more than a decade the capabilities of incompletely calibrated stereo systems have been formally characterized. These characterizations provide theoretical limits on the tasks that visually guided robotic systems can achieve. Within these limits there are several feature-alignment tasks whose image-space representations are inherently discontinuous, including collinearity and coplanarity. This paper presents smooth, locally convergent image-space approximations for such tasks. In addition, we propose the general notion of a "safe" image-based approximation to spatial tasks, in which complete image-space characterization is sacrificed in return for partial characterization along with local controllability. As a result, this article bridges the gap between theoretically specifiable tasks and locally controllable tasks for common stereovisual servoing systems.

ICRA Conference 2003 Conference Paper

Optimal landmark configuration for vision-based control of mobile robots

  • Darius Burschka
  • Jeremy Geiman
  • Gregory D. Hager

We analyze the problem of finding the optimal placement of tracked primitives for robust vision-based control of a mobile robot. The analysis evaluates the properties of the Image Jacobian matrix, used for direct generation of the control signals from the error signal in the image, and the accuracy of the underlying sensor system. The analysis is then used to select optimal tracking primitives that ensure good observability and controllability of the mobile system for a variety of sensor system configurations. The theoretical results are validated with our mobile robot for system configurations that use standard video cameras mounted on a pan-tilt head and catadioptric systems.
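For a point feature, the Image Jacobian analyzed here has a well-known closed form, and stacking per-landmark Jacobians and examining their conditioning is one way to score a landmark configuration. The 6-DoF point-feature form below is the standard textbook version, shown for illustration; the paper's mobile-robot and catadioptric Jacobians are configuration-specific:

```python
import numpy as np

def interaction_matrix(x, y, Z):
    """Classic image Jacobian (interaction matrix) of a normalized point
    feature (x, y) at depth Z, mapping camera velocity (v, w) in R^6 to
    image-plane velocity (xdot, ydot)."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def stacked_jacobian(points):
    """Stack per-point Jacobians; the conditioning of the stack indicates
    how well a landmark configuration constrains the control signals."""
    return np.vstack([interaction_matrix(x, y, Z) for (x, y, Z) in points])
```

A well-conditioned stacked Jacobian means image errors map to control signals without amplifying sensor noise, which is the observability/controllability criterion the abstract describes.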

ICRA Conference 2003 Conference Paper

Spatial motion constraints: theory and demonstrations for robot guidance using virtual fixtures

  • Panadda Marayong
  • Ming Li 0052
  • Allison M. Okamura
  • Gregory D. Hager

In this article, we describe and demonstrate control algorithms for general motion constraints. These constraints are designed to enhance the accuracy and speed of a user manipulating in an environment with the assistance of a cooperative or telerobotic system. Our method uses a basis of preferred directions, created off-line or in real-time using sensor data, to generate virtual fixtures that may constrain the user to a curve, surface, orientation, etc. in space. Open loop virtual fixtures seek only to maintain user motion along preferred directions, whereas closed loop fixtures additionally guide the user toward a point, line, or surface. This article demonstrates and compares the effects of open and closed loop fixtures in both autonomous and human-machine cases.
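The core virtual-fixture idea in the abstract, constraining user motion to preferred directions, can be sketched as a velocity filter. This is a minimal sketch of our own (not the paper's implementation): project the commanded velocity onto a preferred direction and attenuate the non-preferred component by a compliance factor `c` in [0, 1], where `c = 0` gives a hard constraint and `c = 1` leaves the user unconstrained:

```python
def virtual_fixture(v, d, c):
    """Filter commanded velocity v through a fixture along direction d.

    c is the compliance of the fixture: 0 admits only motion along d,
    1 passes v through unchanged.
    """
    dd = sum(x * x for x in d)
    if dd == 0.0:
        raise ValueError("preferred direction must be nonzero")
    # Component of v along the preferred direction (orthogonal projection).
    scale = sum(vi * di for vi, di in zip(v, d)) / dd
    along = [scale * di for di in d]
    # Non-preferred component, attenuated by the compliance c.
    across = [c * (vi - ai) for vi, ai in zip(v, along)]
    return [ai + xi for ai, xi in zip(along, across)]
```

With a preferred direction along x, `virtual_fixture([1.0, 1.0], [1.0, 0.0], 0.0)` returns `[1.0, 0.0]` (hard fixture), while `c = 0.5` halves the off-axis motion.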

IROS Conference 2003 Conference Paper

Task modeling and specification for modular sensory based human-machine cooperative systems

  • Danica Kragic
  • Gregory D. Hager

This paper is directed towards developing human-machine cooperative systems (HMCS) for augmented surgical manipulation tasks. These tasks are commonly repetitive, sequential, and consist of simple steps. The transitions between these steps can be driven either by the surgeon's input or by sensory information. Consequently, complex tasks can be effectively modeled using a set of basic primitives, where each primitive defines some basic type of motion (e.g., translational motion along a line, rotation about an axis, etc.). These steps can be "open-loop" (simply complying with the user's demands) or "closed-loop", in which case external sensing is used to define a nominal reference trajectory. The particular research problem considered here is the development of a system that supports simple design of complex surgical procedures from a set of basic control primitives. The three system levels considered are: i) task graph generation, which allows the user to easily design or model a task; ii) task graph execution, which executes the task graph; and iii) at the lowest level, the specification of primitives, which allows the user to easily specify new types of primitive motions. The system has been developed and validated using the JHU Steady Hand Robot as an experimental platform.

IROS Conference 2003 Conference Paper

V-GPS - image-based control for 3D guidance systems

  • Darius Burschka
  • Gregory D. Hager

We present our approach to pose verification with monocular cameras in 3-dimensional space, based on the image-based control paradigm. We describe the extensions to our previous control system for mobile navigation that allow us to estimate the complete set of pose parameters in space. The major contribution of this approach is a sensor-independent formulation that allows a flexible configuration with a variety of sensor systems, including standard cameras, omnidirectional cameras, and laser systems. Our second contribution is a way to re-initialize the tracked landmarks during multi-segment navigation in applications with significant deviations from the pre-taught trajectory, as is the case for handheld systems and flying robots. The presented system can be used as a guidance system for visitors. The localization is based on known landmarks, in our case natural landmarks in the environment. These landmarks play the role of the satellites in a GPS system; because of this similarity in concept, we call our system V-GPS (vision-based GPS). A camera carried by a person makes it possible to navigate along pre-specified paths through environments such as galleries, hospitals, parks, and other public places.

ICRA Conference 2002 Conference Paper

Specifying Behavior in C++

  • Xiangtian Dai
  • Gregory D. Hager
  • John Peterson

Most robot programming takes place in the "time domain", that is, the goal is to specify the behavior of a system that is acquiring a continual temporal stream of inputs, and is required to provide a continual, temporal stream of outputs. We present a reactive programming language, based on the functional reactive programming paradigm, for specifying such behavior. The major attributes of this language are: 1) it provides for both synchronous and asynchronous definitions of behavior; 2) specification is equational in nature; 3) it is type safe; and 4) it is embedded in C++. In particular the latter makes it simple to "lift" existing C++ libraries into the language.

ICRA Conference 2002 Conference Paper

Stereo-Based Obstacle Avoidance in Indoor Environments with Active Sensor Re-Calibration

  • Darius Burschka
  • Stephen Lee
  • Gregory D. Hager

We present a stereo-based obstacle avoidance system for mobile vehicles. The system operates in three steps. First, it models the surface geometry of the supporting surface and removes the supporting surface from the scene. Next, it segments the remaining stereo disparities into connected components in image and disparity space. Finally, it projects the resulting connected components onto the supporting surface and plans a path around them. One interesting aspect of this system is that it can detect both positive and "negative" obstacles (e.g., stairways) in its path. The algorithms we have developed have been implemented on a mobile robot equipped with a real-time stereo system. We present experimental results on indoor environments with planar supporting surfaces that show the algorithms to be both fast and robust.
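The first two steps of the pipeline described above can be illustrated with a toy sketch (our construction, not the authors' code). For a camera over a flat floor, ground disparity varies roughly linearly with image row, `d = a*row + b`; pixels whose disparity deviates from that model, above it (positive obstacles) or below it (negative ones, such as stairways), are kept and then grouped into connected components:

```python
def remove_ground(disp, a, b, tol=1.0):
    """Keep only pixels whose disparity deviates from the ground model a*row+b."""
    h, w = len(disp), len(disp[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if abs(disp[r][c] - (a * r + b)) > tol:
                out[r][c] = disp[r][c]   # obstacle pixel: keep its disparity
    return out

def connected_components(mask):
    """4-connected components over nonzero pixels (iterative flood fill)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                comp, stack = [], [(r, c)]
                seen[r][c] = True
                while stack:
                    y, x = stack.pop()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                comps.append(comp)
    return comps
```

Each resulting component can then be projected onto the ground plane for path planning, which is the paper's third step.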

ICRA Conference 2002 Conference Paper

Vision Assisted Control for Manipulation using Virtual Fixtures: Experiments at Macro and Micro Scales

  • Alessandro Bettini
  • Samuel Lang
  • Allison M. Okamura
  • Gregory D. Hager

We present the design and implementation of a vision-based system for micron-scale, cooperative manipulation of a surgical tool. The system is based on a control algorithm that implements a broad class of guidance modes called virtual fixtures. A virtual fixture, like a real fixture, limits the motion of a tool to a prescribed class or range. The implemented system uses vision as a sensor for providing a reference trajectory, and the control algorithm then provides haptic feedback involving direct, shared manipulation of a surgical tool. We have tested this system on the JHU Steady Hand robot and provide experimental results for path following and positioning on structures at both macroscopic and microscopic scales.

IROS Conference 2001 Conference Paper

Vision assisted control for manipulation using virtual fixtures

  • Alessandro Bettini
  • Samuel Lang
  • Allison M. Okamura
  • Gregory D. Hager

The "steady hand" concept is a way of providing assistance for direct manipulation by applying constraints on the motion of a tool shared by a user and a robot. We explore in detail one family of constraints: virtual fixtures for use in path following tasks. Vision is used to sense the desired path, and then the robot encourages motion toward and along the path through a direction-based control law. This "soft" virtual fixture allows the user to move in other, non-preferred directions, maintaining the user's sense of autonomy and control. Experimental results show that user performance in assisted path following improves with virtual fixture augmentation, and differs with varying fixture compliance.

ICRA Conference 2001 Conference Paper

Vision Based Control of Mobile Robots

  • Darius Burschka
  • Gregory D. Hager

This paper presents an approach for direct control of a mobile robot to keep it on a pre-taught path based solely on the perception from a monocular CCD camera. In particular, we present a novel vision-based control algorithm for mobile systems equipped with a conventional camera and a pan-tilt head or with an omnidirectional camera. This algorithm avoids numerical instabilities of previously reported approaches. The experimental performance of the method as well as its practical limitations are discussed.

ICRA Conference 2000 Conference Paper

On Specifying and Performing Visual Tasks with Qualitative Object Models

  • Gregory D. Hager
  • Zachary Dodds

Vision-based control has aimed to develop general-purpose, high accuracy systems for manipulating objects. While much of the scientific and technological infrastructure needed to accomplish this aim is now in place, several stumbling blocks still remain. One continuing issue is accuracy, and its relationship to system calibration. We describe a generative task structure for vision-based control of motion that admits a simple, geometric approach to task specification. At the same time, this approach allows one to state precisely what types of miscalibration lead to errors in task performance. A second hurdle has been the programmability of hand-eye systems. However, we argue that a structured object representation sufficient for flexible hand-eye coordination is a possibility. The result is a high-level, object-centered language for expressing hand-eye tasks.

ICRA Conference 1999 Conference Paper

A Language for Declarative Robotic Programming

  • John Peterson
  • Gregory D. Hager
  • Paul Hudak

We have applied methodologies developed for domain-specific embedded languages to create a high-level robot control language called Frob, for functional robotics. Frob supports a programming style that cleanly separates the what from the how of a robotic control program. That is, the what is a simple, easily understood definition of the control strategy using groups of equations and primitives which combine sets of these control system equations into a complex system. The how aspect of the program addresses the unpleasant details, such as the method used to realize these equations, the connection between the control equations and the sensors and effectors in the robot, and communication with other elements of the system. Frob is a system that supports rapid prototyping of new control strategies, enables software reuse through composition, and defines a system in a way that can be formally reasoned about and transformed.

ICRA Conference 1999 Conference Paper

Fast 3D Boundary Computation from Occluding Contour Motion

  • Aage Bendiksen
  • Gregory D. Hager

Presents a fast method for computing a bounding volume within which an observed object must lie from the observed motion of the occluding contour during a straight-line motion of the camera. The bounding volume is represented as a set of planar cross sections each consisting of multiple convex polygons within which the object lies. The algorithm's worst-case runtime performance is O(nmk) operations, where n is the number of viewpoints used, m is the number of polygons created during the execution of the algorithm, and k is a parameter dependent on the geometric complexity of the object being viewed. Experimental examples are demonstrated; bounding polygon computation from sampled contours in 20 images required less than one second on a 50 MHz i486 CPU.

ICRA Conference 1999 Conference Paper

Model-Based 3D Object Tracking Using Projective Invariance

  • Sung-Woo Lee
  • Bum-Jae You
  • Gregory D. Hager

This paper describes a method for 3D object tracking based on projective invariance. Projective invariance has typically been used for object recognition, where it removes the need for cumbersome camera calibration to obtain metric information. We exploit the fact that projective invariants can serve not only as recognition cues but also as a means of verifying the status of a 3D object during visual tracking. Using point-based projective invariants, we develop a fast and reliable visual tracking strategy that combines a recognition module based on projective invariance, window-based corner trackers for tracking the object's center, and a visibility-checking module. The proposed method has been demonstrated successfully on a single IBM-PC platform.

ICRA Conference 1999 Conference Paper

Task Specification and Monitoring for Uncalibrated Hand/Eye Coordination

  • Zachary Dodds
  • Gregory D. Hager
  • A. Stephen Morse
  • João P. Hespanha 0001

Most of the work in robotic manipulation and visual servoing has emphasized how to specify and perform particular tasks. Recent results have formally shown what tasks are possible with uncalibrated imaging systems. This paper extends those results by characterizing in a constructive manner the set of tasks which can be performed with different types of uncalibrated camera models. The resulting task structure provides a principled foundation both for a specification language and for automatic execution monitoring in uncalibrated environments.

IROS Conference 1999 Conference Paper

Three-dimensional pose determination for a humanoid robot using binocular head system

  • Hong-Jae Kim
  • Bum-Jae You
  • Gregory D. Hager
  • Sang-Rok Oh
  • Chong-Won Lee

Three-dimensional pose estimation of objects has attracted much interest because of robotic applications such as intelligent robotic assembly and vision-guided control of humanoid or human-friendly robots. In this paper, we propose a simple and fast stereo matching algorithm for real-time robotic applications, together with an improved scheme for feature matching that uses three-dimensional information by adopting back-projection as a feedback element measuring the degree of feature matching. The degree of feature matching is determined by checking the existence ratio of each feature point in real images after back-projecting an object model into the image plane. The proposed pose determination algorithm has been applied successfully to CENTAUR, a humanoid robot at KIST, where it determines the pose of several polyhedral objects using 3D information from the vertices on an object's outline in the image plane.

ICRA Conference 1998 Conference Paper

Dynamic Sensor Planning in Visual Servoing

  • Éric Marchand
  • Gregory D. Hager

We present an approach to dynamic sensor planning problems in visual servoing. Specifically, one of the main problems in image-based visual servoing is to plan the camera trajectory in order to avoid undesired configurations (e.g., features out of view, collision with obstacles, etc.). Our approach uses the robot redundancy and employs a control scheme based on the task function approach. It combines the regulation of the selected vision-based task with the minimization of a secondary cost function, which reflects given constraints on the manipulator trajectory. We describe how this methodology is applied to common problems in robotic vision: occlusion avoidance, field of view constraint and obstacle avoidance. We demonstrate the validity of this approach with various experiments.
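The redundancy idea described above, regulating the primary task while descending a secondary cost in the task's null space, can be sketched on a toy 2-DOF example. The task Jacobian `J = [1, 0]`, the secondary cost `h(q) = (q2 - goal2)^2`, and all function names here are our own illustrative choices, not the paper's formulation:

```python
def control_step(q, q_task_rate, alpha=0.2, goal2=2.0):
    """One update for a 2-DOF joint vector q with task Jacobian J = [1, 0].

    J+ = [1, 0]^T, so the null-space projector I - J+J = [[0, 0], [0, 1]]:
    the second joint is free to descend the secondary cost
    h(q) = (q2 - goal2)^2 without disturbing the primary task on q1.
    """
    grad_h2 = 2.0 * (q[1] - goal2)   # d h / d q2
    dq1 = q_task_rate                # primary task: drive q1 at the commanded rate
    dq2 = -alpha * grad_h2           # secondary cost, projected into the null space
    return [q[0] + dq1, q[1] + dq2]

def run_controller(q, steps=50):
    """Run the combined primary/secondary controller for a fixed horizon."""
    for _ in range(steps):
        q = control_step(q, q_task_rate=0.1)
    return q
```

Starting from `[0.0, 0.0]`, the first joint tracks the commanded task motion while the second converges to the secondary-cost minimum at 2.0.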

IROS Conference 1998 Conference Paper

Joint probabilistic techniques for tracking objects using multiple visual cues

  • Christopher Rasmussen
  • Gregory D. Hager

Robots relying on vision as a primary sensor frequently need to track common objects such as people, cars, and tools in order to successfully perform autonomous navigation or grasping tasks. These objects may comprise many visual parts and attributes, yet image-based tracking algorithms are often keyed to only one of a target's identifying characteristics. In this paper, we present a framework for sharing information among disparate state estimation processes operating on the same underlying visual object. Well-known techniques for joint probabilistic data association are adapted to yield increased robustness when multiple trackers attuned to different visual cues are deployed simultaneously. We also formulate a measure of tracker confidence, based on distinctiveness and occlusion probability, which permits the deactivation of trackers before erroneous state estimates adversely affect the ensemble. We will discuss experiments using color-region- and snake-based tracking in tandem that demonstrate the efficacy of this approach.

ICRA Conference 1998 Conference Paper

What Can Be Done with an Uncalibrated Stereo System?

  • João P. Hespanha 0001
  • Zachary Dodds
  • Gregory D. Hager
  • A. Stephen Morse

Over the last several years, there has been an increasing appreciation of the impact of control architecture on the accuracy of visual servoing systems. In particular, it is generally acknowledged that so-called image-based methods provide the highest guarantees of accuracy on inaccurately calibrated hand-eye systems. Less clear is the impact of the control architecture on the set of tasks which the system can perform. In this article, we present a formal analysis of control architectures for hand-eye coordination. Specifically, we first state a formal characterization of what makes a task performable under three possible encoding methods. Then, for the specific case of cameras modeled using projective geometry, we relate this characterization to notions of projective invariance and demonstrate the limits of achievable performance in this regard.

ICRA Conference 1997 Conference Paper

Image-based prediction of landmark features for mobile robot navigation

  • Gregory D. Hager
  • David J. Kriegman
  • Erliang Yeh
  • Christopher Rasmussen

We have been developing an architecture for vision-based navigation which relies on continuous feedback from visual "landmarks" to control robot motion. In this approach, landmarks are consistently located and acquired as they come into view. To make this process efficient and robust, it is important that the image locations of these features can be predicted from available image information. In this article, we discuss methods for direct image-based prediction of point and line features for a mobile system operating on a planar surface. Preliminary experimental results suggest that image-based prediction can be performed efficiently and with sufficient accuracy to ensure robust acquisition of navigational landmarks.

IROS Conference 1996 Conference Paper

Preliminary results on grasping with vision and touch

  • Jae S. Son
  • Robert D. Howe
  • Jonathan G. Wang
  • Gregory D. Hager

This paper presents initial results in integrating touch with vision for delicate manipulation tasks. A generalizable framework of behavioral primitives for tactile and visual feedback control is proposed. Since vision provides position and shape information at a distance, while touch provides small-scale geometric and force information, we focus on the complementary roles of vision and touch. We demonstrate that visual feedback can perform the rough positioning needed for tactile sensor feedback, and that grasp force and object orientation can be sensed and controlled with tactile sensing. A force sensor based approach provides a comparison measure, and we observe that the use of tactile sensing results in a more gentle grasp.

ICRA Conference 1996 Conference Paper

Servomatic: a modular system for robust positioning using stereo visual servoing

  • Kentaro Toyama
  • Gregory D. Hager
  • Jonathan G. Wang

We introduce Servomatic, a modular system for robot motion control based on calibration-insensitive visual servoing. A small number of generic motion control operations referred to as primitive skills use stereo visual feedback to enforce a specific task-space kinematic constraint between a robot end-effector and a set of target features. Primitive skills are able to position with an accuracy that is independent of errors in hand-eye calibration and are easily combined to form more complex kinematic constraints as required by different applications. The system has been applied to a number of example problems, showing that modular, high precision, vision-based motion control is easily achieved with off-the-shelf hardware. Our continuing goal is to develop a system where low-level robot control ceases to be a concern to higher-level robotics researchers.

IROS Conference 1995 Conference Paper

A "robust" convergent visual servoing system

  • D. Kim
  • Alfred A. Rizzi
  • Gregory D. Hager
  • Daniel E. Koditschek

This paper describes a simple visual servoing control algorithm capable of robustly positioning a three degree of freedom end effector based only on information from a stereo vision system. The proposed control algorithm does not require estimates of the gripper's spatial position, a significant source of calibration sensitivity. The controller is completely immune to positional camera calibration errors, and we demonstrate robustness to orientation miscalibration through a series of simulations and experiments.

IROS Conference 1995 Conference Paper

Keeping your eye on the ball: tracking occluding contours of unfamiliar objects without distraction

  • Kentaro Toyama
  • Gregory D. Hager

Visual tracking is prone to distractions, where features similar to the target features guide the track away from its intended object. Global shape models and dynamic models are necessary for completely distraction-free contour tracking, but there are cases when component feature trackers alone can be expected to avoid distraction. We define the tracking problem in general and devise a method for local, window-based, feature trackers to track accurately in spite of background distractions. The algorithm is applied to a generic line tracker and a snake-like contour tracker which are then analyzed with respect to previous contour-trackers. We discuss the advantages and disadvantages of our approach and suggest that existing model-based trackers can be improved by incorporating similar techniques at the local level.

ICRA Conference 1994 Conference Paper

A Vision-Based Grasping System for Unfamiliar Planar Objects

  • Aage Bendiksen
  • Gregory D. Hager

This paper describes a vision-based robotic system for grasping unfamiliar planar objects with a parallel-jaw gripper. The input to the grasp synthesis system consists of projected edge data from a single-camera video system calibrated to a planar working surface. A simple search procedure is used to find an acceptable grasp. Our grasp analysis metric is the required squeezing force, computed using an approximation of the rigid body equilibrium conditions. The resulting analysis conditions are formulated as a linear program and solved using the simplex method. A Zebra-ZERO robot arm with a parallel-jaw gripper is used to execute the chosen grasp. In a simple test of the system, it successfully grasped and lifted 10 of 14 test objects.

IROS Conference 1994 Conference Paper

Feature-based visual servoing and its application to telerobotics

  • Gregory D. Hager
  • Gerhard Grunwald
  • Gerhard Hirzinger

Advances in visual servoing theory and practice now make it possible to accurately and robustly position a robot manipulator relative to a target. Both the vision and control algorithms are extremely simple; however, they must be initialized with task-relevant features in order to be applied. Consequently, they are particularly well-suited to telerobotics systems, where an operator can initialize the system but round-trip delay prohibits direct operator feedback during motion. This paper describes the basic theory behind feature-based visual servoing and discusses the issues involved in integrating visual servoing into the ROTEX space teleoperation system.

ICRA Conference 1994 Conference Paper

Robot Feedback Control Based on Stereo Vision: Towards Calibration-Free Hand-Eye Coordination

  • Gregory D. Hager
  • Wen-Chung Chang
  • A. Stephen Morse

This article describes the theory and implementation of a system that positions a robot manipulator using visual information from two cameras. The system simultaneously tracks the robot end-effector and the visual features used to define goal positions. An error signal based on the visual distance between the end-effector and the target is defined, and a control law that moves the robot to drive this error to zero is derived. The control law has been integrated into a system that performs tracking and stereo control on a single processor, with no special-purpose hardware, at real-time rates. Experiments with the system have shown that the controller is so robust to calibration error that the cameras can be moved several centimeters and rotated several degrees while the system is running with no adverse effects.
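The calibration-robustness property described above can be illustrated with a toy linear simulation of our own (not the paper's stereo system): the "camera" is a linear map `A` from a 2-D workspace position to image measurements, while the controller only knows a deliberately perturbed estimate `A_est`, yet a simple proportional law still drives the image error to zero:

```python
def mat_vec(M, v):
    """2x2 matrix times 2-vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def inv2(M):
    """Closed-form inverse of a 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[ M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det,  M[0][0] / det]]

def servo(x, target, A, A_est, lam=0.5, steps=50):
    """Iterate x <- x - lam * A_est^{-1} (A x - target).

    The error dynamics contract as long as A_est is a rough enough match
    to A, which is the essence of the robustness claim in the abstract.
    """
    B = inv2(A_est)
    for _ in range(steps):
        err = [a - b for a, b in zip(mat_vec(A, x), target)]
        corr = mat_vec(B, err)
        x = [x[0] - lam * corr[0], x[1] - lam * corr[1]]
    return x

A     = [[1.0, 0.2], [0.1, 1.0]]     # true workspace-to-image map
A_est = [[1.1, 0.15], [0.05, 0.95]]  # miscalibrated estimate used by the controller
```

Running `servo([3.0, -2.0], [0.0, 0.0], A, A_est)` converges despite the roughly 10% miscalibration between `A` and `A_est`.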

ICRA Conference 1992 Conference Paper

Constraint solving methods and sensor-based decision-making

  • Gregory D. Hager

The author describes a novel approach to sensor-based decision-making that involves formulating and solving large systems of parametric constraints. The constraints describe a model for sensor data and the criteria for correct decisions about the data. An incremental constraint solving technique performs the minimal model recovery required to reach a decision. The approach was demonstrated on two different problems, graspability and categorization, using range data and a superellipsoid data model. The experiments indicated that simultaneous solution of both data constraints and decision criteria can lead to efficient and effective decision-making, even when the observed data is imprecise and incomplete.

ICRA Conference 1991 Conference Paper

Real-time vision-based robot localization

  • Sami Atiya
  • Gregory D. Hager

An algorithm for robot localization using visual landmarks is described. This algorithm determines both the correspondence between observed landmarks (in this case vertical edges in the environment) and a preloaded map, and the location of the robot from those correspondences. The primary advantages of this algorithm are its use of a single geometric tolerance to describe observation error, its ability to recognize ambiguous sets of correspondences, its ability to compute bounds on the error in localization, and its fast execution. The current version of the algorithm has been implemented and tested on a mobile robot system. In several hundred trials the algorithm has not failed, and it computes the robot's location to within a centimeter in less than half a second.
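A small sketch in the spirit of the tolerance-based correspondence search described above (the data layout and function names are ours): given map bearings to vertical-edge landmarks and observed bearings, recover the robot heading as the offset supported by the largest set of correspondences that agree to within a single geometric tolerance:

```python
import math

def wrap(a):
    """Wrap an angle to (-pi, pi]."""
    while a <= -math.pi:
        a += 2.0 * math.pi
    while a > math.pi:
        a -= 2.0 * math.pi
    return a

def best_heading(map_bearings, observed, tol=0.05):
    """Try every map/observation pairing as a heading hypothesis and keep
    the one supported by the largest consistent correspondence set."""
    best = (0, None)
    for m in map_bearings:
        for o in observed:
            theta = wrap(m - o)          # hypothesized heading
            support = sum(
                1 for mm in map_bearings
                if any(abs(wrap(mm - oo - theta)) <= tol for oo in observed)
            )
            if support > best[0]:
                best = (support, theta)
    return best[1], best[0]
```

Because correspondences are scored rather than assumed, spurious observations (e.g., an edge not in the map) simply fail to gather support, which mirrors the algorithm's ability to reject ambiguous correspondence sets.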

IROS Conference 1991 Conference Paper

Towards geometric decision making in unstructured environments

  • Gregory D. Hager

Presents an approach to sensor-based decision making in unstructured environments that relies on describing geometric structures by parameterized volumes. This approach leads to large systems of nonlinear stochastic inequalities. The author describes how these inequalities can be solved using interval bisection, discusses the structural and statistical convergence of the technique, and presents some preliminary experimental results.
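The interval-bisection idea mentioned above can be sketched on a one-dimensional example of our own construction: classify intervals as certainly satisfying the inequality, certainly violating it, or indeterminate, and bisect the indeterminate ones. Here the constraint is `x*x - 1 <= 0` on [0, 2], whose feasible set is [0, 1]:

```python
def f_bounds(lo, hi):
    """Interval bounds of f(x) = x*x - 1 on [lo, hi] (assumes 0 <= lo <= hi,
    where x*x is monotone, so the endpoint values bound the range)."""
    return lo * lo - 1.0, hi * hi - 1.0

def feasible_measure(lo, hi, eps=1e-3):
    """Total length of certainly-feasible subintervals of [lo, hi]."""
    fmin, fmax = f_bounds(lo, hi)
    if fmax <= 0.0:
        return hi - lo           # whole interval satisfies the inequality
    if fmin > 0.0:
        return 0.0               # whole interval violates it
    if hi - lo < eps:
        return 0.0               # indeterminate and too small: give up
    mid = 0.5 * (lo + hi)
    return feasible_measure(lo, mid, eps) + feasible_measure(mid, lo + (hi - lo), eps) if False else \
           feasible_measure(lo, mid, eps) + feasible_measure(mid, hi, eps)
```

`feasible_measure(0.0, 2.0)` recovers a measure of 1.0 for the feasible set, with the undecided boundary region shrinking as `eps` decreases.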

ICRA Conference 1989 Conference Paper

Task-directed multisensor fusion

  • Gregory D. Hager
  • Max Mintz

The authors consider the problem of task-directed information gathering. They first develop a decision-theoretic model of task-directed sensing. In this framework, sensors are modeled as noise-contaminated, uncertain measurement systems. A sensor task is modeled as consisting of a function describing the type of information required by the task, a utility function describing sensitivity to error, and a cost function describing time or resource constraints on the system. From this description, the authors develop a computational method approximating a standard Bayesian decision-making model. This algorithm, which relies on a finite-element computation, is applicable to a wide variety of sensor fusion problems. The authors describe its derivation, analyze its error properties, and indicate how it can be made robust to errors in the description of sensors and discrepancies between geometric models and sensed objects. They also present the result of applying this fusion technique to several different information gathering tasks in simulated situations and in a distributed sensing system.