
Author name cluster

Matthew Johnson-Roberson

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

35 papers
2 author rows

Possible papers (35)

NeurIPS Conference 2025 Conference Paper

Building 3D Representations and Generating Motions From a Single Image via Video-Generation

  • Weiming Zhi
  • Ziyong Ma
  • Tianyi Zhang
  • Matthew Johnson-Roberson

Autonomous robots typically need to construct representations of their surroundings and adapt their motions to the geometry of their environment. Here, we tackle the problem of constructing a policy model for collision-free motion generation, consistent with the environment, from a single input RGB image. Extracting 3D structures from a single image often involves monocular depth estimation. Developments in depth estimation have given rise to large pre-trained models such as DepthAnything. However, using the outputs of these models for downstream motion generation is challenging due to frustum-shaped errors that arise. Instead, we propose a framework known as Video-Generation Environment Representation (VGER), which leverages advances in large-scale video generation models to generate a moving-camera video conditioned on the input image. Frames of this video, which form a multiview dataset, are then input into a pre-trained 3D foundation model to produce a dense point cloud. We then introduce a multi-scale noise approach to train an implicit representation of the environment structure and build a motion generation model that complies with the geometry of the representation. We extensively evaluate VGER over a diverse set of indoor and outdoor environments. We demonstrate its ability to produce smooth motions that account for the captured geometry of a scene, all from a single RGB input image.

ICLR Conference 2025 Conference Paper

Scalable Benchmarking and Robust Learning for Noise-Free Ego-Motion and 3D Reconstruction from Noisy Video

  • Xiaohao Xu
  • Tianyi Zhang 0014
  • Shibo Zhao
  • Xiang Li 0106
  • Sibo Wang
  • Yongqi Chen
  • Ye Li
  • Bhiksha Raj

We aim to redefine robust ego-motion estimation and photorealistic 3D reconstruction by addressing a critical limitation: the reliance on noise-free data in existing models. While such sanitized conditions simplify evaluation, they fail to capture the unpredictable, noisy complexities of real-world environments. Dynamic motion, sensor imperfections, and synchronization perturbations lead to sharp performance declines when these models are deployed in practice, revealing an urgent need for frameworks that embrace and excel under real-world noise. To bridge this gap, we tackle three core challenges: scalable data generation, comprehensive benchmarking, and model robustness enhancement. First, we introduce a scalable noisy data synthesis pipeline that generates diverse datasets simulating complex motion, sensor imperfections, and synchronization errors. Second, we leverage this pipeline to create Robust-Ego3D, a benchmark rigorously designed to expose noise-induced performance degradation, highlighting the limitations of current learning-based methods in ego-motion accuracy and 3D reconstruction quality. Third, we propose Correspondence-guided Gaussian Splatting (CorrGS), a novel method that progressively refines an internal clean 3D representation by aligning noisy observations with rendered RGB-D frames from the clean 3D map, enhancing geometric alignment and appearance restoration through visual correspondence. Extensive experiments on synthetic and real-world data demonstrate that CorrGS consistently outperforms prior state-of-the-art methods, particularly in scenarios involving rapid motion and dynamic illumination. We will release our code and benchmark to advance robust 3D vision, setting a new standard for ego-motion estimation and high-fidelity reconstruction in noisy environments.

IROS Conference 2024 Conference Paper

DarkGS: Learning Neural Illumination and 3D Gaussians Relighting for Robotic Exploration in the Dark

  • Tianyi Zhang 0014
  • Kaining Huang
  • Weiming Zhi
  • Matthew Johnson-Roberson

Humans have the remarkable ability to construct consistent mental models of an environment, even under limited or varying levels of illumination. We wish to endow robots with this same capability. In this paper, we tackle the challenge of constructing a photorealistic scene representation under poorly illuminated conditions and with a moving light source. We approach the task of modeling illumination as a learning problem, and utilize the developed illumination model to aid in scene reconstruction. We introduce an innovative framework that uses a data-driven approach, Neural Light Simulators (NeLiS), to model and calibrate the camera-light system. Furthermore, we present DarkGS, a method that applies NeLiS to create a relightable 3D Gaussian scene model capable of real-time, photorealistic rendering from novel viewpoints. We show the applicability and robustness of our proposed simulator and system in a variety of real-world environments. Code released at https://tyz1030.github.io/proj/darkgs.html

ICRA Conference 2024 Conference Paper

Instructing Robots by Sketching: Learning from Demonstration via Probabilistic Diagrammatic Teaching

  • Weiming Zhi
  • Tianyi Zhang 0014
  • Matthew Johnson-Roberson

Learning from Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions intuitively. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene; these sketches are then synthesised into a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, then applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
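As a rough illustration of the ray-tracing step described above, the minimal numpy sketch below unprojects sketched pixels into 3D rays through the camera intrinsics. The helper name, intrinsics, and the straight-line sketch are all hypothetical, and the fitting of the probabilistic trajectory model is omitted.

```python
import numpy as np

def pixel_rays(sketch_uv, K, T_world_cam):
    """Cast rays through sketched pixels (hypothetical helper).

    sketch_uv:   (N, 2) pixel coordinates along the sketched trajectory
    K:           (3, 3) camera intrinsic matrix
    T_world_cam: (4, 4) camera pose in the world frame
    Returns ray origins (N, 3) and unit directions (N, 3) in the world frame.
    """
    uv1 = np.hstack([sketch_uv, np.ones((len(sketch_uv), 1))])
    dirs_cam = (np.linalg.inv(K) @ uv1.T).T            # back-project to camera frame
    dirs_world = (T_world_cam[:3, :3] @ dirs_cam.T).T  # rotate into world frame
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    origins = np.repeat(T_world_cam[:3, 3][None], len(sketch_uv), axis=0)
    return origins, dirs_world

# Example: a straight sketch across a 640x480 image from an identity pose.
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])
uv = np.stack([np.linspace(100, 500, 20), np.full(20, 240.)], axis=1)
o, d = pixel_rays(uv, K, np.eye(4))
# Intersecting each ray with scene geometry (e.g., a depth map) would give
# the 3D regions to which a probabilistic trajectory model is then fit.
```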

IROS Conference 2024 Conference Paper

Teaching Robots Where To Go And How To Act With Human Sketches via Spatial Diagrammatic Instructions

  • Qilin Sun 0007
  • Weiming Zhi
  • Tianyi Zhang 0014
  • Matthew Johnson-Roberson

This paper introduces Spatial Diagrammatic Instructions (SDIs), an approach for human operators to specify objectives and constraints that are related to spatial regions in the working environment. Human operators are enabled to sketch out regions directly on camera images that correspond to the objectives and constraints. These sketches are projected to 3D spatial coordinates, and continuous Spatial Instruction Maps (SIMs) are learned upon them. These maps can then be integrated into optimization problems for robot tasks. In particular, we demonstrate how Spatial Diagrammatic Instructions can be applied to solve the Base Placement Problem of mobile manipulators, which concerns the best place to put the manipulator to facilitate a certain task. Human operators can specify, via sketch, spatial regions of interest for a manipulation task and permissible regions for the mobile manipulator to occupy. Then, an optimization problem that maximizes the manipulator’s reachability, or coverage, over the designated regions of interest while remaining in the permissible regions is solved. We provide extensive empirical evaluations, and show that our formulation of Spatial Instruction Maps provides accurate representations of user-specified diagrammatic instructions. Furthermore, we demonstrate that our diagrammatic approach to the Mobile Base Placement Problem enables higher-quality solutions and faster runtimes.
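The base-placement optimization lends itself to a brute-force illustration: score candidate base positions by how many region-of-interest points they can reach, masking out non-permissible candidates. The fixed reachability radius below is a crude stand-in for a real reachability model, and all names and data are hypothetical.

```python
import numpy as np

def best_base_placement(candidates, permissible, targets, reach=0.8):
    """Pick the base pose covering the most targets (illustrative only).

    candidates:  (M, 2) candidate base positions on the ground plane
    permissible: (M,) boolean mask from the sketched permissible region
    targets:     (T, 2) points sampled from the sketched region of interest
    reach:       scalar radius standing in for the true reachability map
    """
    dists = np.linalg.norm(candidates[:, None, :] - targets[None], axis=-1)
    coverage = (dists < reach).sum(axis=1).astype(float)
    coverage[~permissible] = -np.inf   # never leave the permissible region
    return candidates[np.argmax(coverage)]

rng = np.random.default_rng(0)
cand = rng.uniform(0, 5, size=(500, 2))
ok = cand[:, 0] < 2.5                            # toy permissible half-plane
tgt = rng.normal([3.0, 3.0], 0.3, size=(50, 2))  # toy region of interest
print(best_base_placement(cand, ok, tgt))
```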

IROS Conference 2024 Conference Paper

V-PRISM: Probabilistic Mapping of Unknown Tabletop Scenes

  • Herbert Wright
  • Weiming Zhi
  • Matthew Johnson-Roberson
  • Tucker Hermans

The ability to construct concise scene representations from sensor input is central to the field of robotics. This paper addresses the problem of robustly creating a 3D representation of a tabletop scene from a segmented RGBD image. These representations are then critical for a range of downstream manipulation tasks. Many previous attempts to tackle this problem do not capture accurate uncertainty, which is required to subsequently produce safe motion plans. In this paper, we cast the representation of 3D tabletop scenes as a multi-class classification problem. To tackle this, we introduce V-PRISM, a framework and method for robustly creating probabilistic 3D segmentation maps of tabletop scenes. Our maps contain occupancy estimates, segmentation information, and principled uncertainty measures. We evaluate the robustness of our method in (1) procedurally generated scenes using open-source object datasets, and (2) real-world tabletop data collected from a depth camera. Our experiments show that our approach outperforms alternative continuous reconstruction approaches that do not explicitly reason about objects in a multi-class formulation.

IROS Conference 2022 Conference Paper

LiSnowNet: Real-time Snow Removal for LiDAR Point Clouds

  • Ming-Yuan Yu
  • Ram Vasudevan
  • Matthew Johnson-Roberson

Light Detection and Ranging (LiDAR) sensors have been widely adopted in modern self-driving vehicles, providing 3D information of the scene and surrounding objects. However, adverse weather conditions still pose significant challenges to LiDARs, since point clouds captured during snowfall can easily be corrupted. The resulting noisy point clouds degrade downstream tasks such as mapping. Existing works on de-noising point clouds corrupted by snow are based on nearest-neighbor search, and thus do not scale well with modern LiDARs, which usually capture 100k or more points at 10 Hz. In this paper, we introduce an unsupervised de-noising algorithm, LiSnowNet, running 52× faster than the state-of-the-art methods while achieving superior performance in de-noising. Unlike previous methods, the proposed algorithm is based on a deep convolutional neural network and can be easily deployed to hardware accelerators such as GPUs. In addition, we demonstrate how to use the proposed method for mapping even with corrupted point clouds.
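The abstract notes the method is built on a deep CNN rather than nearest-neighbor search; a common way to make a LiDAR sweep CNN-friendly is a spherical range-image projection, sketched below. This is a generic projection under assumed sensor field-of-view parameters, not necessarily LiSnowNet's exact input representation.

```python
import numpy as np

def to_range_image(points, h=64, w=1024, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR sweep into an H x W range image (generic sketch)."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-9))
    fu, fd = np.radians(fov_up), np.radians(fov_down)
    rows = ((1.0 - (pitch - fd) / (fu - fd)) * (h - 1)).astype(int)
    cols = ((0.5 * (1.0 - yaw / np.pi)) * (w - 1)).astype(int)
    img = np.zeros((h, w), dtype=np.float32)
    keep = (rows >= 0) & (rows < h)
    img[rows[keep], cols[keep]] = r[keep]   # later points overwrite on collisions
    return img

pts = np.random.default_rng(0).normal(size=(100_000, 3)) * [20.0, 20.0, 1.0]
print(to_range_image(pts).shape)            # (64, 1024), ready for a 2D CNN
```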

IJCAI Conference 2021 Conference Paper

Coupling Intent and Action for Pedestrian Crossing Behavior Prediction

  • Yu Yao
  • Ella Atkins
  • Matthew Johnson-Roberson
  • Ram Vasudevan
  • Xiaoxiao Du

Accurate prediction of pedestrian crossing behaviors by autonomous vehicles can significantly improve traffic safety. Existing approaches often model pedestrian behaviors using trajectories or poses but do not offer a deeper semantic interpretation of a person's actions or how actions influence a pedestrian's intention to cross in the future. In this work, we follow the neuroscience and psychological literature to define pedestrian crossing behavior as a combination of an unobserved inner will (a probabilistic representation of binary intent of crossing vs. not crossing) and a set of multi-class actions (e.g., walking, standing, etc.). Intent generates actions, and the future actions in turn reflect the intent. We present a novel multi-task network that predicts future pedestrian actions and uses predicted future action as a prior to detect the present intent and action of the pedestrian. We also designed an attention relation network to incorporate external environmental contexts, thus further improving intent and action detection performance. We evaluated our approach on two naturalistic driving datasets, PIE and JAAD, and extensive experiments show significantly improved and more explainable results for both intent detection and action prediction over state-of-the-art approaches. Our code is available at: https://github.com/umautobots/pedestrian_intent_action_detection

ICRA Conference 2021 Conference Paper

Energy-optimal Path Planning with Active Flow Perception for Autonomous Underwater Vehicles

  • Niankai Yang
  • Dongsik Chang
  • Matthew Johnson-Roberson
  • Jing Sun 0003

Accurate flow predictions are critical for energy-optimal path planning of AUVs with endurance requirements. However, the complex dynamics of ocean currents make it difficult to achieve accurate flow predictions. For an AUV with flow and location sensing capabilities, one can optimize vehicle actions so that the flow information collected along the vehicle path reduces flow prediction uncertainty, referred to as active flow perception. In this paper, we propose an energy-optimal path planning approach that incorporates active flow perception. The proposed approach achieves the objectives of vehicle energy consumption minimization and flow prediction uncertainty reduction. To quantify flow prediction uncertainty, an empirical flow model parameterized using the proper orthogonal decomposition (POD) is constructed based on historical data. Assuming negligible unmodeled dynamics in the POD model, the flow prediction uncertainty is evaluated by the Cramér-Rao (CR) bound of the estimated model parameters. To establish active flow perception combined with energy-optimal path planning, we formulate the cost to be minimized during path planning in terms of vehicle energy using the estimated flow parameters and the CR bound. Through simulations, the proposed approach is compared with approaches that plan energy-optimal paths using i) true flow and ii) flow predictions without active flow perception. Simulation results demonstrate the satisfactory energy-saving performance of the proposed approach.
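A minimal sketch of the POD construction described above, assuming historical snapshots stacked as rows: the basis comes from an SVD of the mean-centered data, and a flow prediction is the mean plus a linear combination of modes. Names and toy data are illustrative.

```python
import numpy as np

def fit_pod(snapshots, n_modes=5):
    """Fit a POD flow model from historical snapshots (minimal sketch).

    snapshots: (T, N) matrix; each row is a flattened flow-field snapshot.
    Returns the mean field and the leading spatial modes.
    """
    mean = snapshots.mean(axis=0)
    U, S, Vt = np.linalg.svd(snapshots - mean, full_matrices=False)
    return mean, Vt[:n_modes]            # (N,), (n_modes, N)

def reconstruct(mean, modes, coeffs):
    """Flow prediction as mean plus a linear combination of POD modes."""
    return mean + coeffs @ modes

rng = np.random.default_rng(1)
hist = rng.normal(size=(200, 300))       # toy stand-in for historical data
mean, modes = fit_pod(hist)
flow = reconstruct(mean, modes, np.array([0.5, -0.2, 0.0, 0.1, 0.3]))
# In the paper's setting, the uncertainty over the mode coefficients is what
# the CR bound quantifies; measurements along the path shrink that bound.
```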

ICRA Conference 2020 Conference Paper

Leveraging the Template and Anchor Framework for Safe, Online Robotic Gait Design

  • Jinsun Liu
  • Pengcheng Zhao
  • Zhenyu Gan
  • Matthew Johnson-Roberson
  • Ram Vasudevan

Online control design using a high-fidelity, full-order model for a bipedal robot can be challenging due to the size of the state space of the model. A commonly adopted solution to overcome this challenge is to approximate the full-order model (anchor) with a simplified, reduced-order model (template), while performing control synthesis. Unfortunately, it is challenging to make formal guarantees about the safety of an anchor model using a controller designed in an online fashion using a template model. To address this problem, this paper proposes a method to generate safety-preserving controllers for anchor models by performing reachability analysis on template models by relying on functions that bound the difference between the two models. This paper describes how this reachable set can be incorporated into a Model Predictive Control framework to select controllers that result in safe walking on the anchor model in an online fashion. The method is illustrated on a 5-link RABBIT model, and is shown to allow the robot to walk safely while utilizing controllers designed in an online fashion.

ICRA Conference 2020 Conference Paper

LiStereo: Generate Dense Depth Maps from LIDAR and Stereo Imagery

  • Junming Zhang
  • Manikandasriram Srinivasan Ramanagopal
  • Ram Vasudevan
  • Matthew Johnson-Roberson

An accurate depth map of the environment is critical to the safe operation of autonomous robots and vehicles. Currently, either light detection and ranging (LIDAR) or stereo matching algorithms are used to acquire such depth information. However, a high-resolution LIDAR is expensive and produces sparse depth maps at long range; stereo matching algorithms are able to generate denser depth maps but are typically less accurate than LIDAR at long range. This paper combines these approaches together to generate high-quality dense depth maps. Unlike previous approaches that are trained using ground-truth labels, the proposed model adopts a self-supervised training process. Experiments show that the proposed method is able to generate high-quality dense depth maps and performs robustly even with low-resolution inputs. This shows the potential to reduce cost by using LIDARs with lower resolution in concert with stereo systems while maintaining high resolution.

ICRA Conference 2020 Conference Paper

Risk Assessment and Planning with Bidirectional Reachability for Autonomous Driving

  • Ming-Yuan Yu
  • Ram Vasudevan
  • Matthew Johnson-Roberson

Risk assessment to quantify the danger associated with taking a certain action is critical to navigating safely through crowded urban environments during autonomous driving. Risk assessment and subsequent planning is usually done by first tracking and predicting trajectories of other agents, such as vehicles and pedestrians, and then choosing an action to avoid future collisions. However, few existing risk assessment algorithms handle occlusion and other sensory limitations effectively. One either assesses the risk in the worst-case scenario and thus makes the ego vehicle overly conservative, or predicts as many hidden agents as possible and thus makes the computation intensive. This paper explores the possibility of efficient risk assessment under occlusion via both forward and backward reachability. The proposed algorithm can not only identify the location of risk-inducing factors, but can also be used during motion planning. The proposed method is evaluated on various four-way highly occluded intersections with up to five other vehicles in the scene. Compared with other risk assessment algorithms, the proposed method shows better efficiency, meaning that the ego vehicle reaches the goal at a higher speed. In addition, it also lowers the median collision rate by 7.5× when compared to state-of-the-art techniques.

ICRA Conference 2020 Conference Paper

Towards distortion based underwater domed viewport camera calibration

  • Eduardo Iscar
  • Matthew Johnson-Roberson

Photogrammetry techniques used for 3D reconstructions and motion estimation from images are based on projective geometry that models the image formation process. However, in the underwater setting, refraction of light rays at the housing interface introduces non-linear effects in the image formation. These effects produce systematic errors if not accounted for, and severely degrade the quality of the acquired images. In this paper, we present a novel approach to the calibration of cameras inside spherical domes with large offsets between dome and camera centers. Such large offsets not only amplify the effect of refraction, but also introduce blur in the image that corrupts feature extractors used to establish image-world correspondences in existing refractive calibration methods. We propose using the point spread function (PSF) as a complete description of the optical system and introduce a procedure to recover the camera pose inside the dome based on the measurement of the distortions. Results on a collected dataset show the method is capable of recovering the camera pose with high accuracy.

ICRA Conference 2019 Conference Paper

A constrained control-planning strategy for redundant manipulators

  • Corina Barbalata
  • Ram Vasudevan
  • Matthew Johnson-Roberson

This paper presents an interconnected control-planning strategy for redundant manipulators, subject to system and environmental constraints. The method incorporates low-level control characteristics and high-level planning components into a robust strategy for manipulators acting in complex environments, subject to joint limits. This strategy is formulated using an adaptive control rule, a computationally efficient estimation of the robot's mathematical model, and the nullspace of the constraints. A path is generated that takes into account the capabilities of the platform. The proposed method is computationally efficient, enabling its implementation on a real multi-body robotic system. Through experimental results with a 7 degree-of-freedom (DOF) manipulator, we demonstrate the performance of the method in real-world scenarios.

ICRA Conference 2019 Conference Paper

Localization and Tracking of Uncontrollable Underwater Agents: Particle Filter Based Fusion of On-Body IMUs and Stationary Cameras

  • Ding Zhang
  • Joaquin Gabaldon
  • Lisa Lauderdale
  • Matthew Johnson-Roberson
  • Lance J. Miller
  • Kira Barton
  • K. Alex Shorter

Tracking of uncontrollable agents in a controlled environment is an important research question for the coordination of controllable and uncontrollable agents and bio-inspired multi-agent control. This paper presents a framework that approaches the multi-agent tracking problem from a localization perspective, utilizing a combination of wearable sensors and stationary cameras. Specifically, this framework was applied to localize uncontrollable biological agents (dolphins) in a well-defined environment. The biological agents were outfitted with wearable sensors (IMU, speed, depth) and were free to move in their three-dimensional habitat. The dynamic data collected by the wearable sensors was supplemented with image data collected using a pair of cameras mounted above the habitat. The framework presented in this paper combines data from these sensor streams to calculate an accurate estimate of the animal's location during extended periods of free movement. The associations between camera detections and tagged agents are handled using a particle filter embedded with a fuzzy observation concept. The platform is readily implementable in similar water/land environments, and is able to handle nonlinear agent dynamics, non-Gaussian noise, and sparse camera observations while maintaining robust agent localization and tracking.
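A bare-bones version of the propagate/weight/resample loop at the heart of such a particle filter, assuming 2D positions, a tag-derived velocity, and a single camera detection; the fuzzy camera-to-agent association described in the paper is omitted, and all numbers are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
particles = rng.uniform(0, 20, size=(n, 2))     # 2D positions in the habitat
weights = np.full(n, 1.0 / n)

def propagate(particles, velocity, dt=0.2, noise=0.3):
    """Dead-reckon with tag-derived speed/heading plus process noise."""
    return particles + velocity * dt + rng.normal(0, noise, particles.shape)

def update(weights, particles, cam_xy, sigma=0.5):
    """Re-weight by a Gaussian likelihood around a camera detection."""
    d2 = ((particles - cam_xy) ** 2).sum(axis=1)
    w = weights * np.exp(-0.5 * d2 / sigma**2)
    return w / w.sum()

def resample(particles, weights):
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = propagate(particles, velocity=np.array([1.0, 0.2]))
weights = update(weights, particles, cam_xy=np.array([10.0, 5.0]))
particles, weights = resample(particles, weights)
print(particles.mean(axis=0))                   # localization estimate
```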

IROS Conference 2019 Conference Paper

Stochastic Sampling Simulation for Pedestrian Trajectory Prediction

  • Cyrus Anderson
  • Xiaoxiao Du 0001
  • Ram Vasudevan
  • Matthew Johnson-Roberson

Urban environments pose a significant challenge for autonomous vehicles (AVs) as they must safely navigate while in close proximity to many pedestrians. It is crucial for the AV to correctly understand and predict the future trajectories of pedestrians to avoid collision and plan a safe path. Deep neural networks (DNNs) have shown promising results in accurately predicting pedestrian trajectories, relying on large amounts of annotated real-world data to learn pedestrian behavior. However, collecting and annotating these large real-world pedestrian datasets is costly in both time and labor. This paper describes a novel method using a stochastic sampling-based simulation to train DNNs for pedestrian trajectory prediction with social interaction. Our novel simulation method can generate vast amounts of automatically-annotated, realistic, and naturalistic synthetic pedestrian trajectories based on small amounts of real annotation. We then use such synthetic trajectories to train an off-the-shelf state-of-the-art deep learning approach, Social GAN (Generative Adversarial Network), to perform pedestrian trajectory prediction. Our proposed architecture, trained only using synthetic trajectories, achieves better prediction results compared to those trained on human-annotated real-world data using the same network. Our work demonstrates the effectiveness and potential of using simulation as a substitute for human annotation efforts to train high-performing prediction algorithms such as DNNs.

ICRA Conference 2019 Conference Paper

UWStereoNet: Unsupervised Learning for Depth Estimation and Color Correction of Underwater Stereo Imagery

  • Katherine A. Skinner
  • Junming Zhang
  • Elizabeth A. Olson
  • Matthew Johnson-Roberson

Stereo cameras are widely used for sensing and navigation of underwater robotic systems. They can provide high resolution color views of a scene; the constrained camera geometry enables metrically accurate depth estimation; they are also relatively cost-effective. Traditional stereo vision algorithms rely on feature detection and matching to enable triangulation of points for estimating disparity. However, for underwater applications, the effects of underwater light propagation lead to image degradation, reducing image quality and contrast. This makes it especially challenging to detect and match features, particularly across varying viewpoints. Recently, deep learning has shown success in end-to-end learning of dense disparity maps from stereo images. Still, many state-of-the-art methods are supervised and require ground truth depth or disparity, which is challenging to gather in subsea environments. Simultaneously, deep learning has also been applied to the problem of underwater image restoration. Again, it is difficult or impossible to gather real ground truth data for this problem. In this work, we present an unsupervised deep neural network (DNN) that takes raw color underwater stereo imagery as input and outputs dense depth maps and color-corrected imagery of underwater scenes. We leverage a model of the process of underwater image formation, image processing techniques, as well as the geometric constraints inherent to the stereo vision problem to develop a modular network that outperforms existing methods.

ICRA Conference 2018 Conference Paper

Robust Environmental Mapping by Mobile Sensor Networks

  • Hyongju Park
  • Jinsun Liu
  • Matthew Johnson-Roberson
  • Ram Vasudevan

Constructing a spatial map of environmental parameters is a crucial step in preventing hazardous chemical leakages and forest fires, or in estimating spatially distributed physical quantities such as terrain elevation. Although prior methods can do such mapping tasks efficiently via dispatching a group of autonomous agents, they are unable to ensure satisfactory convergence to the underlying ground truth distribution in a decentralized manner when any of the agents fail. Since the types of agents utilized to perform such mapping are typically inexpensive and prone to failure, this results in poor overall mapping performance in real-world applications, which can in certain cases endanger human safety. This paper presents a Bayesian approach for robust spatial mapping of environmental parameters by deploying a group of mobile robots capable of ad-hoc communication equipped with short-range sensors in the presence of hardware failures. Our approach first utilizes a variant of the Voronoi diagram to partition the region to be mapped into disjoint regions that are each associated with at least one robot. These robots are then deployed in a decentralized manner to maximize the likelihood that at least one robot detects every target in their associated region despite a non-zero probability of failure. A suite of simulation results is presented to demonstrate the effectiveness and robustness of the proposed method when compared to existing techniques.
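The partitioning step can be illustrated with a plain nearest-robot assignment, which yields the standard Voronoi partition; the paper uses a variant of this construction, so treat the sketch below as generic, with hypothetical names and toy data.

```python
import numpy as np

def nearest_robot_partition(cells, robots):
    """Assign every cell of the region to its nearest robot, yielding a
    Voronoi-style partition (the paper uses a variant of this idea)."""
    d = np.linalg.norm(cells[:, None, :] - robots[None, :, :], axis=-1)
    return d.argmin(axis=1)            # index of the owning robot per cell

# Toy region: a 20x20 grid and three robots. Robot failure can be handled
# by dropping a row from `robots` and re-partitioning.
cells = (np.stack(np.meshgrid(np.arange(20), np.arange(20)), -1)
           .reshape(-1, 2).astype(float))
robots = np.array([[2.0, 3.0], [15.0, 5.0], [9.0, 16.0]])
owner = nearest_robot_partition(cells, robots)
print(np.bincount(owner))              # number of cells per robot's region
```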

IROS Conference 2017 Conference Paper

A framework for enhanced localization of marine mammals using auto-detected video and wearable sensor data fusion

  • Joaquin Gabaldon
  • Ding Zhang
  • Kira Barton
  • Matthew Johnson-Roberson
  • K. Alex Shorter

Accurate biological agent localization offers the opportunity for both researchers and institutions to gain new knowledge about individual and group behaviors of biosystems. This paper presents a sensor-fusion approach for tracking biological agents, combining the data from automated video logging with magnetic, angular rate, and gravity (MARG) and inertial measurement unit (IMU) data, with professionally managed dolphins as the representative example. Our method of video logging allows for accurate and automated dolphin location detection using a combination of Laplacian of Gaussian (LoG) and multi-orientation elliptical blob detection. These data are combined with MARG/IMU measurements to generate a localization estimate through a series of drift-correcting Kalman and gradient-descent filters, finalized with Incremental Smoothing and Mapping (iSAM2) pose-graph localization.
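For the LoG stage, scikit-image ships a ready-made multi-scale detector; the snippet below runs it on a synthetic overhead frame. The multi-orientation elliptical refinement and the downstream Kalman/gradient-descent filters from the paper are not reproduced here, and the frame is toy data.

```python
import numpy as np
from skimage.feature import blob_log

# Synthetic overhead frame: one bright elongated blob on dark water.
frame = np.zeros((200, 300))
yy, xx = np.mgrid[0:200, 0:300]
frame += np.exp(-(((yy - 90) / 8.0) ** 2 + ((xx - 140) / 20.0) ** 2))

# blob_log returns (row, col, sigma) per detection; radius ~ sigma * sqrt(2).
blobs = blob_log(frame, min_sigma=4, max_sigma=30, num_sigma=10, threshold=0.1)
print(blobs)
```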

IROS Conference 2017 Conference Paper

A probabilistic framework for intrinsic image decomposition from RGB-D streams

  • Wonhui Kim
  • Matthew Johnson-Roberson

Lighting and shading can have a great impact on a robot's ability to recognize, match, and classify objects in indoor scenes. Additionally, the realism of augmented reality applications benefits greatly from understanding the lighting and shading in a scene. To aid in extracting this information from a mobile platform we introduce a novel framework to solve the intrinsic image decomposition problem for RGB-D streams. In our pipeline, the task is formulated as a Bayesian estimation problem. Compared to frame-based methods that must solve a full conditional random field (CRF) optimization problem at each time step, our framework can utilize the knowledge of past frames to predict the intrinsic images at a given frame. Our approach produces more reliable and consistent predictions over time, and our filtering-based framework achieves significant performance gains. Furthermore, our framework can be easily integrated into standard perception loops in many robotic systems that use a similar recursive filtering structure. We show qualitative results on real data and generate quantitative results using ground truth from a photorealistic synthetic dataset produced using a state-of-the-art ray tracer and high fidelity 3D model.

ICRA Conference 2017 Conference Paper

Automatic color correction for 3D reconstruction of underwater scenes

  • Katherine A. Skinner
  • Eduardo Iscar
  • Matthew Johnson-Roberson

Mapping of underwater environments is a critical task for a range of activities from monitoring coral reef habitats to surveying submerged archaeological sites. While recent advances in methods for terrestrial mapping can achieve dense 3D reconstructions of scenes in real-time, there remains the challenge of transferring these methods to the underwater domain due to characteristic effects on propagation of light through the water column that violate the brightness constancy constraint used in terrestrial techniques. Current state-of-the-art methods for underwater 3D reconstruction exploit a physical model of light propagation underwater to account for such range-dependent effects as scattering and attenuation; however, these methods necessitate careful calibration of attenuation coefficients required by the physical model, or rely on rough estimates of these coefficients from prior lab experiments. The main contribution of this paper is to develop a novel method to achieve simultaneous estimation of attenuation coefficients for color correction during structure recovery of an underwater scene by integrating this estimation directly into the bundle adjustment step, which performs non-linear optimization. To validate the proposed method, an artificial scene is submerged in a pure water tank and surveyed with a stereo camera platform to simulate an underwater robotic survey in a controlled environment. The target structure is imaged in air with an RGB-D sensor to provide ground truth structure and color, and a color calibration board is placed in the scene for further reference. Results show that the proposed method can automatically estimate a water-column aware model for color correction of underwater images simultaneously with 3D reconstruction of the submerged scene.
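The idea of estimating attenuation during optimization can be miniaturized to a standalone least-squares fit of per-channel coefficients under a Beer-Lambert-style image formation model. In the actual system this residual would sit inside bundle adjustment next to reprojection terms; the model and data below are a deliberate simplification.

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(beta, observed, true_color, ranges):
    """Per-channel attenuation residual, I_obs ~ I_true * exp(-beta * r)."""
    pred = true_color[None, :] * np.exp(-ranges[:, None] * beta[None, :])
    return (pred - observed).ravel()

rng = np.random.default_rng(3)
true_color = np.array([0.8, 0.6, 0.4])          # known board color (toy)
ranges = rng.uniform(1.0, 6.0, size=100)        # camera-to-board distances
beta_true = np.array([0.45, 0.20, 0.08])        # red attenuates fastest
obs = (true_color * np.exp(-ranges[:, None] * beta_true)
       + rng.normal(0, 0.01, (100, 3)))

fit = least_squares(residuals, x0=np.full(3, 0.1),
                    args=(obs, true_color, ranges))
print(fit.x)    # recovered per-channel attenuation coefficients
```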

ICRA Conference 2017 Conference Paper

Driving in the Matrix: Can virtual worlds replace human-generated annotations for real world tasks?

  • Matthew Johnson-Roberson
  • Charles Barto
  • Rounak Mehta
  • Sharath Nittur Sridhar
  • Karl Rosaen
  • Ram Vasudevan

Deep learning has rapidly transformed the state-of-the-art algorithms used to address a variety of problems in computer vision and robotics. These breakthroughs have relied upon massive amounts of human-annotated training data. This time-consuming process has begun impeding the progress of these deep learning efforts. This paper describes a method to incorporate photo-realistic computer images from a simulation engine to rapidly generate annotated data that can be used for the training of machine learning algorithms. We demonstrate that a state-of-the-art architecture, which is trained only using these synthetic annotations, performs better than the identical architecture trained on human-annotated real-world data, when tested on the KITTI data set for vehicle detection. By training machine learning algorithms on a rich virtual world, real objects in real scenes can be learned and classified using synthetic data. This approach offers the possibility of accelerating deep learning's application to sensor-based classification problems like those that appear in self-driving cars. The source code and data to train and validate the networks described in this paper are made available for researchers.

IROS Conference 2016 Conference Paper

Towards real-time underwater 3D reconstruction with plenoptic cameras

  • Katherine A. Skinner
  • Matthew Johnson-Roberson

Achieving real-time perception is critical to developing a fully autonomous system that can sense, navigate, and interact with its environment. Perception tasks such as online 3D reconstruction and mapping have been intensely studied for terrestrial robotics applications. However, characteristics of the underwater domain such as light attenuation and light scattering violate the brightness constancy constraint, which is an underlying assumption in methods developed for land-based applications. Furthermore, the complex nature of light propagation underwater limits or even prevents subsea use of real-time depth sensors used in state-of-the-art terrestrial mapping techniques. There have been recent advances in the development of plenoptic (also known as light field) cameras, which use an array of micro lenses capturing both intensity and ray direction to enable color and depth measurement from a single passive sensor. This paper presents an end-to-end system to harness these cameras to produce real-time 3D reconstructions underwater. Our system builds upon the state-of-the-art in online terrestrial 3D reconstruction, transferring these approaches to the underwater domain by gathering real-time color and depth (RGB-D) data underwater using a plenoptic camera, and performing dense 3D reconstruction while compensating for attenuation effects of the underwater environment simultaneously, using a graphics processing unit (GPU) to achieve real-time performance. Results are presented for data gathered in a water tank and the proposed technique is validated quantitatively through comparison with a ground truth 3D model gathered in air to demonstrate that the proposed approach can generate accurate 3D models of objects underwater in real-time.

IROS Conference 2016 Conference Paper

Utilizing high-dimensional features for real-time robotic applications: Reducing the curse of dimensionality for recursive Bayesian estimation

  • Jie Li 0017
  • Paul Ozog
  • Jacob D. Abernethy
  • Ryan M. Eustice
  • Matthew Johnson-Roberson

Feature learning has become popular in robotics due to recent advances in machine learning. In this paper, we propose a novel method to utilize the high-dimensional features from these techniques as observations in Bayesian estimation problems in a real-time manner. We develop an approach that: 1) pre-processes the observations and maps them into a new space with both reduced dimensions and a linear relationship to the estimation states; and 2) estimates the uncertainty of resulting outputs using data perturbation. The result is that deep learning approaches can be combined with more traditional filtering approaches like the Kalman filter (KF) to achieve state-of-the-art real-time performance. We validate the method by presenting the first real-time application of underwater robot localization using an imaging sonar. The proposed technique shows similar localization accuracy to benchmark approaches while simultaneously achieving real-time performance.
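A toy version of the two ingredients named in the abstract: measurement uncertainty estimated by perturbing the input feature, then a scalar Kalman update that consumes the resulting noise estimate. The feature-to-observation map here is a stand-in linear function, not the learned mapping from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def feature_to_obs(feature):
    """Stand-in for the learned map from a high-dim feature to a 1D observation."""
    w = np.linspace(0.01, 0.02, feature.size)
    return float(w @ feature)

def perturbed_obs(feature, n=50, eps=0.05):
    """Estimate observation mean/variance by perturbing the input feature."""
    samples = [feature_to_obs(feature + rng.normal(0, eps, feature.size))
               for _ in range(n)]
    return np.mean(samples), np.var(samples)

# One scalar Kalman update using the perturbation-derived measurement noise R.
x, P = 0.0, 1.0                        # prior state and variance
z, R = perturbed_obs(rng.normal(size=128))
K = P / (P + R)
x, P = x + K * (z - x), (1 - K) * P
print(x, P)
```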

ICRA Conference 2015 Conference Paper

Building 3D mosaics from an Autonomous Underwater Vehicle, Doppler velocity log, and 2D imaging sonar

  • Paul Ozog
  • Giancarlo Troni
  • Michael Kaess
  • Ryan M. Eustice
  • Matthew Johnson-Roberson

This paper reports on a 3D photomosaicing pipeline using data collected from an autonomous underwater vehicle performing simultaneous localization and mapping (SLAM). The pipeline projects and blends 2D imaging sonar data onto a large-scale 3D mesh that is either given a priori or derived from SLAM. Compared to other methods that generate a 2D-only mosaic, our approach produces 3D models that are more structurally representative of the environment being surveyed. Additionally, our system leverages recent work in underwater SLAM using sparse point clouds derived from Doppler velocity log range returns to relax the need for a prior model. We show that the method produces reasonably accurate surface reconstruction and blending consistency, with and without the use of a prior mesh. We experimentally evaluate our approach with a Hovering Autonomous Underwater Vehicle (HAUV) performing inspection of a large underwater ship hull.

IROS Conference 2015 Conference Paper

Financialized methods for market-based multi-sensor fusion

  • Jacob D. Abernethy
  • Matthew Johnson-Roberson

Autonomous systems rely on an increasing number of input sensors of various modalities, and the problem of sensor fusion has received attention for many years. Autonomous system architectures are becoming more complex with time, and the number and placement of sensors will be modified regularly, sensors will fail for many reasons, information will arrive asynchronously, and the system will need to adjust to rapidly changing environments. To address these issues we propose a new paradigm for fusing information from multiple sources that draws from the rich body of work on financial markets, particularly recent research on prediction market design. Among the many benefits of this financialized approach is that, both in theory and in practice, markets are well-equipped to robustly synthesize information from diverse sources in a decentralized fashion. Our framework poses sensor processing algorithms as profit-seeking market participants, data is incorporated via financial transactions, and the joint estimation is represented as a price equilibrium. We use pedestrian detection as a motivating application. Pedestrian detection is a well-studied field and essential to autonomous driving. Real-world fusion results are presented on RGB and LIDAR data from the KITTI Vision Benchmark Suite. We demonstrate we can achieve comparable performance to state-of-the-art hand-designed fusion techniques using the proposed approach.
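The abstract does not spell out the market mechanism, so the sketch below uses a logarithmic market scoring rule (LMSR), a standard prediction-market design, with trade size bounding each sensor trader's influence; treat it as one plausible instantiation under assumed beliefs, not the paper's actual mechanism.

```python
import numpy as np

class LMSRMarket:
    """Two-outcome (pedestrian / no-pedestrian) market maker using the
    logarithmic market scoring rule (LMSR)."""
    def __init__(self, b=10.0):
        self.b = b                      # liquidity parameter
        self.q = np.zeros(2)            # outstanding shares per outcome

    def price(self):
        e = np.exp(self.q / self.b)
        return e / e.sum()              # instantaneous market probabilities

    def buy(self, outcome, shares):
        self.q[outcome] += shares

market = LMSRMarket()
# Each "sensor trader" (e.g., an RGB detector and a LIDAR detector) nudges
# the price toward its own belief, with trade size limiting its influence.
for belief in [0.7, 0.55, 0.9]:
    p_yes = market.price()[0]
    market.buy(0 if belief > p_yes else 1, shares=20 * abs(belief - p_yes))
print(market.price())                   # fused detection probability
```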

ICRA Conference 2015 Conference Paper

High-level visual features for underwater place recognition

  • Jie Li 0017
  • Ryan M. Eustice
  • Matthew Johnson-Roberson

This paper reports on a method to perform robust visual relocalization between temporally separated sets of underwater images gathered by a robot. The place recognition and relocalization problem is more challenging in the underwater environment mainly due to three factors: 1) changes in illumination; 2) long-term changes in the visual appearance of features because of phenomena like biofouling on man-made structures and growth or movement in natural features; and 3) low density of visually salient features for image matching. To address these challenges, a patch-based feature matching approach is proposed, which uses image segmentation and local intensity contrast to locate salient patches and HOG description to make correspondences between patches. Compared to traditional point-based features that are sensitive to dramatic appearance changes underwater, patch-based features are able to encode higher-level information such as shape or structure, which tends to persist across years in underwater environments. The algorithm is evaluated on real data, from multiple years, collected by a Hovering Autonomous Underwater Vehicle for ship hull inspection. Relocalization performance across missions from different years is compared against that of traditional methods.
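The patch description and matching can be approximated with scikit-image's HOG plus cosine similarity, assuming salient patches have already been extracted and resized to a common size; the segmentation- and contrast-based patch detection stage is omitted, and the patches below are synthetic.

```python
import numpy as np
from skimage.feature import hog

def patch_descriptor(patch):
    """HOG description of a salient patch (patches assumed equal size)."""
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def match_score(patch_a, patch_b):
    """Cosine similarity between two patch descriptors."""
    a, b = patch_descriptor(patch_a), patch_descriptor(patch_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(5)
p1 = rng.random((64, 64))
p2 = p1 + rng.normal(0, 0.05, p1.shape)   # same structure, appearance drift
print(match_score(p1, p2))
```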

ICRA Conference 2014 Conference Paper

Crowdsourced saliency for mining robotically gathered 3D maps using multitouch interaction on smartphones and tablets

  • Matthew Johnson-Roberson
  • Mitch Bryson
  • Bertrand Douillard
  • Oscar Pizarro
  • Stefan B. Williams

This paper presents a system for crowdsourcing saliency interest points for robotically gathered 3D maps rendered on smartphones and tablets. An app was created that is capable of interactively rendering 3D reconstructions gathered with an Autonomous Underwater Vehicle. Through hundreds of thousands of logged user interactions with the models we attempt to data-mine salient interest points. To this end we propose two models for calculating saliency from human interaction with the data. The first uses the view frustum of the camera to track the amount of time points are on screen. The second treats the camera's path as a time series and uses a Hidden Markov model to learn the classification of salient and non-salient points. To provide a comparison to existing techniques, several traditional visual saliency approaches are applied to orthographic views of the models' photo-texturing. The results of all approaches are validated with human attention ground truth gathered using a remote gaze-tracking system that recorded the locations of the person's attention while exploring the models.

IROS Conference 2013 Conference Paper

Automated registration for multi-year robotic surveys of marine benthic habitats

  • Mitch Bryson
  • Matthew Johnson-Roberson
  • Oscar Pizarro
  • Stefan B. Williams

This paper presents recent developments in data processing of multi-year repeat survey imagery and precision automatic registration for monitoring long-term changes in benthic marine habitats such as coral reefs and kelp forests. Three different methods are presented and compared for precision alignment of imagery maps collected over a range of time-scales from 12 hours to two years between dives. The first method uses Scale Invariant Feature Transform (SIFT) features computed over imagery mosaics to compute the relative translational offset between repeat dives. The second method employs scan-optimisation using the bathymetry generated via structure-from-motion, thus capturing more stable features in the environment and lending itself to larger-timescale registration. The third method uses mutual information optimisation to register imagery maps, providing robustness to changes in the colour and brightness of objects in an underwater scene across multiple years. Results are presented from field data collected using an Autonomous Underwater Vehicle (AUV) in sites across the Australian coast between 2009 and 2011.
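The first (SIFT-based) method reduces, in miniature, to matching keypoints between two mosaics and taking a robust median of the displacements; the OpenCV sketch below does exactly that, with the scan-optimisation and mutual-information methods left out. Function and variable names are hypothetical.

```python
import cv2
import numpy as np

def mosaic_offset(img_a, img_b, ratio=0.75):
    """Median translational offset between two mosaics via SIFT matches
    (a minimal stand-in for the first registration method)."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher().knnMatch(des_a, des_b, k=2)
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    shifts = np.array([np.subtract(kp_b[m.trainIdx].pt, kp_a[m.queryIdx].pt)
                       for m in good])
    return np.median(shifts, axis=0)    # robust (dx, dy) estimate

# Usage (grayscale mosaics from two dives, assuming textured imagery):
#   offset = mosaic_offset(cv2.imread("dive_2009.png", 0),
#                          cv2.imread("dive_2011.png", 0))
```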

IROS Conference 2011 Conference Paper

Enhanced visual scene understanding through human-robot dialog

  • Matthew Johnson-Roberson
  • Jeannette Bohg
  • Gabriel Skantze
  • Joakim Gustafson
  • Rolf Carlson
  • Babak Rasolzadeh
  • Danica Kragic

We propose a novel human-robot-interaction framework for robust visual scene understanding. Without any a-priori knowledge about the objects, the task of the robot is to correctly enumerate how many of them are in the scene and segment them from the background. Our approach builds on top of state-of-the-art computer vision methods, generating object hypotheses through segmentation. This process is combined with a natural dialog system, thus including a 'human in the loop' where, by exploiting the natural conversation of an advanced dialog system, the robot gains knowledge about ambiguous situations. We present an entropy-based system allowing the robot to detect the poorest object hypotheses and query the user for arbitration. Based on the information obtained from the human-robot dialog, the scene segmentation can be re-seeded and thereby improved. We present experimental results on real data that show an improved segmentation performance compared to segmentation without interaction.
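The entropy-based arbitration can be shown in a few lines: compute the entropy of each hypothesis's label distribution and query the user about the most uncertain one. The distributions below are toy numbers, not the system's actual outputs.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (bits) of a discrete distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Per-hypothesis label distributions (e.g., P(one object), P(two objects),
# P(background)) produced by the segmentation stage -- toy numbers.
hypotheses = {
    "h1": [0.90, 0.05, 0.05],   # confident: no need to ask
    "h2": [0.40, 0.35, 0.25],   # ambiguous: worth a dialog query
    "h3": [0.70, 0.20, 0.10],
}
worst = max(hypotheses, key=lambda h: entropy(hypotheses[h]))
print(f"query the user about {worst}")   # then re-seed the segmentation
```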

ICRA Conference 2011 Conference Paper

Mind the gap - robotic grasping under incomplete observation

  • Jeannette Bohg
  • Matthew Johnson-Roberson
  • Beatriz León
  • Javier Felip
  • Xavi Gratal
  • Niklas Bergström
  • Danica Kragic
  • Antonio Morales

We consider the problem of grasp and manipulation planning when the state of the world is only partially observable. Specifically, we address the task of picking up unknown objects from a table top. The proposed approach to object shape prediction aims at closing the knowledge gaps in the robot's understanding of the world. A completed state estimate of the environment can then be provided to a simulator in which stable grasps and collision-free movements are planned.

ICRA Conference 2011 Conference Paper

Reconstructing Pavlopetri: Mapping the world's oldest submerged town using stereo-vision

  • Ian Mahon
  • Oscar Pizarro
  • Matthew Johnson-Roberson
  • Ariell Friedman
  • Stefan B. Williams
  • Jon C. Henderson

This paper presents a vision-based underwater mapping system, which is demonstrated in an archaeological survey of the submerged ancient town of Pavlopetri. The snorkeler or diver operated system provides a low cost alternative to the use of an AUV or ROV in shallow waters. The system produces textured three-dimensional models, which contain significantly more information than traditional archaeological survey methods. The photo-realistic maps that are produced allow further archaeological research to be performed, without diving on a site during the restrictive time limitations of permits and field seasons. The hardware and software components of the mapping system and its method of operation are described, and initial results are presented and discussed.

IROS Conference 2010 Conference Paper

Attention-based active 3D point cloud segmentation

  • Matthew Johnson-Roberson
  • Jeannette Bohg
  • Mårten Björkman
  • Danica Kragic

In this paper we present a framework for the segmentation of multiple objects from a 3D point cloud. We extend traditional image segmentation techniques into a full 3D representation. The proposed technique relies on a state-of-the-art min-cut framework to perform a fully 3D global multi-class labeling in a principled manner. Thereby, we extend our previous work in which a single object was actively segmented from the background. We also examine several seeding methods to bootstrap the graphical model-based energy minimization, and these methods are compared over challenging scenes. All results are generated on real-world data gathered with an active vision robotic head. We present quantitative results over aggregate sets as well as visual results on specific examples.

IROS Conference 2010 Conference Paper

Strategies for multi-modal scene exploration

  • Jeannette Bohg
  • Matthew Johnson-Roberson
  • Mårten Björkman
  • Danica Kragic

We propose a method for multi-modal scene exploration where initial object hypotheses formed by active visual segmentation are confirmed and augmented through haptic exploration with a robotic arm. We update the current belief about the state of the map with the detection results and predict yet unknown parts of the map with a Gaussian Process. We show that through the integration of different sensor modalities, we achieve a more complete scene model. We also show that the prediction of the scene structure leads to a valid scene representation even if the map is not fully traversed. Furthermore, we propose different exploration strategies and evaluate them both in simulation and on our robotic platform.
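A compact stand-in for the Gaussian Process prediction step, using scikit-learn: fit a GP to observed cells and predict mean and uncertainty for untraversed ones. The inputs, kernel choice, and scalar map value are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Observed cells: locations the arm has touched or the camera has segmented,
# each with a scalar map value (toy data standing in for real observations).
rng = np.random.default_rng(6)
X_seen = rng.uniform(0, 1, size=(30, 2))
y_seen = np.sin(3 * X_seen[:, 0]) * np.cos(3 * X_seen[:, 1])

gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-3))
gp.fit(X_seen, y_seen)

# Predict the untraversed parts of the map; the predictive std can steer
# which region to explore haptically next.
X_unknown = rng.uniform(0, 1, size=(5, 2))
mean, std = gp.predict(X_unknown, return_std=True)
print(np.c_[mean, std])
```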

ICRA Conference 2009 Conference Paper

Airborne smoothing and mapping using vision and inertial sensors

  • Mitch Bryson
  • Matthew Johnson-Roberson
  • Salah Sukkarieh

This paper presents a framework for integrating sensor information from an inertial measuring unit (IMU), Global Positioning System (GPS) receiver and monocular vision camera mounted to a low-flying unmanned aerial vehicle (UAV) for building large-scale terrain reconstructions. Our method seeks to integrate all of the sensor information using a statistically optimal non-linear least squares smoothing algorithm to estimate vehicle poses simultaneously with a dense point feature map of the terrain. A visualisation of the terrain structure is then created by building a textured mesh-surface from the estimated point features. The resulting terrain reconstruction can be used for a range of environmental monitoring missions such as invasive plant detection and biomass mapping.