Arrow Research

Author name cluster

Roozbeh Mottaghi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

22 papers
2 author rows

Possible papers

22

ICLR Conference 2025 Conference Paper

PARTNR: A Benchmark for Planning and Reasoning in Embodied Multi-agent Tasks

  • Matthew Chang
  • Gunjan Chhablani
  • Alexander Clegg
  • Mikael Dallaire Cote
  • Ruta Desai
  • Michal Hlavac
  • Vladimir Karashchuk
  • Jacob Krantz

We present a benchmark for Planning And Reasoning Tasks in humaN-Robot collaboration (PARTNR) designed to study human-robot coordination in household activities. PARTNR tasks exhibit characteristics of everyday tasks, such as spatial, temporal, and heterogeneous agent capability constraints. We employ a semi-automated task generation pipeline using Large Language Models (LLMs), incorporating simulation-in-the-loop for grounding and verification. PARTNR stands as the largest benchmark of its kind, comprising 100,000 natural language tasks, spanning 60 houses and 5,819 unique objects. We analyze state-of-the-art LLMs on PARTNR tasks, across the axes of planning, perception and skill execution. The analysis reveals significant limitations in SoTA models, such as poor coordination and failures in task tracking and recovery from errors. When LLMs are paired with 'real' humans, they require 1.5x as many steps as two humans collaborating and 1.1x more steps than a single human, underscoring the potential for improvement in these models. We further show that fine-tuning smaller LLMs with planning data can achieve performance on par with models 9 times larger, while being 8.6x faster at inference. Overall, PARTNR highlights significant challenges facing collaborative embodied agents and aims to drive research in this direction.

IROS Conference 2024 Conference Paper

Embodiment Randomization for Cross Embodiment Navigation

  • Pranav Putta
  • Gunjan Aggarwal
  • Roozbeh Mottaghi
  • Dhruv Batra
  • Naoki Yokoyama
  • Joanne Truong
  • Arjun Majumdar

We present Embodiment Randomization, a simple, inexpensive, and intuitive technique for training robust behavior policies that can be transferred to multiple robot embodiments. While prior works require real-world data from multiple robots, or complex algorithmic adjustments to address the challenge of embodiment generalization, our approach leverages the power of simulation and large-scale reinforcement learning and can be easily integrated within existing policy learning methods. We show that policies trained with embodiment randomization implicitly perform system identification, enabling them to adapt to new embodiments during deployment. Our approach not only shows significant improvements in adapting to novel robot configurations, but also in generalizing from simulation to reality and contending with real-world perturbations, highlighting the potential of embodiment randomization in creating versatile and adaptable robotic navigation policies.
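
The abstract gives only the high-level recipe; below is a minimal sketch of that recipe (not the authors' code), assuming a hypothetical `env_factory` hook that builds the simulator with a sampled embodiment and a policy object with `act`/`observe` methods. The parameter ranges are illustrative.

```python
import random
from dataclasses import dataclass

@dataclass
class Embodiment:
    """Hypothetical embodiment parameters resampled every episode."""
    radius_m: float        # collision radius
    height_m: float        # camera height
    camera_tilt_deg: float
    max_lin_speed: float

def sample_embodiment(rng: random.Random) -> Embodiment:
    # Ranges are illustrative, not the paper's values.
    return Embodiment(
        radius_m=rng.uniform(0.17, 0.35),
        height_m=rng.uniform(0.6, 1.5),
        camera_tilt_deg=rng.uniform(-30.0, 0.0),
        max_lin_speed=rng.uniform(0.25, 1.0),
    )

def train(env_factory, policy, num_episodes: int, seed: int = 0):
    """Per-episode embodiment randomization wrapped around an existing RL loop."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        env = env_factory(embodiment=sample_embodiment(rng))  # assumed factory hook
        obs, done = env.reset(), False
        while not done:
            action = policy.act(obs)
            obs, reward, done, info = env.step(action)
            policy.observe(obs, action, reward, done)
```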

NeurIPS Conference 2024 Conference Paper

From an Image to a Scene: Learning to Imagine the World from a Million 360° Videos

  • Matthew Wallingford
  • Anand Bhattad
  • Aditya Kusupati
  • Vivek Ramanujan
  • Matt Deitke
  • Sham Kakade
  • Aniruddha Kembhavi
  • Roozbeh Mottaghi

Three-dimensional (3D) understanding of objects and scenes plays a key role in humans' ability to interact with the world and has been an active area of research in computer vision, graphics, and robotics. Large-scale synthetic and object-centric 3D datasets have been shown to be effective in training models that have 3D understanding of objects. However, applying a similar approach to real-world objects and scenes is difficult due to a lack of large-scale data. Videos are a potential source for real-world 3D data, but finding diverse yet corresponding views of the same content has proven difficult at scale. Furthermore, standard videos come with fixed viewpoints, determined at the time of capture. This restricts the ability to access scenes from a variety of more diverse and potentially useful perspectives. We argue that large-scale 360° videos can address these limitations to provide scalable corresponding frames from diverse views. In this paper, we introduce 360-1M, a 360° video dataset consisting of 1 million videos, and a process for efficiently finding corresponding frames from diverse viewpoints at scale. We train our diffusion-based model, ODIN, on 360-1M. Empowered by the largest real-world, multi-view dataset to date, ODIN is able to freely generate novel views of real-world scenes. Unlike previous methods, ODIN can move the camera through the environment, enabling the model to infer the geometry and layout of the scene. Additionally, we show improved performance on standard novel view synthesis and 3D reconstruction benchmarks.

ICLR Conference 2024 Conference Paper

Habitat 3.0: A Co-Habitat for Humans, Avatars, and Robots

  • Xavier Puig
  • Eric Undersander
  • Andrew Szot
  • Mikael Dallaire Cote
  • Tsung-Yen Yang
  • Ruslan Partsey
  • Ruta Desai
  • Alexander Clegg

We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling complex deformable bodies and diversity in appearance and motion, all while ensuring high simulation speed. (2) Human-in-the-loop infrastructure: enabling real human interaction with simulated robots via mouse/keyboard or a VR interface, facilitating evaluation of robot policies with human input. (3) Collaborative tasks: studying two collaborative tasks, Social Navigation and Social Rearrangement. Social Navigation investigates a robot's ability to locate and follow humanoid avatars in unseen environments, whereas Social Rearrangement addresses collaboration between a humanoid and robot while rearranging a scene. These contributions allow us to study end-to-end learned and heuristic baselines for human-robot collaboration in-depth, as well as evaluate them with humans in the loop. Our experiments demonstrate that learned robot policies lead to efficient task completion when collaborating with unseen humanoid agents and human partners that might exhibit behaviors that the robot has not seen before. Additionally, we observe emergent behaviors during collaborative task execution, such as the robot yielding space when obstructing a humanoid agent, thereby allowing the effective completion of the task by the humanoid agent. Furthermore, our experiments using the human-in-the-loop tool demonstrate that our automated evaluation with humanoids can provide an indication of the relative ordering of different policies when evaluated with real human collaborators. Habitat 3.0 unlocks interesting new features in simulators for Embodied AI, and we hope it paves the way for a new frontier of embodied human-AI interaction capabilities. For more details and visualizations, visit: https://aihabitat.org/habitat3.

ICLR Conference 2023 Conference Paper

Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

  • Kuo-Hao Zeng
  • Luca Weihs
  • Roozbeh Mottaghi
  • Ali Farhadi

A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise. This assumption is limiting; an agent may encounter settings that dramatically alter the impact of actions: a move ahead action on a wet floor may send the agent twice as far as it expects and using the same action with a broken wheel might transform the expected translation into a rotation. Instead of relying on the assumption that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. By combining these latent action embeddings with a novel, transformer-based, policy head, we design an Action Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation tasks in the AI2-THOR and Habitat environments and show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces. Moreover, we observe significant improvement in robustness against these actions when evaluating in real-world scenarios.
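
As a rough illustration of modeling action impact on the fly (a simplified stand-in, not the paper's transformer-based AAP), one might keep a latent embedding of each action's most recently observed state change and score actions against the current observation; all module names and sizes below are invented.

```python
import torch
import torch.nn as nn

class ActionImpactPolicy(nn.Module):
    """Sketch: the policy never sees fixed action labels, only an embedding of
    each action's most recently observed impact (state change), so it can adapt
    when the same action starts behaving differently."""
    def __init__(self, obs_dim: int, num_actions: int, embed_dim: int = 64):
        super().__init__()
        self.impact_encoder = nn.Sequential(
            nn.Linear(obs_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))
        self.obs_encoder = nn.Linear(obs_dim, embed_dim)
        # Latest observed impact (next_obs - obs) per action, updated online.
        self.register_buffer("impacts", torch.zeros(num_actions, obs_dim))

    @torch.no_grad()
    def record_impact(self, action: int, obs: torch.Tensor, next_obs: torch.Tensor):
        self.impacts[action] = next_obs - obs

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        query = self.obs_encoder(obs)              # (embed_dim,)
        keys = self.impact_encoder(self.impacts)   # (num_actions, embed_dim)
        return keys @ query                        # logits over actions
```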

NeurIPS Conference 2023 Conference Paper

Neural Priming for Sample-Efficient Adaptation

  • Matthew Wallingford
  • Vivek Ramanujan
  • Alex Fang
  • Aditya Kusupati
  • Roozbeh Mottaghi
  • Aniruddha Kembhavi
  • Ludwig Schmidt
  • Ali Farhadi

We propose Neural Priming, a technique for adapting large pretrained models to distribution shifts and downstream tasks given few or no labeled examples. Presented with class names or unlabeled test samples, Neural Priming enables the model to recall and condition its parameters on relevant data seen throughout pretraining, thereby priming it for the test distribution. Neural Priming can be performed at test time, even for pretraining datasets as large as LAION-2B. Performing lightweight updates on the recalled data significantly improves accuracy across a variety of distribution shift and transfer learning benchmarks. Concretely, in the zero-shot setting, we see a 2.45% improvement in accuracy on ImageNet and 3.81% accuracy improvement on average across standard transfer learning benchmarks. Further, using our test-time inference scheme, we see a 1.41% accuracy improvement on ImageNetV2. These results demonstrate the effectiveness of Neural Priming in addressing the common challenge of limited labeled data and changing distributions. Code and models are open-sourced at https://www.github.com/RAIVNLab/neural-priming.
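
A minimal sketch of the priming idea, assuming CLIP-style class-name text embeddings and a pool of pretraining image embeddings: recall the pool items most similar to each class name and blend their mean into the zero-shot classifier weights. The function name, `k`, and `alpha` are illustrative, not the paper's procedure or values.

```python
import numpy as np

def l2norm(x, axis=-1, eps=1e-8):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def prime_zero_shot_weights(class_text_emb, pool_image_emb, k=100, alpha=0.5):
    """Recall the k pretraining embeddings nearest each class-name embedding and
    blend their mean into the zero-shot weight (a lightweight 'priming' update)."""
    W = l2norm(np.asarray(class_text_emb))        # (C, d) text-derived weights
    P = l2norm(np.asarray(pool_image_emb))        # (N, d) pretraining image embeddings
    sims = W @ P.T                                # (C, N) cosine similarities
    topk = np.argsort(-sims, axis=1)[:, :k]       # indices of recalled examples per class
    recalled_mean = l2norm(P[topk].mean(axis=1))  # (C, d)
    return l2norm((1 - alpha) * W + alpha * recalled_mean)

# Usage: logits = test_image_embeddings @ prime_zero_shot_weights(text_emb, pool_emb).T
```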

ICLR Conference 2023 Conference Paper

Neural Radiance Field Codebooks

  • Matthew Wallingford
  • Aditya Kusupati
  • Alex Fang
  • Vivek Ramanujan
  • Aniruddha Kembhavi
  • Roozbeh Mottaghi
  • Ali Farhadi

Compositional representations of the world are a promising step towards enabling high-level scene understanding and efficient transfer to downstream tasks. Learning such representations for complex scenes and tasks remains an open challenge. Towards this goal, we introduce Neural Radiance Field Codebooks (NRC), a scalable method for learning object-centric representations through novel view reconstruction. NRC learns to reconstruct scenes from novel views using a dictionary of object codes which are decoded through a volumetric renderer. This enables the discovery of reoccurring visual and geometric patterns across scenes which are transferable to downstream tasks. We show that NRC representations transfer well to object navigation in THOR, outperforming 2D and 3D representation learning methods by 3.1% success rate. We demonstrate that our approach is able to perform unsupervised segmentation for more complex synthetic (THOR) and real scenes (NYU Depth) better than prior methods (0.101 ARI). Finally, we show that NRC improves on the task of depth ordering by 5.5% accuracy in THOR.

ICLR Conference 2023 Conference Paper

UNIFIED-IO: A Unified Model for Vision, Language, and Multi-modal Tasks

  • Jiasen Lu
  • Christopher Clark
  • Rowan Zellers
  • Roozbeh Mottaghi
  • Aniruddha Kembhavi

We propose Unified-IO, a model that performs a large variety of AI tasks, spanning from classical computer vision tasks (including pose estimation, object detection, depth estimation, and image generation) and vision-and-language tasks such as region captioning and referring expression, to natural language processing tasks such as question answering and paraphrasing. Developing a single unified model for such a large variety of tasks poses unique challenges due to the heterogeneous inputs and outputs pertaining to each task, including RGB images, per-pixel maps, binary masks, bounding boxes, and language. We achieve this unification by homogenizing every supported input and output into a sequence of discrete vocabulary tokens. This common representation across all tasks allows us to train a single transformer-based architecture, jointly on over 90 diverse datasets in the vision and language fields. Unified-IO is the first model capable of performing all 7 tasks on the GRIT benchmark and produces strong results across 16 diverse benchmarks like NYUv2-Depth, ImageNet, VQA2.0, OK-VQA, Swig, VizWizGround, BoolQ, and SciTail, with no task-specific fine-tuning. Code and pre-trained models will be made publicly available.
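
As a small illustration of homogenizing heterogeneous outputs into one discrete vocabulary (the token names and bin count here are invented, not Unified-IO's actual tokenizer), a bounding box can be quantized into location tokens that sit in the same sequence as text tokens.

```python
def box_to_tokens(box, image_w, image_h, num_bins=1000):
    """Quantize a bounding box (x0, y0, x1, y1) into discrete location tokens,
    in the spirit of expressing every output as vocabulary tokens."""
    x0, y0, x1, y1 = box
    coords = [y0 / image_h, x0 / image_w, y1 / image_h, x1 / image_w]
    bins = [min(num_bins - 1, int(c * num_bins)) for c in coords]
    return [f"<loc_{b}>" for b in bins]

def encode_example(prompt: str, box, image_w: int, image_h: int):
    # One flat sequence mixes plain text tokens and location tokens.
    return prompt.split() + box_to_tokens(box, image_w, image_h)

print(encode_example("locate the red mug :", (40, 60, 200, 180), 640, 480))
```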

NeurIPS Conference 2022 Conference Paper

Ask4Help: Learning to Leverage an Expert for Embodied Tasks

  • Kunal Pratap Singh
  • Luca Weihs
  • Alvaro Herrasti
  • Jonghyun Choi
  • Aniruddha Kembhavi
  • Roozbeh Mottaghi

Embodied AI agents continue to become more capable every year with the advent of new models, environments, and benchmarks, but are still far away from being performant and reliable enough to be deployed in real, user-facing, applications. In this paper, we ask: can we bridge this gap by enabling agents to ask for assistance from an expert such as a human being? To this end, we propose the Ask4Help policy that augments agents with the ability to request, and then use expert assistance. Ask4Help policies can be efficiently trained without modifying the original agent's parameters and learn a desirable trade-off between task performance and the amount of requested help, thereby reducing the cost of querying the expert. We evaluate Ask4Help on two different tasks -- object goal navigation and room rearrangement -- and see substantial improvements in performance using minimal help. On object navigation, an agent that achieves a 52% success rate is raised to 86% with 13% help, and for rearrangement, the state-of-the-art model with a 7% success rate is dramatically improved to 90.4% using 39% help. Human trials with Ask4Help demonstrate the efficacy of our approach in practical scenarios.
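
A rough sketch of the trade-off described above, assuming the frozen base agent exposes a feature vector to a small, separately trained gate and that each query to the expert incurs a fixed penalty; the module, reward shaping, and cost value are illustrative rather than the paper's formulation.

```python
import torch
import torch.nn as nn

class Ask4HelpGate(nn.Module):
    """Tiny policy that decides, per step, whether to request expert help.
    The base agent's parameters stay frozen; only this gate is trained."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, agent_features: torch.Tensor) -> torch.Tensor:
        return self.net(agent_features)   # logits over {act_yourself, ask_expert}

def step_reward(task_reward: float, asked: bool, help_cost: float = 0.01) -> float:
    # Reward shaping that trades task success against the amount of requested help.
    return task_reward - (help_cost if asked else 0.0)
```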

TMLR Journal 2022 Journal Article

Benchmarking Progress to Infant-Level Physical Reasoning in AI

  • Luca Weihs
  • Amanda Yuile
  • Renée Baillargeon
  • Cynthia Fisher
  • Gary Marcus
  • Roozbeh Mottaghi
  • Aniruddha Kembhavi

To what extent do modern AI systems comprehend the physical world? We introduce the open-access Infant-Level Physical Reasoning Benchmark (InfLevel) to gain insight into this question. We evaluate ten neural-network architectures developed for video understanding on tasks designed to test these models' ability to reason about three essential physical principles which researchers have shown to guide human infants' physical understanding. We explore the sensitivity of each AI system to the continuity of objects as they travel through space and time, to the solidity of objects, and to gravity. We find strikingly consistent results across 60 experiments with multiple systems, training regimes, and evaluation metrics: current popular visual-understanding systems are at or near chance on all three principles of physical reasoning. We close by suggesting some potential ways forward.

AAAI Conference 2022 Conference Paper

Multi-Modal Answer Validation for Knowledge-Based VQA

  • Jialin Wu
  • Jiasen Lu
  • Ashish Sabharwal
  • Roozbeh Mottaghi

The problem of knowledge-based visual question answering involves answering questions that require external knowledge in addition to the content of the image. Such knowledge typically comes in various forms, including visual, textual, and commonsense knowledge. Using more knowledge sources increases the chance of retrieving more irrelevant or noisy facts, making it challenging to comprehend the facts and find the answer. To address this challenge, we propose Multi-modal Answer Validation using External knowledge (MAVEx), where the idea is to validate a set of promising answer candidates based on answer-specific knowledge retrieval. Instead of searching for the answer in a vast collection of often irrelevant facts as most existing approaches do, MAVEx aims to learn how to extract relevant knowledge from noisy sources, which knowledge source to trust for each answer candidate, and how to validate the candidate using that source. Our multi-modal setting is the first to leverage external visual knowledge (images searched using Google), in addition to textual knowledge in the form of Wikipedia sentences and ConceptNet concepts. Our experiments with OK-VQA, a challenging knowledge-based VQA dataset, demonstrate that MAVEx achieves new state-of-the-art results. Our code is available at https://github.com/jialinwu17/MAVEX.
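
A schematic of answer-specific validation, with `retrieve_facts` and `agreement_score` as hypothetical hooks standing in for the retrieval and validation models described in the abstract; only the control flow is meant to be representative.

```python
from typing import Callable, Iterable, List

def validate_candidates(
    question: str,
    candidates: Iterable[str],
    retrieve_facts: Callable[[str, str], List[str]],
    agreement_score: Callable[[str, str, List[str]], float],
) -> str:
    """Pick the candidate answer best supported by its own retrieved evidence."""
    best, best_score = None, float("-inf")
    for answer in candidates:
        facts = retrieve_facts(question, answer)          # answer-specific retrieval
        score = agreement_score(question, answer, facts)  # how well the facts support it
        if score > best_score:
            best, best_score = answer, score
    return best
```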

NeurIPS Conference 2022 Conference Paper

🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation

  • Matt Deitke
  • Eli VanderBilt
  • Alvaro Herrasti
  • Luca Weihs
  • Kiana Ehsani
  • Jordi Salvador
  • Winson Han
  • Eric Kolve

Massive datasets and high-capacity models have driven many recent advancements in computer vision and natural language understanding. This work presents a platform to enable similar success stories in Embodied AI. We propose ProcTHOR, a framework for procedural generation of Embodied AI environments. ProcTHOR enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments to train and evaluate embodied agents across navigation, interaction, and manipulation tasks. We demonstrate the power and potential of ProcTHOR via a sample of 10,000 generated houses and a simple neural model. Models trained using only RGB images on ProcTHOR, with no explicit mapping and no human task supervision, produce state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation, including the presently running Habitat 2022, AI2-THOR Rearrangement 2022, and RoboTHOR challenges. We also demonstrate strong 0-shot results on these benchmarks, via pre-training on ProcTHOR with no fine-tuning on the downstream benchmark, often beating previous state-of-the-art systems that access the downstream training data.
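
A toy, seed-driven sampler illustrating only the "procedural generation at scale" idea; the room types, object lists, and output structure are invented and bear no relation to the actual ProcTHOR pipeline, which generates interactive, physics-enabled scenes.

```python
import random

ROOM_TYPES = ["kitchen", "bedroom", "bathroom", "living room"]
OBJECTS_BY_ROOM = {
    "kitchen": ["fridge", "mug", "apple"],
    "bedroom": ["bed", "lamp"],
    "bathroom": ["sink", "towel"],
    "living room": ["sofa", "tv"],
}

def generate_house(seed: int, max_rooms: int = 6) -> dict:
    """Toy procedural house sampler: one seed deterministically yields one house."""
    rng = random.Random(seed)
    rooms = []
    for i in range(rng.randint(2, max_rooms)):
        rtype = rng.choice(ROOM_TYPES)
        objs = rng.sample(OBJECTS_BY_ROOM[rtype],
                          k=rng.randint(1, len(OBJECTS_BY_ROOM[rtype])))
        rooms.append({"id": i, "type": rtype, "objects": objs})
    return {"seed": seed, "rooms": rooms}

houses = [generate_house(seed) for seed in range(10_000)]  # arbitrarily large dataset
```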

NeurIPS Conference 2021 Conference Paper

Container: Context Aggregation Networks

  • Peng Gao
  • Jiasen Lu
  • Hongsheng Li
  • Roozbeh Mottaghi
  • Aniruddha Kembhavi

Convolutional neural networks (CNNs) are ubiquitous in computer vision, with a myriad of effective and efficient variations. Recently, Transformers -- originally introduced in natural language processing -- have been increasingly adopted in computer vision. While early adopters continued to employ CNN backbones, the latest networks are end-to-end CNN-free Transformer solutions. A recent surprising finding now shows that a simple MLP-based solution without any traditional convolutional or Transformer components can produce effective visual representations. While CNNs, Transformers and MLP-Mixers may be considered completely disparate architectures, we provide a unified view showing that they are in fact special cases of a more general method to aggregate spatial context in a neural network stack. We present Container (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation that can exploit long-range interactions à la Transformers while still exploiting the inductive bias of the local convolution operation, leading to the faster convergence speeds often seen in CNNs. Our Container architecture achieves 82.7% Top-1 accuracy on ImageNet using 22M parameters, a +2.8 improvement compared with DeiT-Small, and can converge to 79.9% Top-1 accuracy in just 200 epochs. In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named Container-Light, can be employed in object detection and instance segmentation networks such as DETR, RetinaNet and Mask-RCNN to obtain an impressive detection mAP of 38.9, 43.8, 45.1 and mask mAP of 41.3, providing large improvements of 6.6, 7.3, 6.9 and 6.6 pts respectively, compared to a ResNet-50 backbone with a comparable compute and parameter size. Our method also achieves promising results on self-supervised learning compared to DeiT on the DINO framework. Code is released at https://github.com/allenai/container.
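
A single-head sketch of the unified view described above, assuming the affinity matrix mixes a static (input-independent, convolution/MLP-like) component with a dynamic (attention-style) one via a learnable gate; this simplifies the paper's block considerably and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

class ContextAggregation(nn.Module):
    """Sketch of the unified view: Y = A @ V, where the affinity matrix A blends
    a static learned component with a dynamic attention-style component."""
    def __init__(self, dim: int, num_tokens: int):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.static_affinity = nn.Parameter(torch.zeros(num_tokens, num_tokens))
        self.gate = nn.Parameter(torch.tensor(0.5))   # learnable mixing weight
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, N, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        dynamic = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        static = torch.softmax(self.static_affinity, dim=-1)   # shared across inputs
        affinity = self.gate * dynamic + (1 - self.gate) * static
        return self.proj(affinity @ v)
```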

ICLR Conference 2021 Conference Paper

Learning Generalizable Visual Representations via Interactive Gameplay

  • Luca Weihs
  • Aniruddha Kembhavi
  • Kiana Ehsani
  • Sarah M. Pratt
  • Winson Han
  • Alvaro Herrasti
  • Eric Kolve
  • Dustin Schwenk

A growing body of research suggests that embodied gameplay, prevalent not just in human cultures but across a variety of animal species including turtles and ravens, is critical in developing the neural flexibility for creative problem solving, decision making, and socialization. Comparatively little is known regarding the impact of embodied gameplay upon artificial agents. While recent work has produced agents proficient in abstract games, these environments are far removed from the real world and thus these agents can provide little insight into the advantages of embodied play. Hiding games, such as hide-and-seek, played universally, provide a rich ground for studying the impact of embodied gameplay on representation learning in the context of perspective taking, secret keeping, and false belief understanding. Here we are the first to show that embodied adversarial reinforcement learning agents playing Cache, a variant of hide-and-seek, in a high-fidelity, interactive environment, learn generalizable representations of their observations encoding information such as object permanence, free space, and containment. Moving closer to biologically motivated learning strategies, our agents' representations, enhanced by intentionality and memory, are developed through interaction and play. These results serve as a model for studying how facets of vision develop through interaction, provide an experimental framework for assessing what is learned by artificial agents, and demonstrate the value of moving from large, static datasets towards experiential, interactive representation learning.

ICLR Conference 2021 Conference Paper

What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions

  • Kiana Ehsani
  • Daniel Gordon
  • Thomas Hai Dang Nguyen
  • Roozbeh Mottaghi
  • Ali Farhadi

Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we explore a novel approach, where we use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations. For this study, we collect a dataset of human interactions capturing body part movements and gaze in their daily lives. Our experiments show that our "muscly-supervised" representation that encodes interaction and attention cues outperforms a visual-only state-of-the-art method MoCo (He et al., 2020) on a variety of target tasks: scene classification (semantic), action recognition (temporal), depth estimation (geometric), dynamics prediction (physics) and walkable surface estimation (affordance). Our code and dataset are available at: https://github.com/ehsanik/muscleTorch.

NeurIPS Conference 2020 Conference Paper

Learning About Objects by Learning to Interact with Them

  • Martin Lohmann
  • Jordi Salvador
  • Aniruddha Kembhavi
  • Roozbeh Mottaghi

Much of the remarkable progress in computer vision has been focused around fully supervised learning mechanisms relying on highly curated datasets for a variety of tasks. In contrast, humans often learn about their world with little to no external supervision. Taking inspiration from infants learning from their environment through play and interaction, we present a computational framework to discover objects and learn their physical properties along this paradigm of Learning from Interaction. Our agent, when placed within the near photo-realistic and physics-enabled AI2-THOR environment, interacts with its world and learns about objects, their geometric extents and relative masses, without any external guidance. Our experiments reveal that this agent learns efficiently and effectively; not just for objects it has interacted with before, but also for novel instances from seen categories as well as novel object categories.

ICRA Conference 2017 Conference Paper

Target-driven visual navigation in indoor scenes using deep reinforcement learning

  • Yuke Zhu
  • Roozbeh Mottaghi
  • Eric Kolve
  • Joseph J. Lim
  • Abhinav Gupta 0001
  • Li Fei-Fei 0001
  • Ali Farhadi

Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), (4) is end-to-end trainable and does not need feature engineering, feature matching between frames or 3D reconstruction of the environment.
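
A minimal sketch of a goal-conditioned actor-critic in the spirit of the described model, assuming precomputed feature vectors for the current observation and the target; the fusion layer and sizes are illustrative, not the original architecture.

```python
import torch
import torch.nn as nn

class TargetDrivenActorCritic(nn.Module):
    """Sketch: the policy is a function of both the current observation and the
    navigation target, so new targets do not require training a new network."""
    def __init__(self, feat_dim: int, num_actions: int, hidden: int = 512):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * feat_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, num_actions)   # action logits
        self.critic = nn.Linear(hidden, 1)             # state value

    def forward(self, obs_feat: torch.Tensor, goal_feat: torch.Tensor):
        h = self.fuse(torch.cat([obs_feat, goal_feat], dim=-1))
        return self.actor(h), self.critic(h)
```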

JMLR Journal 2016 Journal Article

Complexity of Representation and Inference in Compositional Models with Part Sharing

  • Alan Yuille
  • Roozbeh Mottaghi

This paper performs a complexity analysis of a class of serial and parallel compositional models of multiple objects and shows that they enable efficient representation and rapid inference. Compositional models are generative and represent objects in a hierarchically distributed manner in terms of parts and subparts, which are constructed recursively by part-subpart compositions. Parts are represented more coarsely at higher levels of the hierarchy, so that the upper levels give coarse summary descriptions (e.g., there is a horse in the image) while the lower levels represent the details (e.g., the positions of the legs of the horse). This hierarchically distributed representation obeys the executive summary principle, meaning that a high level executive only requires a coarse summary description and can, if necessary, get more details by consulting lower level executives. The parts and subparts are organized in terms of hierarchical dictionaries, which enables part sharing between different objects, allowing efficient representation of many objects. The first main contribution of this paper is to show that compositional models can be mapped onto a parallel visual architecture similar to that used by bio-inspired visual models such as deep convolutional networks but more explicit in terms of representation, hence enabling part detection as well as object detection, and suitable for complexity analysis. Inference algorithms can be run on this architecture to exploit the gains caused by part sharing and executive summary. Effectively, this compositional architecture enables us to perform exact inference simultaneously over a large class of generative models of objects. The second contribution is an analysis of the complexity of compositional models in terms of computation time (for serial computers) and numbers of nodes (e.g., "neurons") for parallel computers. In particular, we compute the complexity gains by part sharing and executive summary and their dependence on how the dictionary scales with the level of the hierarchy. We explore three regimes of scaling behavior where the dictionary size (i) increases exponentially with the level of the hierarchy, (ii) is determined by an unsupervised compositional learning algorithm applied to real data, (iii) decreases exponentially with scale. This analysis shows that in some regimes the use of shared parts enables algorithms which can perform inference in time linear in the number of levels for an exponential number of objects. In other regimes part sharing has little advantage for serial computers but can enable linear processing on parallel computers.
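
A toy arithmetic illustration of the part-sharing gain on a serial computer, under the simplifying assumption that a shared part is evaluated once per image regardless of how many objects reuse it; the numbers are invented and the paper's actual bounds also depend on image size, the number of levels, and how the dictionary scales with level.

```python
def serial_part_evaluations(num_objects: int, parts_per_object: int, shared_dict_size: int):
    """Compare part evaluations with and without sharing (toy model)."""
    without_sharing = num_objects * parts_per_object        # every part of every object
    with_sharing = min(without_sharing, shared_dict_size)   # each shared part evaluated once
    return without_sharing, with_sharing

print(serial_part_evaluations(num_objects=10_000, parts_per_object=64, shared_dict_size=500))
# -> (640000, 500): many objects reuse the same small dictionary of parts
```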

ICLR Conference 2013 Conference Paper

Complexity of Representation and Inference in Compositional Models with Part Sharing

  • Alan L. Yuille
  • Roozbeh Mottaghi

This paper describes serial and parallel compositional models of multiple objects with part sharing. Objects are built by part-subpart compositions and expressed in terms of a hierarchical dictionary of object parts. These parts are represented on lattices of decreasing sizes which yield an executive summary description. We describe inference and learning algorithms for these models. We analyze the complexity of this model in terms of computation time (for serial computers) and numbers of nodes (e.g., 'neurons') for parallel computers. In particular, we compute the complexity gains by part sharing and its dependence on how the dictionary scales with the level of the hierarchy. We explore three regimes of scaling behavior where the dictionary size (i) increases exponentially with the level, (ii) is determined by an unsupervised compositional learning algorithm applied to real data, (iii) decreases exponentially with scale. This analysis shows that in some regimes the use of shared parts enables algorithms which can perform inference in time linear in the number of levels for an exponential number of objects. In other regimes part sharing has little advantage for serial computers but can give linear processing on parallel computers.

ICRA Conference 2009 Conference Paper

Graph-based planning using local information for unknown outdoor environments

  • Jinhan Lee
  • Roozbeh Mottaghi
  • Charles Pippin
  • Tucker R. Balch

One of the common applications for outdoor robots is to follow a path in large-scale unknown environments. This task is challenging due to the intensive memory requirements to represent the map, uncertainties in the location estimate of the robot, and unknown terrain types and obstacles on the way to the goal. We develop a novel graph-based path planner that is based only on local perceptual information to plan a path in such environments. In order to extend the capabilities of the graph representation, we introduce Exploration Bias, a node attribute that can implicitly encode obstacle features in the immediate surroundings of a node in the graph, the uncertainty of the planner about a node's location, and the frequency of visiting a location. Through simulation experiments, we demonstrate that the resulting path cost and the distance that the robot traverses to reach the goal location are not significantly different from those of the previous approaches.
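
A rough sketch of planning over such a graph, assuming the Exploration Bias is available as a per-node scalar that is simply added to edge costs during a Dijkstra search; the cost formula and data layout are illustrative, not the paper's planner.

```python
import heapq
import itertools

def plan(graph, start, goal, exploration_bias, bias_weight=1.0):
    """Dijkstra over a local graph with a per-node Exploration Bias penalty.
    graph: {node: [(neighbor, edge_length), ...]}; exploration_bias: {node: float}."""
    counter = itertools.count()   # tie-breaker so heap entries stay comparable
    frontier = [(0.0, next(counter), start, [start])]
    best = {start: 0.0}
    while frontier:
        cost, _, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue
        for nbr, length in graph.get(node, []):
            new_cost = cost + length + bias_weight * exploration_bias.get(nbr, 0.0)
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, next(counter), nbr, path + [nbr]))
    return float("inf"), []
```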

ICRA Conference 2008 Conference Paper

Place recognition-based fixed-lag smoothing for environments with unreliable GPS

  • Roozbeh Mottaghi
  • Michael Kaess
  • Ananth Ranganathan
  • Richard Roberts 0001
  • Frank Dellaert

Pose estimation of outdoor robots presents some distinct challenges due to the various uncertainties in the robot sensing and action. In particular, global positioning sensors of outdoor robots do not always work perfectly, causing large drift in the location estimate of the robot. To overcome this common problem, we propose a new approach for global localization using place recognition. First, we learn the location of some arbitrary key places using odometry measurements and GPS measurements only at the start and the end of the robot trajectory. In subsequent runs, when the robot perceives a key place, our fixed-lag smoother fuses odometry measurements with the relative location to the key place to improve its pose estimate. Outdoor mobile robot experiments show that place recognition measurements significantly improve the estimate of the smoother in the absence of GPS measurements.
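
A toy 2D stand-in for the smoother, fusing relative odometry with observations of previously learned key places by weighted linear least squares; the paper's fixed-lag factor-graph formulation, noise models, and data layout are replaced here by illustrative ones.

```python
import numpy as np

def smooth_with_key_places(odometry, place_obs, key_places,
                           sigma_odom=0.1, sigma_place=0.02):
    """Estimate 2D positions x_0..x_N from relative odometry plus key-place
    observations (toy weighted least squares, not the paper's smoother).
      odometry:   list of (dx, dy) steps between consecutive poses
      place_obs:  list of (pose_index, place_id, (zx, zy)) key-place offsets seen from that pose
      key_places: {place_id: (px, py)} learned key-place locations"""
    n = len(odometry) + 1
    rows, bx, by, weights = [], [], [], []

    def add(row, tx, ty, sigma):
        rows.append(row); bx.append(tx); by.append(ty); weights.append(1.0 / sigma)

    prior = np.zeros(n); prior[0] = 1.0
    add(prior, 0.0, 0.0, 1e-3)                      # pin the first pose at the origin
    for k, (dx, dy) in enumerate(odometry):         # x_{k+1} - x_k = odometry step
        row = np.zeros(n); row[k], row[k + 1] = -1.0, 1.0
        add(row, dx, dy, sigma_odom)
    for i, pid, (zx, zy) in place_obs:              # pose + measured offset = key place
        px, py = key_places[pid]
        row = np.zeros(n); row[i] = 1.0
        add(row, px - zx, py - zy, sigma_place)

    A, w = np.asarray(rows), np.asarray(weights)
    xs, *_ = np.linalg.lstsq(A * w[:, None], np.asarray(bx) * w, rcond=None)
    ys, *_ = np.linalg.lstsq(A * w[:, None], np.asarray(by) * w, rcond=None)
    return np.stack([xs, ys], axis=1)               # (N+1, 2) smoothed positions
```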

ICRA Conference 2006 Conference Paper

An Integrated Particle Filter and Potential Field Method for Cooperative Robot Target Tracking

  • Roozbeh Mottaghi
  • Richard Vaughan 0001

A fundamental challenge for robotic target-tracking systems is to cope with cases in which the target is not seen for long periods of time. An additional challenge in multiple-robot systems is to coordinate robot activity to best track targets with limited visibility. We describe a novel technique that combines a particle filter target model with a potential field robot controller. Robots are attracted to points sampled from the particle cloud that models the probability distribution over the target's position, subject to environmental constraints. We show how this method can be used as a coordination strategy whereby a team of robots cooperatively minimizes the uncertainty in the pose of a tracked target. Simulation results are presented.
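
A toy step of the combined scheme for a single robot, assuming the target is unobserved this step: diffuse the particle cloud over the target's position, then compute a potential-field force toward a sampled particle and away from obstacles. The gains, noise values, and function signature are illustrative.

```python
import numpy as np

def pf_potential_step(particles, robot_pos, obstacles, motion_std=0.2,
                      attract_gain=1.0, repulse_gain=0.5, repulse_radius=1.0,
                      rng=None):
    """One toy prediction + control step.
      particles: (N, 2) hypotheses of the target position
      robot_pos: (2,)  current robot position
      obstacles: (M, 2) obstacle positions
    Returns the propagated particles and a force vector for the robot."""
    rng = rng or np.random.default_rng()
    # Predict: diffuse the target hypotheses (no observation this step).
    particles = particles + rng.normal(0.0, motion_std, size=particles.shape)
    # Attraction toward a particle sampled from the target distribution.
    goal = particles[rng.integers(len(particles))]
    force = attract_gain * (goal - robot_pos)
    # Repulsion from nearby obstacles (environmental constraint).
    for obs in obstacles:
        diff = robot_pos - obs
        dist = np.linalg.norm(diff)
        if 1e-6 < dist < repulse_radius:
            force += repulse_gain * (1.0 / dist - 1.0 / repulse_radius) * diff / dist
    return particles, force   # caller moves the robot along `force`
```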