Arrow Research search

Author name cluster

Yuqing Du

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

ICML 2024 Conference Paper

Learning to Model the World With Language

  • Jessy Lin
  • Yuqing Du
  • Olivia Watkins
  • Danijar Hafner
  • Pieter Abbeel
  • Dan Klein 0001
  • Anca D. Dragan

To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language—language like "this button turns on the TV" or "I put the bowls away"—that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future: what they will observe, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations, and learns to act from imagined model rollouts. While current methods that learn language-conditioned policies degrade in performance with more diverse types of language, we show that Dynalang learns to leverage environment descriptions, game rules, and instructions to excel on tasks ranging from game-playing to navigating photorealistic home scans. Finally, we show that our method enables additional capabilities due to learning a generative model: Dynalang can be pretrained on text-only data, enabling learning from offline datasets, and generate language grounded in an environment.

RLC 2024 Conference Paper

Semi-Supervised One Shot Imitation Learning

  • Philipp Wu
  • Kourosh Hakhamaneshi
  • Yuqing Du
  • Igor Mordatch
  • Aravind Rajeswaran
  • Pieter Abbeel

One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL requires a prohibitively large number of paired expert demonstrations: trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of tasks with only one demonstration each (unpaired dataset), along with a small dataset of tasks with multiple demonstrations (paired dataset). This presents a more realistic and practical embodiment of few-shot learning and requires the agent to effectively leverage weak supervision. Subsequently, we develop an algorithm applicable to this semi-supervised OSIL setting. Our approach first learns an embedding space where different tasks cluster uniquely. We utilize this embedding space and the clustering it supports to self-generate pairings between trajectories in the large unpaired dataset. Through empirical results, we demonstrate that OSIL models trained on such self-generated pairings (labels) are competitive with OSIL models trained with ground-truth labels, presenting a major advancement in the label-efficiency of OSIL.
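The self-pairing step this abstract describes can be sketched in a few lines. The cosine-similarity nearest-neighbour rule below is an illustrative stand-in for the paper's actual embedding-space clustering, and the toy embeddings are invented for the example.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def self_generate_pairs(embeddings):
    """For each trajectory embedding, pick its nearest neighbour
    (excluding itself) as a self-generated pseudo-pairing."""
    pairs = []
    for i, e in enumerate(embeddings):
        best_j, best_sim = None, -2.0
        for j, f in enumerate(embeddings):
            if i == j:
                continue
            sim = cosine(e, f)
            if sim > best_sim:
                best_j, best_sim = j, sim
        pairs.append((i, best_j))
    return pairs

# Two implicit "tasks": embeddings 0/1 cluster together, as do 2/3.
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(self_generate_pairs(embs))  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

Pairs generated this way would then substitute for ground-truth task labels when training the OSIL model.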

RLJ 2024 Journal Article

Semi-Supervised One Shot Imitation Learning

  • Philipp Wu
  • Kourosh Hakhamaneshi
  • Yuqing Du
  • Igor Mordatch
  • Aravind Rajeswaran
  • Pieter Abbeel

One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL requires a prohibitively large number of paired expert demonstrations: trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem setting, where the learning agent is presented with a large dataset of tasks with only one demonstration each (unpaired dataset), along with a small dataset of tasks with multiple demonstrations (paired dataset). This presents a more realistic and practical embodiment of few-shot learning and requires the agent to effectively leverage weak supervision. Subsequently, we develop an algorithm applicable to this semi-supervised OSIL setting. Our approach first learns an embedding space where different tasks cluster uniquely. We utilize this embedding space and the clustering it supports to self-generate pairings between trajectories in the large unpaired dataset. Through empirical results, we demonstrate that OSIL models trained on such self-generated pairings (labels) are competitive with OSIL models trained with ground-truth labels, presenting a major advancement in the label-efficiency of OSIL.

NeurIPS 2023 Conference Paper

DPOK: Reinforcement Learning for Fine-tuning Text-to-Image Diffusion Models

  • Ying Fan
  • Olivia Watkins
  • Yuqing Du
  • Hao Liu
  • Moonkyung Ryu
  • Craig Boutilier
  • Pieter Abbeel
  • Mohammad Ghavamzadeh

Learning from human feedback has been shown to improve text-to-image models. These techniques first learn a reward function that captures what humans care about in the task and then improve the models based on the learned reward function. Even though relatively simple approaches (e.g., rejection sampling based on reward scores) have been investigated, fine-tuning text-to-image models with the reward function remains challenging. In this work, we propose using online reinforcement learning (RL) to fine-tune text-to-image models. We focus on diffusion models, defining the fine-tuning task as an RL problem, and updating the pre-trained text-to-image diffusion models using policy gradient to maximize the feedback-trained reward. Our approach, coined DPOK, integrates policy optimization with KL regularization. We conduct an analysis of KL regularization for both RL fine-tuning and supervised fine-tuning. In our experiments, we show that DPOK is generally superior to supervised fine-tuning with respect to both image-text alignment and image quality. Our code is available at https://github.com/google-research/google-research/tree/master/dpok.
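A minimal sketch of the KL-regularized objective the abstract describes, assuming per-sample log-probabilities under the current and pre-trained models are available; the single-sample KL estimate and the `beta` weight are illustrative, not the paper's exact formulation.

```python
def dpok_objective(rewards, logp_current, logp_pretrained, beta=0.1):
    """Mean per-sample objective: maximize the feedback-trained reward
    while penalizing drift from the pre-trained diffusion model.
    KL is estimated per sample as log p_current - log p_pretrained."""
    total = 0.0
    for r, lp_cur, lp_pre in zip(rewards, logp_current, logp_pretrained):
        total += r - beta * (lp_cur - lp_pre)
    return total / len(rewards)

# With no drift from the pre-trained model, the KL penalty vanishes.
print(dpok_objective([1.0, 0.0], [-2.0, -3.0], [-2.0, -3.0]))  # 0.5
```

Increasing `beta` trades reward maximization against staying close to the pre-trained model, which is the tension the paper's KL analysis examines.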

ICML 2023 Conference Paper

Guiding Pretraining in Reinforcement Learning with Large Language Models

  • Yuqing Du
  • Olivia Watkins
  • Zihan Wang
  • Cédric Colas
  • Trevor Darrell
  • Pieter Abbeel
  • Abhishek Gupta 0004
  • Jacob Andreas

Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs) rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks.
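The reward computation ELLM's abstract outlines can be sketched as caption-to-goal matching. Jaccard word overlap below stands in for the paper's learned similarity measure, and the Crafter-style goal strings are invented for the example.

```python
def ellm_reward(transition_caption, suggested_goals, threshold=0.5):
    """Reward the agent when a caption of what it just did matches one
    of the goals an LLM suggested for the current state description."""
    done = set(transition_caption.lower().split())
    best = 0.0
    for goal in suggested_goals:
        g = set(goal.lower().split())
        jaccard = len(done & g) / len(done | g)
        best = max(best, jaccard)
    # Only sufficiently close matches count as achieving a goal.
    return best if best >= threshold else 0.0

goals = ["chop the tree", "drink water", "attack zombie"]
print(ellm_reward("chop the tree", goals))  # 1.0
print(ellm_reward("stand still", goals))    # 0.0
```

The key property is that the goal list is regenerated by the language model as the state changes, so no human needs to hand-author the reward.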

ICRA 2023 Conference Paper

Practical Visual Deep Imitation Learning via Task-Level Domain Consistency

  • Mohi Khansari
  • Daniel Ho
  • Yuqing Du
  • Armando Fuentes
  • Matthew Bennice
  • Nicolas Sievers
  • Sean Kirmani
  • Yunfei Bai

Recent work in visual end-to-end learning for robotics has shown the promise of imitation learning across a variety of tasks. Such approaches are however expensive both because they require large amounts of real world data and rely on time-consuming real-world evaluations to identify the best model for deployment. These challenges can be mitigated by using simulation evaluations to identify high performing policies. However, this introduces the well-known “reality gap” problem, where simulator inaccuracies decorrelate performance in simulation from that of reality. In this paper, we build on top of prior work in GAN-based domain adaptation and introduce the notion of a Task Consistency Loss (TCL), a self-supervised loss that encourages sim and real alignment both at the feature and action-prediction levels. We demonstrate the effectiveness of our approach by teaching a 9-DoF mobile manipulator to perform the challenging task of latched door opening purely from visual inputs such as RGB and depth images. We achieve 69% success across twenty seen and unseen meeting rooms using only ~16.2 hours of teleoperated demonstrations in sim and real. To the best of our knowledge, this is the first work to tackle latched door opening from a purely end-to-end learning approach, where the task of navigation and manipulation are jointly modeled by a single neural network.
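A sketch of a task-consistency-style loss, assuming paired sim/real rollouts yield feature vectors and predicted actions; the unweighted sum of the two mean-squared terms is illustrative, not the paper's exact loss.

```python
def task_consistency_loss(feat_sim, feat_real, act_sim, act_real):
    """Encourage sim/real alignment at both the feature level and the
    action-prediction level by penalizing squared differences."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return mse(feat_sim, feat_real) + mse(act_sim, act_real)

# Perfectly aligned sim/real rollouts incur zero loss.
print(task_consistency_loss([0.2, 0.4], [0.2, 0.4], [1.0], [1.0]))  # 0.0
```

Driving this loss toward zero makes simulated evaluation scores a better proxy for real-world performance, which is the paper's motivation for it.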

ICML 2022 Conference Paper

Bayesian Imitation Learning for End-to-End Mobile Manipulation

  • Yuqing Du
  • Daniel Ho
  • Alexander A. Alemi
  • Eric Jang
  • Mohi Khansari

In this work we investigate and demonstrate benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator. Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities, especially for tasks that may favor different sensors in different situations. As we scale multi-sensor robotic learning to unstructured real-world settings (e.g., offices, homes) and more complex robot behaviors, we also increase reliance on simulators for cost, efficiency, and safety. Consequently, the sim-to-real gap across multiple sensor modalities also increases, making simulated validation more difficult. We show that using the Variational Information Bottleneck (Alemi et al., 2016) to regularize convolutional neural networks improves generalization to heldout domains and reduces the sim-to-real gap in a sensor-agnostic manner. As a side effect, the learned embeddings also provide useful estimates of model uncertainty for each sensor. We demonstrate that our method is able to help close the sim-to-real gap and successfully fuse RGB and depth modalities based on understanding of the situational uncertainty of each sensor. In a real-world office environment, we achieve 96% task success, improving upon the baseline by +16%.
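The VIB regularizer mentioned above has a closed form for a diagonal-Gaussian encoder against a standard-normal prior. The sketch below computes that KL term, which would be added to the imitation loss with some weight; the weighting and the per-dimension parameterization are assumptions for illustration.

```python
import math

def vib_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over
    embedding dimensions -- the bottleneck penalty on the encoder."""
    return sum(
        0.5 * (math.exp(lv) + m * m - 1.0 - lv)
        for m, lv in zip(mu, log_var)
    )

# A standard-normal embedding incurs no penalty.
print(vib_kl([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

Penalizing this KL squeezes nuisance detail out of each sensor's embedding, and the learned variances double as the per-sensor uncertainty estimates the abstract mentions.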

ICLR 2022 Conference Paper

It Takes Four to Tango: Multiagent Self Play for Automatic Curriculum Generation

  • Yuqing Du
  • Pieter Abbeel
  • Aditya Grover

We are interested in training general-purpose reinforcement learning agents that can solve a wide variety of goals. Training such agents efficiently requires automatic generation of a goal curriculum. This is challenging as it requires (a) exploring goals of increasing difficulty, while ensuring that the agent (b) is exposed to a diverse set of goals in a sample efficient manner and (c) does not catastrophically forget previously solved goals. We propose Curriculum Self Play (CuSP), an automated goal generation framework that seeks to satisfy these desiderata by virtue of a multi-player game with 4 agents. We extend the asymmetric curricula learning in PAIRED (Dennis et al., 2020) to a symmetrized game that carefully balances cooperation and competition between two off-policy student learners and two regret-maximizing teachers. CuSP additionally introduces entropic goal coverage and accounts for the non-stationary nature of the students, allowing us to automatically induce a curriculum that balances progressive exploration with anti-catastrophic exploitation. We demonstrate that our method succeeds at generating an effective curriculum of goals for a range of control tasks, outperforming other methods at zero-shot test-time generalization to novel out-of-distribution goals.

ICRA 2021 Conference Paper

Auto-Tuned Sim-to-Real Transfer

  • Yuqing Du
  • Olivia Watkins
  • Trevor Darrell
  • Pieter Abbeel
  • Deepak Pathak

Policies trained in simulation often fail when transferred to the real world due to the ‘reality gap’ where the simulator is unable to accurately capture the dynamics and visual properties of the real world. Current approaches to tackle this problem, such as domain randomization, require prior knowledge and engineering to determine how much to randomize system parameters in order to learn a policy that is robust to sim-to-real transfer while also not being too conservative. We propose a method for automatically tuning simulator system parameters to match the real world using only raw RGB images of the real world without the need to define rewards or estimate state. Our key insight is to reframe the auto-tuning of parameters as a search problem where we iteratively shift the simulation system parameters to approach the real world system parameters. We propose a Search Param Model (SPM) that, given a sequence of observations and actions and a set of system parameters, predicts whether the given parameters are higher or lower than the true parameters used to generate the observations. We evaluate our method on multiple robotic control tasks in both sim-to-sim and sim-to-real transfer, demonstrating significant improvement over naive domain randomization. Project videos at https://yuqingd.github.io/autotuned-sim2real/.
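The higher/lower search the abstract describes can be sketched as a binary-search-style update loop. `predict_higher` stands in for the learned Search Param Model, and the annealing schedule is illustrative rather than the paper's actual update rule.

```python
def auto_tune(predict_higher, init=0.5, step=0.25, iters=20):
    """Shift a simulator parameter toward the real value using only
    higher/lower predictions from the (here: oracle) SPM."""
    param = init
    for _ in range(iters):
        if predict_higher(param):   # SPM says sim param overshoots
            param -= step
        else:                       # SPM says sim param undershoots
            param += step
        step *= 0.5                 # binary-search-style annealing
    return param

# An oracle SPM for a hidden real-world parameter of 0.8.
est = auto_tune(lambda p: p > 0.8)
print(round(est, 3))  # 0.8
```

In the paper the prediction comes from a model trained on sim trajectories, so no reward or state estimation is needed on the real observations.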

NeurIPS 2020 Conference Paper

AvE: Assistance via Empowerment

  • Yuqing Du
  • Stas Tiomkin
  • Emre Kiciman
  • Daniel Polani
  • Pieter Abbeel
  • Anca Dragan

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s). Existing methods tend to rely on inferring the human's goal, which is challenging when there are many potential goals or when the set of candidate goals is difficult to identify. We propose a new paradigm for assistance by instead increasing the human's ability to control their environment, and formalize this approach by augmenting reinforcement learning with human empowerment. This task-agnostic objective increases the person's autonomy and ability to achieve any eventual state. We test our approach against assistance based on goal inference, highlighting scenarios where our method overcomes failure modes stemming from goal ambiguity or misspecification. As existing methods for estimating empowerment in continuous domains are computationally hard, precluding its use in real time learned assistance, we also propose an efficient empowerment-inspired proxy metric. Using this, we are able to successfully demonstrate our method in a shared autonomy user study for a challenging simulated teleoperation task with human-in-the-loop training.

ICRA 2019 Conference Paper

Group Surfing: A Pedestrian-Based Approach to Sidewalk Robot Navigation

  • Yuqing Du
  • Nicholas J. Hetherington
  • Chu Lip Oon
  • Wesley P. Chan
  • Camilo Perez Quintero
  • Elizabeth A. Croft
  • H. F. Machiel Van der Loos

In this paper, we propose a novel navigation system for mobile robots in pedestrian-rich sidewalk environments. Sidewalks are unique in that the pedestrian-shared space has characteristics of both roads and indoor spaces. Like vehicles on roads, pedestrian movement often manifests as linear flows in opposing directions. On the other hand, pedestrians also form crowds and can exhibit much more random movements than vehicles. Classical algorithms are insufficient for safe navigation around pedestrians and remaining on the sidewalk space. Thus, our approach takes advantage of natural human motion to allow a robot to adapt to sidewalk navigation in a safe and socially-compliant manner. We developed a group surfing method which aims to imitate the optimal pedestrian group for bringing the robot closer to its goal. For pedestrian-sparse environments, we propose a sidewalk edge detection and following method. Underlying these two navigation methods, the collision avoidance scheme is human-aware. The integrated navigation stack is evaluated and demonstrated in simulation. A hardware demonstration is also presented.
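The group-selection idea above can be sketched as picking the pedestrian group whose average velocity makes the most progress toward the robot's goal. The 2-D position/velocity representation and the group names are invented for the example; the paper's actual selection criterion may weigh additional factors.

```python
import math

def pick_group(robot, goal, groups):
    """Choose the pedestrian group to 'surf': the one whose average
    velocity has the largest component toward the robot's goal."""
    to_goal = (goal[0] - robot[0], goal[1] - robot[1])
    norm = math.hypot(*to_goal)
    dir_goal = (to_goal[0] / norm, to_goal[1] / norm)
    best, best_progress = None, -math.inf
    for name, vel in groups.items():
        # Progress = projection of group velocity onto the goal direction.
        progress = vel[0] * dir_goal[0] + vel[1] * dir_goal[1]
        if progress > best_progress:
            best, best_progress = name, progress
    return best

groups = {"eastbound": (1.0, 0.0), "westbound": (-1.0, 0.0)}
print(pick_group((0.0, 0.0), (10.0, 0.0), groups))  # eastbound
```

Following the selected group lets the robot inherit socially compliant motion for free, with the sidewalk-edge follower taking over when no suitable group exists.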