Arrow Research search

Author name cluster

Jiaheng Hu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers

10

AAAI Conference 2026 Conference Paper

Multi-Step Deformable Gaussian Splatting for Dynamic Scene Rendering

  • Jiaheng Hu
  • Zhizhong Zhang
  • Jingyu Gong
  • Lizhuang Ma
  • Xin Tan
  • Yuan Xie

Reconstructing dynamic scenes has long been a challenging task in 3D vision. Previous mainstream methods based on 3D Gaussian Splatting typically employ a single deformation field to directly model spatiotemporal changes. However, such one-step deformation struggles to capture diverse and complex motion patterns. To address this limitation, we propose decomposing the one-step deformation into a multi-step process, where each step is represented by a deformation layer. Additionally, we introduce a weight prediction mechanism for each layer to control the extent of deformation at every step. We provide two types of deformation layers based on implicit and explicit approaches. Moreover, while the deformation layer is time-conditioned, the Gaussians' behavior may still be influenced by their time-invariant properties. Therefore, we propose a fully time-agnostic scale modulation block to modulate the scaling changes of Gaussians. Extensive experiments on D-NeRF, Neu3D, and HyperNeRF demonstrate that our method achieves state-of-the-art performance.
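The multi-step decomposition described in the abstract can be sketched as sequentially applied deformation layers, each scaled by a predicted per-step weight. The residual form and the toy layers below are illustrative assumptions for exposition, not the paper's architecture:

```python
import numpy as np

def deform_multi_step(points, t, layers, weight_fns):
    """Apply K deformation layers in sequence; each step's offset is
    scaled by a predicted per-step weight (illustrative sketch)."""
    x = points
    for layer, weight_fn in zip(layers, weight_fns):
        w = weight_fn(x, t)          # scalar or per-point weight
        x = x + w * layer(x, t)      # residual deformation for this step
    return x

# Toy example: two "layers" with fixed weights.
layer_a = lambda x, t: np.full_like(x, t)   # uniform drift over time
layer_b = lambda x, t: 0.1 * x              # radial expansion
w_a = lambda x, t: 1.0
w_b = lambda x, t: 0.5

pts = np.zeros((4, 3))
out = deform_multi_step(pts, t=1.0, layers=[layer_a, layer_b],
                        weight_fns=[w_a, w_b])
```

Resampling `t` re-evaluates every layer, so time-varying motion is distributed across the steps rather than packed into a single deformation field.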

AAAI Conference 2025 Conference Paper

Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes

  • Chen Tang
  • Ben Abbatematteo
  • Jiaheng Hu
  • Rohan Chandra
  • Roberto Martín-Martín
  • Peter Stone

Reinforcement learning (RL), particularly its combination with deep neural networks referred to as deep RL (DRL), has shown tremendous promise across a wide range of applications, suggesting its potential for enabling the development of sophisticated robotic behaviors. Robotics problems, however, pose fundamental difficulties for the application of RL, stemming from the complexity and cost of interacting with the physical world. These challenges notwithstanding, recent advances have enabled DRL to succeed at some real-world robotic tasks. However, the maturity of state-of-the-art DRL solutions varies significantly across robotic applications. In this talk, I will review the current progress of DRL in real-world robotic applications based on our recent survey paper (with Tang, Abbatematteo, Hu, Chandra, and Martín-Martín), with a particular focus on evaluating the real-world successes achieved with DRL in realizing several key robotic competencies, including locomotion, navigation, stationary manipulation, mobile manipulation, human-robot interaction, and multi-robot interaction. The analysis aims to identify the key factors underlying those exciting successes, reveal underexplored areas, and provide an overall characterization of the status of DRL in robotics. I will also highlight several important avenues for future work, emphasizing the need for stable and sample-efficient real-world RL paradigms, holistic approaches for discovering and integrating various competencies to tackle complex long-horizon, open-world tasks, and principled development and evaluation procedures. The talk is designed to offer insights for RL practitioners and roboticists toward harnessing RL's power to create generally capable real-world robotic systems.

ICRA Conference 2025 Conference Paper

FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning

  • Jiaheng Hu
  • Rose Hendrix
  • Ali Farhadi
  • Aniruddha Kembhavi
  • Roberto Martín-Martín
  • Peter Stone 0001
  • Kuo-Hao Zeng
  • Kiana Ehsani

In recent years, the Robotics field has initiated several efforts toward building generalist robot policies through large-scale multi-task Behavior Cloning. However, direct deployments of these policies have led to unsatisfactory performance, where the policy struggles with unseen states and tasks. How can we break through the performance plateau of these models and elevate their capabilities to new heights? In this paper, we propose FLaRe, a large-scale Reinforcement Learning fine-tuning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques. Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance both on previously demonstrated and on entirely novel tasks and embodiments. Specifically, on a set of long-horizon mobile manipulation tasks, FLaRe achieves an average success rate of 79.5% in unseen environments, with absolute improvements of +23.6% in simulation and +30.7% on real robots over prior SoTA methods. By utilizing only sparse rewards, our approach enables generalization to new capabilities beyond the pretraining data with minimal human effort. Moreover, we demonstrate rapid adaptation to new embodiments and behaviors with less than a day of fine-tuning. Videos, code, and appendix can be found on the project website at robot-flare.github.io.

IROS Conference 2025 Conference Paper

L3M+P: Lifelong Planning with Large Language Models

  • Krish Agarwal
  • Yuqian Jiang
  • Jiaheng Hu
  • Bo Liu 0042
  • Peter Stone 0001

By combining classical planning methods with large language models (LLMs), recent research such as LLM+P has enabled agents to plan for general tasks given in natural language. However, scaling these methods to general-purpose service robots remains challenging: (1) classical planning algorithms generally require a detailed and consistent specification of the environment, which is not always readily available; and (2) existing frameworks mainly focus on isolated planning tasks, whereas robots are often meant to serve in long-term continuous deployments, and therefore must maintain a dynamic memory of the environment which can be updated with multi-modal inputs and extracted as planning knowledge for future tasks. To address these two issues, this paper introduces L3M+P (Lifelong LLM+P), a framework that uses an external knowledge graph as a representation of the world state. The graph can be updated from multiple sources of information, including sensory input and natural language interactions with humans. L3M+P enforces rules for the expected format of the absolute world state graph to maintain consistency between graph updates. At planning time, given a natural language description of a task, L3M+P retrieves context from the knowledge graph and generates a problem definition for classical planners. Evaluated on household robot simulators and on a real-world service robot, L3M+P achieves significant improvement over baseline methods both on accurately registering natural language state changes and on correctly generating plans, thanks to the knowledge graph retrieval and verification.
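The abstract's pipeline (update a world-state graph from multiple sources, then retrieve context and emit a problem definition for a classical planner) can be sketched minimally. The triple schema, predicate names, and PDDL layout below are illustrative assumptions, not L3M+P's actual format:

```python
# Minimal sketch: a world-state graph of (entity, relation, entity)
# triples, updated from state changes and serialized to a PDDL-style
# problem definition (schema and predicates are illustrative).
graph = {
    ("cup", "in", "kitchen"),
    ("robot", "in", "hallway"),
}

def update(graph, triple, add=True):
    """Apply a state change (e.g. parsed from language or perception)."""
    return graph | {triple} if add else graph - {triple}

def to_pddl_problem(graph, goal):
    init = " ".join(f"({rel} {a} {b})" for a, rel, b in sorted(graph))
    return (f"(define (problem p) (:init {init}) "
            f"(:goal ({goal[1]} {goal[0]} {goal[2]})))")

# "The robot moved from the hallway to the kitchen."
graph = update(graph, ("robot", "in", "hallway"), add=False)
graph = update(graph, ("robot", "in", "kitchen"))
pddl = to_pddl_problem(graph, ("cup", "in", "livingroom"))
```

Keeping the graph as the single source of truth is what lets each new task's problem definition reflect every state change registered since deployment began.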

NeurIPS Conference 2024 Conference Paper

Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning

  • Jiaheng Hu
  • Zizhao Wang
  • Peter Stone
  • Roberto Martín-Martín

A hallmark of intelligent agents is the ability to learn reusable skills purely from unsupervised interaction with the environment. However, existing unsupervised skill discovery methods often learn entangled skills where one skill variable simultaneously influences many entities in the environment, making downstream skill chaining extremely challenging. We propose Disentangled Unsupervised Skill Discovery (DUSDi), a method for learning disentangled skills that can be efficiently reused to solve downstream tasks. DUSDi decomposes skills into disentangled components, where each skill component only affects one factor of the state space. Importantly, these skill components can be concurrently composed to generate low-level actions, and efficiently chained to tackle downstream tasks through hierarchical Reinforcement Learning. DUSDi defines a novel mutual-information-based objective to enforce disentanglement between the influences of different skill components, and utilizes value factorization to optimize this objective efficiently. Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks.
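The pairing of skill components with state factors can be illustrated with a crude proxy: reward each component by how well it is decodable from its own factor alone, and sum the per-component terms in the spirit of value factorization. The log-likelihood predictors below are illustrative stand-ins, not the paper's mutual-information objective:

```python
import numpy as np

def dusdi_style_reward(state_factors, skill_components, predictors):
    """Per-component pseudo-reward: log-likelihood that each skill
    component is decoded from *its own* state factor only (a crude
    proxy for a disentanglement objective). Summing the terms mirrors
    value factorization over skill components."""
    per_factor = [float(np.log(pred(f)[z] + 1e-8))
                  for f, z, pred in zip(state_factors, skill_components,
                                        predictors)]
    return per_factor, sum(per_factor)

# Toy: 2 state factors, 2-way discrete skill components.
pred0 = lambda f: np.array([0.9, 0.1])   # q(z_0 | factor_0): confident
pred1 = lambda f: np.array([0.5, 0.5])   # q(z_1 | factor_1): uninformative
per_factor, total = dusdi_style_reward([0.0, 0.0], [0, 1], [pred0, pred1])
```

A component whose factor carries no information about it (like `pred1` here) earns a lower reward, which is the pressure that drives each component to control exactly one factor.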

ICRA Conference 2024 Conference Paper

Open X-Embodiment: Robotic Learning Datasets and RT-X Models: Open X-Embodiment Collaboration

  • Abby O'Neill
  • Abdul Rehman
  • Abhiram Maddukuri
  • Abhishek Gupta 0004
  • Abhishek Padalkar
  • Abraham Lee
  • Acorn Pooley
  • Agrim Gupta

Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train a "generalist" X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. The project website is robotics-transformer-x.github.io.

NeurIPS Conference 2024 Conference Paper

SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions

  • Zizhao Wang
  • Jiaheng Hu
  • Caleb Chuck
  • Stephen Chen
  • Roberto Martín-Martín
  • Amy Zhang
  • Scott Niekum
  • Peter Stone

Unsupervised skill discovery carries the promise that an intelligent agent can learn reusable skills through autonomous, reward-free interactions with environments. Existing unsupervised skill discovery methods learn skills by encouraging distinguishable behaviors that cover diverse states. However, in complex environments with many state factors (e.g., household environments with many objects), learning skills that cover all possible states is impossible, and naively encouraging state diversity often leads to simple skills that are not ideal for solving downstream tasks. This work introduces Skill Discovery from Local Dependencies (SkiLD), which leverages state factorization as a natural inductive bias to guide the skill learning process. The key intuition guiding SkiLD is that skills that induce diverse interactions between state factors are often more valuable for solving downstream tasks. To this end, SkiLD develops a novel skill learning objective that explicitly encourages the mastering of skills that effectively induce different interactions within an environment. We evaluate SkiLD in several domains with challenging, long-horizon sparse reward tasks including a realistic simulated household robot domain, where SkiLD successfully learns skills with clear semantic meaning and shows superior performance compared to existing unsupervised reinforcement learning methods that only maximize state coverage.

NeurIPS Conference 2023 Conference Paper

ELDEN: Exploration via Local Dependencies

  • Zizhao Wang
  • Jiaheng Hu
  • Peter Stone
  • Roberto Martín-Martín

Tasks with large state space and sparse rewards present a longstanding challenge to reinforcement learning. In these tasks, an agent needs to explore the state space efficiently until it finds a reward. To deal with this problem, the community has proposed to augment the reward function with intrinsic reward, a bonus signal that encourages the agent to visit interesting states. In this work, we propose a new way of defining interesting states for environments with factored state spaces and complex chained dependencies, where an agent's actions may change the value of one entity that, in turn, may affect the value of another entity. Our insight is that, in these environments, interesting states for exploration are states where the agent is uncertain whether (as opposed to how) entities such as the agent or objects have some influence on each other. We present ELDEN, Exploration via Local DepENdencies, a novel intrinsic reward that encourages the discovery of new interactions between entities. ELDEN utilizes the partial derivative of the learned dynamics to model the local dependencies between entities accurately and computationally efficiently. The uncertainty of the predicted dependencies is then used as an intrinsic reward to encourage exploration toward new interactions. We evaluate the performance of ELDEN on four different domains with complex dependencies, ranging from 2D grid worlds to 3D robotic tasks. In all domains, ELDEN correctly identifies local dependencies and learns successful policies, significantly outperforming previous state-of-the-art exploration methods.
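The two ingredients in the abstract (partial derivatives of a dynamics model marking local dependencies, and uncertainty over those dependencies serving as the bonus) can be sketched numerically. Finite differences stand in for gradients of a learned model, and ensemble disagreement stands in for the paper's uncertainty estimate, both as assumptions for illustration:

```python
import numpy as np

def local_dependencies(dynamics, state, action, eps=1e-4, thresh=1e-3):
    """Mark which state/action entries influence each predicted
    next-state entity via finite-difference partial derivatives of a
    dynamics model (a numerical stand-in for learned gradients)."""
    x = np.concatenate([state, action])
    base = dynamics(x)
    deps = np.zeros((base.size, x.size), dtype=bool)
    for j in range(x.size):
        xp = x.copy()
        xp[j] += eps
        deps[:, j] = np.abs(dynamics(xp) - base) / eps > thresh
    return deps

def intrinsic_reward(dep_graphs):
    """Uncertainty of predicted dependencies across an ensemble:
    mean per-edge Bernoulli variance (0 when all models agree)."""
    p = np.mean(np.array(dep_graphs, dtype=float), axis=0)
    return float(np.mean(p * (1.0 - p)))

# Toy dynamics: next[0] = s0 + a, next[1] = s1.
dyn = lambda x: np.array([x[0] + x[2], x[1]])
deps = local_dependencies(dyn, state=np.zeros(2), action=np.zeros(1))
```

When every ensemble member agrees on the dependency graph the bonus vanishes, so the agent is drawn only to states where it is still unsure *whether* one entity influences another.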

ICRA Conference 2022 Conference Paper

Modular Robot Design Optimization with Generative Adversarial Networks

  • Jiaheng Hu
  • Julian Whitman
  • Matthew J. Travers
  • Howie Choset

Modular robots are made up of a set of components which can be configured and reconfigured to form customized robots for a wide range of tasks. Fully utilizing the flexibility of modular robots is challenging, as it requires the identification of optimal modular designs for each given task, often with limited computation and time. Previous works in design automation achieve efficient run-times by utilizing machine learning to create a one-to-one mapping from task to design. However, the problem of robot design is often multimodal, where multiple distinct designs can be similarly or equally good for a task. Alternative design solutions may be needed in the field, for instance, if a module in the optimal design fails and no replacement is available. This paper presents a novel method based on generative adversarial networks (GANs) that learns a one-to-many mapping from task to a distribution of designs. We apply our method to construct locomoting modular robots for terrains with varying obstacle heights and infill. We compare our method against the state-of-the-art, and find that our algorithm results in better solution quality, diversity, and alternatives for when the optimal design fails.
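The one-to-many interface the abstract describes amounts to conditioning a generator on the task and resampling its noise input to obtain alternative designs. Everything below (the generator, the module vocabulary, the task encoding) is an arbitrary illustrative stand-in with GAN training omitted:

```python
import numpy as np

def sample_designs(generator, task, n_alternatives=5, z_dim=8, seed=0):
    """One-to-many task->design mapping: fix the task, resample the
    noise input z, and collect alternative designs (training omitted;
    the generator is an illustrative stand-in)."""
    rng = np.random.default_rng(seed)
    return [generator(task, rng.standard_normal(z_dim))
            for _ in range(n_alternatives)]

# Toy generator: maps noise to a 4-module sequence, ignoring the task.
modules = ["wheel", "leg", "joint", "body"]
toy_gen = lambda task, z: [modules[int(abs(v) * 10) % len(modules)]
                           for v in z[:4]]
designs = sample_designs(toy_gen, task={"obstacle_height": 0.3})
```

This is exactly the failure-recovery use case from the abstract: if the top-ranked design needs an unavailable module, the next sample for the same task is an immediate alternative.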

AAAI Conference 2020 Conference Paper

Communication Learning via Backpropagation in Discrete Channels with Unknown Noise

  • Benjamin Freed
  • Guillaume Sartoretti
  • Jiaheng Hu
  • Howie Choset

This work focuses on multi-agent reinforcement learning (RL) with inter-agent communication, in which communication is differentiable and optimized through backpropagation. Such differentiable approaches tend to converge more quickly to higher-quality policies compared to techniques that treat communication as actions in a traditional RL framework. However, modern communication networks (e.g., Wi-Fi or Bluetooth) rely on discrete communication channels, for which existing differentiable approaches that consider real-valued messages cannot be directly applied, or require biased gradient estimators. Some works have overcome this problem by treating the message space as an extension of the action space, and use standard RL to optimize message selection, but these methods tend to converge slower and to inferior policies. In this paper, we propose a stochastic message encoding/decoding procedure that makes a discrete communication channel mathematically equivalent to an analog channel with additive noise, through which gradients can be backpropagated. Additionally, we introduce an encryption step for use in noisy channels that forces channel noise to be message-independent, allowing us to compute unbiased derivative estimates even in the presence of unknown channel noise. To the best of our knowledge, this work presents the first differentiable communication learning approach that can compute unbiased derivatives through channels with unknown noise. We demonstrate the effectiveness of our approach in two example multi-robot tasks: a path-finding and a collaborative search problem. There, we show that our approach achieves learning speed and performance similar to differentiable communication learning with real-valued messages (i.e., unlimited communication bandwidth), while naturally handling more realistic real-world communication constraints.
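The "discrete channel as analog channel with additive, message-independent noise" idea is closely related to classical subtractive dithered quantization, where a dither shared by sender and receiver makes the quantizer's end-to-end error uniform additive noise independent of the message. A minimal numerical sketch of that classical construction (not the paper's exact encoding or encryption procedure):

```python
import numpy as np

def send_with_shared_dither(msg, rng, step=0.25):
    """Subtractive dithered quantization: with a dither
    u ~ U(-step/2, step/2) shared by sender and receiver, the
    end-to-end error of the discrete channel behaves as uniform
    additive noise independent of the message."""
    u = rng.uniform(-step / 2, step / 2, size=msg.shape)
    symbols = np.round((msg + u) / step)   # discrete channel input
    decoded = symbols * step - u           # receiver subtracts dither
    return decoded

rng = np.random.default_rng(0)
msg = rng.uniform(-1, 1, size=100_000)
out = send_with_shared_dither(msg, rng)
err = out - msg
```

Because the residual error looks like fixed additive noise rather than a hard rounding step, gradients can be passed through the channel as if it were analog, which is the property the abstract's approach exploits.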