
Author name cluster

Chenran Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
2 author rows

Possible papers (10)

ICRA Conference 2025 Conference Paper

Adaptive Energy Regularization for Autonomous Gait Transition and Energy-Efficient Quadruped Locomotion

  • Boyuan Liang
  • Lingfeng Sun
  • Xinghao Zhu
  • Bike Zhang
  • Ziyin Xiong
  • Yixiao Wang
  • Chenran Li
  • Koushil Sreenath

In reinforcement learning for legged robot locomotion, crafting effective reward strategies is crucial. Predefined gait patterns and complex reward systems are widely used to stabilize policy training. Drawing from the natural locomotion behaviors of humans and animals, which adapt their gaits to minimize energy consumption, we investigate the impact of incorporating an energy-efficient reward term that prioritizes distance-averaged energy consumption into the reinforcement learning framework. Our findings demonstrate that this simple addition enables quadruped robots to autonomously select appropriate gaits, such as four-beat walking at lower speeds and trotting at higher speeds, without the need for explicit gait regularization. Furthermore, we provide a guideline for tuning the weight of this energy-efficient reward, facilitating its application in real-world scenarios. The effectiveness of our approach is validated through simulations and on a real Unitree Go1 robot. This research highlights the potential of energy-centric reward functions to simplify and enhance the learning of adaptive and efficient locomotion in quadruped robots. Videos and more details are at https://sites.google.com/berkeley.edu/efficient-locomotion
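
The core mechanism here, penalizing distance-averaged energy, is compact enough to sketch. The snippet below is a minimal illustration only, assuming torque and joint-velocity readings and a tunable weight w_energy; it is not the authors' implementation.

    import numpy as np

    def step_reward(task_reward, joint_torques, joint_velocities,
                    forward_speed, w_energy=0.01, eps=1e-3):
        """Add a distance-averaged energy penalty to a task reward.

        Mechanical power is approximated as sum(|torque * joint_velocity|);
        dividing by forward speed turns energy-per-time into
        energy-per-distance, so slower gaits are not trivially favored.
        """
        power = np.sum(np.abs(np.asarray(joint_torques) *
                              np.asarray(joint_velocities)))
        # Guard against near-zero speed when the robot is standing still.
        energy_per_distance = power / max(forward_speed, eps)
        return task_reward - w_energy * energy_per_distance

Tuning w_energy is the knob the abstract's guideline concerns: too small and the gait never adapts, too large and the robot trades task progress for efficiency.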

ICLR Conference 2025 Conference Paper

Residual-MPPI: Online Policy Customization for Continuous Control

  • Pengcheng Wang 0004
  • Chenran Li
  • Catherine Weaver
  • Kenta Kawamoto
  • Masayoshi Tomizuka
  • Chen Tang 0001
  • Wei Zhan

Policies developed through Reinforcement Learning (RL) and Imitation Learning (IL) have shown great potential in continuous control tasks, but real-world applications often require adapting trained policies to unforeseen requirements. While fine-tuning can address such needs, it typically requires additional data and access to the original training metrics and parameters. In contrast, an online planning algorithm, if capable of meeting the additional requirements, can eliminate the necessity for extensive training phases and customize the policy without knowledge of the original training scheme or task. In this work, we propose a generic online planning algorithm for customizing continuous-control policies at execution time, which we call Residual-MPPI. It can customize a given prior policy on new performance metrics in few-shot and even zero-shot online settings, given access to the prior action distribution alone. Through our experiments, we demonstrate that the proposed Residual-MPPI algorithm can effectively accomplish few-shot/zero-shot online policy customization, including customizing the champion-level racing agent Gran Turismo Sophy (GT Sophy) 1.0 in the challenging Gran Turismo Sport (GTS) car racing environment. Code for the MuJoCo experiments is included in the supplementary material and will be open-sourced upon acceptance. Demo videos are available on our website: https://sites.google.com/view/residual-mppi.
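
As described, the planner needs only the prior policy's action distribution plus the new objective. The sketch below shows one plausible shape of such an MPPI-style step; the sampling scheme, cost weighting (omega, temperature), and the sample_prior/prior_log_prob interfaces are assumptions for illustration, not the paper's exact algorithm.

    import numpy as np

    def residual_mppi_step(state, sample_prior, prior_log_prob, dynamics,
                           addon_cost, n_samples=64, horizon=8,
                           temperature=1.0, omega=1.0, rng=None):
        """One MPPI-style planning step biased toward a prior policy.

        Each rollout's cost mixes the add-on cost with a penalty for
        straying from the prior via -log pi_prior(a|s); only the prior's
        action distribution is queried, matching the abstract's setting.
        """
        rng = rng or np.random.default_rng()
        costs = np.zeros(n_samples)
        first_actions = []
        for i in range(n_samples):
            s = state
            for t in range(horizon):
                a = sample_prior(s, rng)  # perturbed sample from the prior
                if t == 0:
                    first_actions.append(a)
                costs[i] += addon_cost(s, a) - omega * prior_log_prob(s, a)
                s = dynamics(s, a)
        # Standard MPPI softmin weighting over the sampled rollouts.
        w = np.exp(-(costs - costs.min()) / temperature)
        w /= w.sum()
        return np.sum(w[:, None] * np.asarray(first_actions), axis=0)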

ICML Conference 2025 Conference Paper

WOMD-Reasoning: A Large-Scale Dataset for Interaction Reasoning in Driving

  • Yiheng Li
  • Cunxin Fan
  • Chongjian Ge
  • Seth Z. Zhao
  • Chenran Li
  • Chenfeng Xu
  • Huaxiu Yao
  • Masayoshi Tomizuka

Language models uncover unprecedented abilities in analyzing driving scenarios, owing to the limitless knowledge accumulated from text-based pre-training. Naturally, they should particularly excel at analyzing rule-based interactions, such as those triggered by traffic laws, which are well documented in text. However, such interaction analysis remains underexplored due to the lack of dedicated language datasets that address it. Therefore, we propose Waymo Open Motion Dataset-Reasoning (WOMD-Reasoning), a comprehensive large-scale Q&A dataset built on WOMD that focuses on describing and reasoning about traffic-rule-induced interactions in driving scenarios. WOMD-Reasoning is also by far the largest multi-modal Q&A dataset, with 3 million Q&As on real-world driving scenarios, covering a wide range of driving topics from map and motion-status descriptions to narratives and analyses of agents' interactions, behaviors, and intentions. To showcase the applications of WOMD-Reasoning, we design Motion-LLaVA, a motion-language model fine-tuned on WOMD-Reasoning. Quantitative and qualitative evaluations are performed on the WOMD-Reasoning dataset as well as the outputs of Motion-LLaVA, supporting the data quality and wide applicability of WOMD-Reasoning in interaction prediction, traffic-rule-compliance planning, etc. The dataset and its vision-modal extension are available at https://waymo.com/open/download/. The code and prompts used to build it are available at https://github.com/yhli123/WOMD-Reasoning.
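
For orientation, one Q&A record in such a dataset plausibly pairs a scenario reference with question and answer text. The field names below are hypothetical, chosen only to illustrate the kind of structure involved; the actual schema is documented at the linked pages.

    from dataclasses import dataclass

    @dataclass
    class InteractionQA:
        """Hypothetical shape of one interaction-reasoning Q&A record."""
        scenario_id: str      # WOMD scenario the Q&A refers to
        agent_ids: list[str]  # agents whose interaction is discussed
        topic: str            # e.g. "map", "motion", "interaction", "intention"
        question: str
        answer: str

    example = InteractionQA(
        scenario_id="womd-000123",
        agent_ids=["veh_4", "ped_7"],
        topic="interaction",
        question="Why does vehicle 4 slow down near the crosswalk?",
        answer="It yields to pedestrian 7, who has right of way at the signal.",
    )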

ICRA Conference 2025 Conference Paper

X-MOBILITY: End-to-End Generalizable Navigation via World Modeling

  • Wei Liu
  • Huihua Zhao
  • Chenran Li
  • Joydeep Biswas
  • Billy Okal
  • Pulkit Goyal
  • Yan Chang
  • Soha Pouya

General-purpose navigation in challenging environments remains a significant problem in robotics, with current state-of-the-art approaches facing myriad limitations. Classical approaches struggle with cluttered settings and require extensive tuning, while learning-based methods face difficulties generalizing to out-of-distribution environments. This paper introduces X-Mobility, an end-to-end generalizable navigation model that overcomes existing challenges by leveraging three key ideas. First, X-Mobility employs an auto-regressive world modeling architecture with a latent state space to capture world dynamics. Second, a diverse set of multi-head decoders enables the model to learn a rich state representation that correlates strongly with effective navigation skills. Third, by decoupling world modeling from the action policy, our architecture can train effectively on a variety of data sources, both with and without expert policies: off-policy data allows the model to learn world dynamics, while on-policy data with supervisory control enables optimal action policy learning. Through extensive experiments, we demonstrate that X-Mobility not only generalizes effectively but also surpasses current state-of-the-art navigation approaches. Additionally, X-Mobility achieves zero-shot Sim2Real transferability and shows strong potential for cross-embodiment generalization. Project page: https://nvlabs.github.io/X-MOBILITY.
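
The three ideas in the abstract (auto-regressive latent world model, multi-head decoders, policy decoupled from world modeling) map onto a familiar module layout. The PyTorch skeleton below is a rough sketch of that layout under assumed dimensions; it is not the X-Mobility architecture.

    import torch
    import torch.nn as nn

    class LatentWorldModel(nn.Module):
        """Sketch: auto-regressive latent dynamics with multiple decoder
        heads, with the action policy kept separate so the world model can
        train on off-policy data while the policy trains on expert data."""

        def __init__(self, obs_dim=256, act_dim=2, latent_dim=128):
            super().__init__()
            self.encoder = nn.Linear(obs_dim, latent_dim)
            self.dynamics = nn.GRUCell(latent_dim + act_dim, latent_dim)
            # Several heads decode the latent, encouraging a rich state.
            self.heads = nn.ModuleDict({
                "reconstruction": nn.Linear(latent_dim, obs_dim),
                "semantics": nn.Linear(latent_dim, 32),
            })
            self.policy = nn.Linear(latent_dim, act_dim)  # decoupled head

        def forward(self, obs, prev_latent, prev_action):
            z = self.dynamics(
                torch.cat([self.encoder(obs), prev_action], dim=-1),
                prev_latent,
            )
            decoded = {name: head(z) for name, head in self.heads.items()}
            return z, decoded, self.policy(z)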

ICRA Conference 2024 Conference Paper

Optimal Driver Warning Generation in Dynamic Driving Environment

  • Chenran Li
  • Aolin Xu 0002
  • Enna Sachdeva
  • Teruhisa Misu
  • Behzad Dariush

A driver warning system that alerts the human driver to potential risks during driving is a key feature of an advanced driver assistance system. Existing driver warning technologies, mainly forward collision warning and unsafe lane change warning, can reduce the risk of collisions caused by human error. However, current design methods have several major limitations. First, warnings are mainly generated in a one-shot manner without modeling the ego driver's reactions and the surrounding objects, which reduces the flexibility and generality of the system across scenarios. Additionally, the triggering conditions for warnings are mostly rule-based threshold checks on the current state, which lack a prediction of the potential risk over a sufficiently long future horizon. In this work, we study the problem of optimally generating driver warnings by considering the interactions among the generated warning, the driver's behavior, and the states of the ego and surrounding vehicles over a long horizon. The warning generation problem is formulated as a partially observable Markov decision process (POMDP), and an optimal warning generation framework is proposed as its solution. Simulation experiments demonstrate the superiority of the proposed solution over existing warning generation methods.
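
Concretely, the POMDP framing means the system maintains a belief over a latent state (for example, the driver's attentiveness) and scores candidate warnings by long-horizon value rather than a one-shot threshold. The belief update below is the standard discrete recursion under assumed transition and observation models, not the paper's specific implementation.

    import numpy as np

    def belief_update(belief, action, observation, T, O):
        """Standard discrete POMDP belief update.

        belief[s]   : current probability of latent state s
        T[a][s, s'] : transition probability under warning action a
        O[a][s', o] : probability of observing o (e.g. driver response)
                      from state s' after action a
        """
        predicted = belief @ T[action]                   # propagate dynamics
        updated = predicted * O[action][:, observation]  # weight by evidence
        return updated / updated.sum()                   # renormalize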

IROS Conference 2024 Conference Paper

Pre-training on Synthetic Driving Data for Trajectory Prediction

  • Yiheng Li
  • Seth Z. Zhao
  • Chenfeng Xu
  • Chen Tang 0001
  • Chenran Li
  • Mingyu Ding
  • Masayoshi Tomizuka
  • Wei Zhan

Accumulating substantial volumes of real-world driving data proves pivotal in the realm of trajectory forecasting for autonomous driving. Given the heavy reliance of current trajectory forecasting models on data-driven methodologies, we aim to tackle the challenge of learning general trajectory forecasting representations under limited data availability. We propose a pipeline-level solution to mitigate the issue of data scarcity in trajectory forecasting. The solution is composed of two parts: first, we adopt HD map augmentation and trajectory synthesis to generate driving data, and then we learn representations by pre-training on them. Specifically, we apply vector transformations to reshape the maps, and then employ a rule-based model to generate trajectories on both original and augmented scenes, thus enlarging the driving data without collecting additional real data. To foster the learning of general representations within this augmented dataset, we comprehensively explore different pre-training strategies, including extending the concept of a Masked AutoEncoder (MAE) to trajectory forecasting. Without bells and whistles, our proposed pipeline-level solution is general, simple, yet effective: we conduct extensive experiments to demonstrate the effectiveness of our data expansion and pre-training strategies, which outperform the baseline prediction model by large margins, e.g., 5.04%, 3.84%, and 8.30% in terms of MR6, minADE6, and minFDE6. The pre-training dataset and the code for pre-training and fine-tuning are released at https://github.com/yhli123/Pretraining_on_Synthetic_Driving_Data_for_Trajectory_Prediction.
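
The MAE-style pre-training idea, hiding parts of a trajectory and training a model to reconstruct them, fits in a few lines. The masking ratio and tensor shapes below are assumptions for illustration, not the paper's configuration.

    import torch

    def mask_trajectory(traj, mask_ratio=0.5, generator=None):
        """MAE-style masking for trajectory pre-training.

        traj: (T, D) tensor of waypoint features. Returns the visible
        subset plus the kept and masked index sets.
        """
        T = traj.shape[0]
        n_keep = max(1, int(T * (1 - mask_ratio)))
        perm = torch.randperm(T, generator=generator)
        keep_idx = perm[:n_keep].sort().values
        mask_idx = perm[n_keep:].sort().values
        return traj[keep_idx], keep_idx, mask_idx

    # An encoder sees only traj[keep_idx]; a lightweight decoder learns to
    # reconstruct traj[mask_idx], and the encoder is later fine-tuned for
    # forecasting on real data.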

AAMAS Conference 2024 Conference Paper

Quantifying Agent Interaction in Multi-agent Reinforcement Learning for Cost-efficient Generalization

  • Yuxin Chen
  • Chen Tang
  • Ran Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization in Multi-agent Reinforcement Learning (MARL) is challenging. Introducing a diverse set of co-play agents typically boosts the agent's generalization to unseen co-players. However, the extent to which an agent is influenced by co-players varies across scenarios and environments; thus, the improvement in generalization introduced by diversifying co-players also varies. In this work, we introduce Level of Influence (LoI), a novel metric measuring the interaction intensity among agents within a given scenario and environment. We show that LoI can effectively predict the disparities in the benefits of diversifying the co-player distribution across scenarios, offering insights into optimizing training cost for varied situations. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.
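
The abstract does not spell out how LoI is computed, so the snippet below is only a hypothetical proxy for "interaction intensity": measure how much the ego agent's return moves as the co-player is swapped. It stands in for, and should not be confused with, the paper's actual LoI definition.

    import numpy as np

    def interaction_intensity(ego_policy, co_policies, evaluate_return,
                              n_episodes=20):
        """Hypothetical proxy (NOT the paper's LoI): spread of ego returns
        across co-players. A large spread suggests co-players strongly
        influence the ego, so diversifying them in training should help more.

        evaluate_return(ego, co, n_episodes) -> mean episode return.
        """
        returns = np.array([
            evaluate_return(ego_policy, co, n_episodes) for co in co_policies
        ])
        return float(returns.std())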

RLJ Journal 2024 Journal Article

Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

  • Yuxin Chen
  • Chen Tang
  • Thomas Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which unseen co-players influence an agent depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.
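
The budgeted-training idea reads naturally as "spend more of a fixed budget on high-LoI scenarios". The proportional split below, with a per-scenario floor, is one plausible instantiation under assumed inputs, not the paper's allocation method.

    def allocate_budget(loi_scores, total_budget, floor_frac=0.05):
        """Split a training budget (e.g. env steps) across scenarios in
        proportion to their LoI estimates, reserving a small floor so no
        scenario is starved entirely."""
        names = list(loi_scores)
        floor = floor_frac * total_budget / len(names)
        remaining = total_budget - floor * len(names)
        total_loi = sum(loi_scores.values()) or 1.0
        return {n: floor + remaining * loi_scores[n] / total_loi
                for n in names}

    # Example: interactive merge/intersection scenes get most of the budget.
    print(allocate_budget({"merge": 0.8, "intersection": 0.6, "straight": 0.1},
                          total_budget=1_000_000))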

RLC Conference 2024 Conference Paper

Quantifying Interaction Level Between Agents Helps Cost-efficient Generalization in Multi-agent Reinforcement Learning

  • Yuxin Chen
  • Chen Tang
  • Thomas Tian
  • Chenran Li
  • Jinning Li
  • Masayoshi Tomizuka
  • Wei Zhan

Generalization poses a significant challenge in Multi-agent Reinforcement Learning (MARL). The extent to which unseen co-players influence an agent depends on the agent's policy and the specific scenario. A quantitative examination of this relationship sheds light on how to effectively train agents for diverse scenarios. In this study, we present the Level of Influence (LoI), a metric quantifying the interaction intensity among agents within a given scenario and environment. We observe that, generally, a more diverse set of co-play agents during training enhances the generalization performance of the ego agent; however, this improvement varies across distinct scenarios and environments. LoI proves effective in predicting these improvement disparities within specific scenarios. Furthermore, we introduce a LoI-guided resource allocation method tailored to train a set of policies for diverse scenarios under a constrained budget. Our results demonstrate that strategic resource allocation based on LoI can achieve higher performance than uniform allocation under the same computation budget. The code is available at: https://github.com/ThomasChen98/Level-of-Influence.

NeurIPS Conference 2023 Conference Paper

Residual Q-Learning: Offline and Online Policy Customization without Value

  • Chenran Li
  • Chen Tang
  • Haruki Nishimura
  • Jean Mercat
  • Masayoshi Tomizuka
  • Wei Zhan

Imitation Learning (IL) is a widely used framework for learning imitative behavior from demonstrations. It is especially appealing for solving complex real-world tasks where handcrafting a reward function is difficult, or when the goal is to mimic human expert behavior. However, the learned imitative policy can only follow the behavior in the demonstration. When applying the imitative policy, we may need to customize the policy behavior to meet different requirements coming from diverse downstream tasks, while still wanting the customized policy to maintain its imitative nature. To this end, we formulate a new problem setting called policy customization. It defines the learning task as training a policy that inherits the characteristics of the prior policy while satisfying some additional requirements imposed by a target downstream task. We propose a novel and principled approach to interpret and determine the trade-off between the two task objectives. Specifically, we formulate the customization problem as a Markov Decision Process (MDP) with a reward function that combines 1) the inherent reward of the demonstration, and 2) the add-on reward specified by the downstream task. We propose a novel framework, Residual Q-learning, which can solve the formulated MDP by leveraging the prior policy without knowing the inherent reward or value function of the prior policy. We derive a family of residual Q-learning algorithms that can realize offline and online policy customization, and show that the proposed algorithms can effectively accomplish policy customization tasks in various environments. Demo videos and code are available on our website: https://sites.google.com/view/residualq-learning.
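
The abstract's central trick, optimizing an add-on reward on top of a prior policy without ever seeing the prior's own reward or value, has a compact tabular analogue in maximum-entropy RL, where alpha * log pi_prior stands in for the prior's soft advantage. The update below is a sketch under that assumption with simplified hyperparameters; it is not the paper's exact derivation.

    import numpy as np

    def residual_soft_q_update(Q_res, log_pi_prior, s, a, r_add, s_next,
                               w=1.0, alpha=0.1, gamma=0.99, lr=0.1):
        """One tabular update of a residual soft Q-function.

        Q_res[s, a]        : residual Q-values (the quantity learned)
        log_pi_prior[s, a] : log-probabilities of the prior policy, the only
                             thing we can query about the prior
        r_add              : add-on reward from the downstream task
        The next-state soft value folds the prior in through its
        log-likelihood, so the prior's reward is never needed.
        """
        soft_v_next = alpha * np.log(np.sum(
            np.exp((Q_res[s_next] + alpha * log_pi_prior[s_next]) / alpha)))
        target = w * r_add + gamma * soft_v_next
        Q_res[s, a] += lr * (target - Q_res[s, a])
        return Q_res

    # The customized policy then samples actions with probability
    # proportional to pi_prior(a|s) * exp(Q_res(s, a) / alpha), biasing the
    # prior toward the add-on objective.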