Author name cluster

Siming Lan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

AAAI Conference 2026 Conference Paper

Efficient Diffusion Planning with Temporal Diffusion

Jiaming Guo
Rui Zhang
Zerun Li
Yunkai Gao
Shaohui Peng
Siming Lan
Xing Hu
Zidong Du

Diffusion planning is a promising method for learning high-performance policies from offline data. To avoid the impact of discrepancies between planning and reality on performance, previous works generate new plans at each time step. However, this incurs significant computational overhead and leads to lower decision frequencies, and frequent plan switching may also affect performance. In contrast, humans might create detailed short-term plans and more general, sometimes vague, long-term plans, and adjust them over time. Inspired by this, we propose the Temporal Diffusion Planner (TDP) which improves decision efficiency by distributing the denoising steps across the time dimension. TDP begins by generating an initial plan that becomes progressively more vague over time. At each subsequent time step, rather than generating an entirely new plan, TDP updates the previous one with a small number of denoising steps. This reduces the average number of denoising steps, improving decision efficiency. Additionally, we introduce an automated replanning mechanism to prevent significant deviations between the plan and reality. Experiments on D4RL show that, compared to previous works that generate new plans every time step, TDP significantly improves the decision-making frequency by 11-24.8 times while achieving higher or comparable performance.

PDF Details DOI

AAAI Conference 2024 Conference Paper

OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning

Fan Wu
Rui Zhang
Qi Yi
Yunkai Gao
Jiaming Guo
Shaohui Peng
Siming Lan
Husheng Han

Model-based offline reinforcement learning (RL) algorithms have emerged as a promising paradigm for offline RL. These algorithms usually learn a dynamics model from a static dataset of transitions, use the model to generate synthetic trajectories, and perform conservative policy optimization within these trajectories. However, our observations indicate that policy optimization methods used in these model-based offline RL algorithms are not effective at exploring the learned model and induce biased exploration, which ultimately impairs the performance of the algorithm. To address this issue, we propose Offline Conservative ExplorAtioN (OCEAN), a novel rollout approach to model-based offline RL. In our method, we incorporate additional exploration techniques and introduce three conservative constraints based on uncertainty estimation to mitigate the potential impact of significant dynamic errors resulting from exploratory transitions. Our work is a plug-in method and can be combined with classical model-based RL algorithms, such as MOPO, COMBO, and RAMBO. Experiment results of our method on the D4RL MuJoCo benchmark show that OCEAN significantly improves the performance of existing algorithms.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Context Shift Reduction for Offline Meta-Reinforcement Learning

Yunkai Gao
Rui Zhang
Jiaming Guo
Fan Wu
Qi Yi
Shaohui Peng
Siming Lan
Ruizhi Chen

Offline meta-reinforcement learning (OMRL) utilizes pre-collected offline datasets to enhance the agent's generalization ability on unseen tasks. However, the context shift problem arises due to the distribution discrepancy between the contexts used for training (from the behavior policy) and testing (from the exploration policy). The context shift problem leads to incorrect task inference and further deteriorates the generalization ability of the meta-policy. Existing OMRL methods either overlook this problem or attempt to mitigate it with additional information. In this paper, we propose a novel approach called Context Shift Reduction for OMRL (CSRO) to address the context shift problem with only offline datasets. The key insight of CSRO is to minimize the influence of policy in context during both the meta-training and meta-test phases. During meta-training, we design a max-min mutual information representation learning mechanism to diminish the impact of the behavior policy on task representation. In the meta-test phase, we introduce the non-prior context collection strategy to reduce the effect of the exploration policy. Experimental results demonstrate that CSRO significantly reduces the context shift and improves the generalization ability, surpassing previous methods across various challenging domains.

PDF Details

NeurIPS Conference 2023 Conference Paper

Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

Siming Lan
Rui Zhang
Qi Yi
Jiaming Guo
Shaohui Peng
Yunkai Gao
Fan Wu
Ruizhi Chen

In the field of multi-task reinforcement learning, the modular principle, which involves specializing functionalities into different modules and combining them appropriately, has been widely adopted as a promising approach to prevent the negative transfer problem that performance degradation due to conflicts between tasks. However, most of the existing multi-task RL methods only combine shared modules at the task level, ignoring that there may be conflicts within the task. In addition, these methods do not take into account that without constraints, some modules may learn similar functions, resulting in restricting the model's expressiveness and generalization capability of modular methods. In this paper, we propose the Contrastive Modules with Temporal Attention(CMTA) method to address these limitations. CMTA constrains the modules to be different from each other by contrastive learning and combining shared modules at a finer granularity than the task level with temporal attention, alleviating the negative transfer within the task and improving the generalization ability and the performance for multi-task RL. We conducted the experiment on Meta-World, a multi-task RL benchmark containing various robotics manipulation tasks. Experimental results show that CMTA outperforms learning each task individually for the first time and achieves substantial performance improvements over the baselines.

PDF Details

ICML Conference 2023 Conference Paper

Online Prototype Alignment for Few-shot Policy Transfer

Qi Yi
Rui Zhang 0040
Shaohui Peng
Jiaming Guo
Yunkai Gao 0001
Kaizhao Yuan
Ruizhi Chen
Siming Lan

Domain adaptation in RL mainly deals with the changes of observation when transferring the policy to a new environment. Many traditional approaches of domain adaptation in RL manage to learn a mapping function between the source and target domain in explicit or implicit ways. However, they typically require access to abundant data from the target domain. Besides, they often rely on visual clues to learn the mapping function and may fail when the source domain looks quite different from the target domain. To address these problems, in this paper, we propose a novel framework Online Prototype Alignment (OPA) to learn the mapping function based on the functional similarity of elements and is able to achieve few-shot policy transfer within only several episodes. The key insight of OPA is to introduce an exploration mechanism that can interact with the unseen elements of the target domain in an efficient and purposeful manner, and then connect them with the seen elements in the source domain according to their functionalities (instead of visual clues). Experimental results show that when the target domain looks visually different from the source domain, OPA can achieve better transfer performance even with much fewer samples from the target domain, outperforming prior methods.

Details