Author name cluster

Chenyang Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers

2 author rows

AAAI Conference 2026 Conference Paper

VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation

Chenyang Wu
Jiayi Fu
Chun-Le Guo
Shuhao Han
Chongyi Li

Due to large pixel movement and high computational cost, estimating the motion of high-resolution frames is challenging. Thus, most flow-based Video Frame Interpolation (VFI) methods first predict bidirectional flows at low resolution and then use high-magnification upsampling (e.g., bilinear) to obtain the high-resolution ones. However, this kind of upsampling strategy may cause blur or mosaic at the flows' edges. Additionally, the motion of fine pixels at high resolution cannot be adequately captured in motion estimation at low resolution, which leads to the misalignment of task-oriented flows. With such inaccurate flows, input frames are warped and combined pixel-by-pixel, resulting in ghosting and discontinuities in the interpolated frame. In this study, we propose a novel VFI pipeline, VTinker, which consists of two core components: guided flow upsampling (GFU) and Texture Mapping. After motion estimation at low resolution, GFU introduces input frames as guidance to alleviate the blurring details in bilinear upsampling flows, which makes flows' edges clearer. Subsequently, to avoid pixel-level ghosting and discontinuities, Texture Mapping generates an initial interpolated frame, referred to as the intermediate proxy. The proxy serves as a cue for selecting clear texture blocks from the input frames, which are then mapped onto the proxy to facilitate producing the final interpolated frame via a reconstruction module. Extensive experiments demonstrate that VTinker achieves state-of-the-art performance in VFI.

PDF Details DOI

ICRA Conference 2025 Conference Paper

Large-Scale Gaussian Splatting SLAM

Zhe Xin
Chenyang Wu
Penghui Huang
Yanyong Zhang
Yinian Mao
Guoquan Huang 0003

The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encour-aging and impressive results for visual SLAM. However, most representative methods require RGBD sensors and are only available for indoor environments. The robustness of reconstruction in largescale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM with stereo cameras, termed LSG-SLAM. The proposed LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity in rendering losses. For the scalability of large-scale scenarios, we introduce continuous Gaussian Splatting submaps to tackle unbounded scenes with limited memory. Loops are detected between GS sub maps by place recognition and the relative pose between looped keyframes is optimized utilizing rendering and feature warping losses. After the global optimization of camera poses and Gaussian points, a structure refinement module enhances the reconstruction quality. With extensive evaluations on the EuRoc and KITTI datasets, LSG-SLAM achieves superior performance over existing Neural, 3DGS-based, and even traditional approaches. Project page: https://lsg-slam.github.io.

Details

TMLR Journal 2025 Journal Article

On Representing Convex Quadratically Constrained Quadratic Programs via Graph Neural Networks

Chenyang Wu
Qian Chen
Akang Wang
Tian Ding
Ruoyu Sun
Wenguo Yang
Qingjiang Shi

Convex quadratically constrained quadratic programs (QCQPs) involve finding a solution within a convex feasible region defined by quadratic constraints while minimizing a convex quadratic objective function. These problems arise in various industrial applications, including power systems and signal processing. Traditional methods for solving convex QCQPs primarily rely on matrix factorization, which quickly becomes computationally prohibitive as the problem size increases. Recently, graph neural networks (GNNs) have gained attention for their potential in representing and solving various optimization problems such as linear programs and linearly constrained quadratic programs. In this work, we investigate the representation power of GNNs in the context of QCQP tasks. Specifically, we propose a new tripartite graph representation for general convex QCQPs and properly associate it with message-passing GNNs. We demonstrate that there exist GNNs capable of reliably representing key properties of convex QCQPs, including feasibility, optimal value, and optimal solution. Our result deepens the understanding of the connection between QCQPs and GNNs, paving the way for future machine learning approaches to efficiently solve QCQPs.

PDF Details

IJCAI Conference 2025 Conference Paper

Reinforced In-Context Black-Box Optimization

Lei Song
Chen-Xiao Gao
Ke Xue
Chenyang Wu
Dong Li
Jianye Hao
Zongzhang Zhang
Chao Qian

Black-Box Optimization (BBO) has found successful applications in many fields of science and engineering. Recently, there has been a growing interest in meta-learning particular components of BBO algorithms to speed up optimization and get rid of tedious hand-crafted heuristics. As an extension, learning the entire algorithm from data requires the least labor from experts and can provide the most flexibility. In this paper, we propose RIBBO, a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion. RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks, leveraging the in-context learning ability of large models to extract task information and make decisions accordingly. Central to our method is to augment the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm based on cumulative regret over the future part of the histories. The integration of regret-to-go tokens enables RIBBO to automatically generate sequences of query points that are positively correlated to the user-desired regret, verified by its universally good empirical performance on diverse problems, including BBO benchmark, hyper-parameter optimization, and robot control problems.

PDF Details DOI

AAAI Conference 2024 Conference Paper

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

Chen-Xiao Gao
Chenyang Wu
Mingjun Cao
Rui Kong
Zongzhang Zhang
Yang Yu

Decision Transformer (DT), which employs expressive sequence modeling techniques to perform action generation, has emerged as a promising approach to offline policy optimization. However, DT generates actions conditioned on a desired future return, which is known to bear some weaknesses such as the susceptibility to environmental stochasticity. To overcome DT's weaknesses, we propose to empower DT with dynamic programming. Our method comprises three steps. First, we employ in-sample value iteration to obtain approximated value functions, which involves dynamic programming over the MDP structure. Second, we evaluate action quality in context with estimated advantages. We introduce two types of advantage estimators, IAE and GAE, which are suitable for different tasks. Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages. Finally, during testing, ACT generates actions conditioned on a desired advantage. Our evaluation results validate that, by leveraging the power of dynamic programming, ACT demonstrates effective trajectory stitching and robust action generation in spite of the environmental stochasticity, outperforming baseline methods across various benchmarks. Additionally, we conduct an in-depth analysis of ACT's various design choices through ablation studies. Our code is available at https://github.com/LAMDA-RL/ACT.

PDF Details DOI

IJCAI Conference 2024 Conference Paper

Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization

Rui Kong
Chenyang Wu
Chen-Xiao Gao
Zongzhang Zhang
Ming Li

In offline Reinforcement Learning (RL), the pre-trained policies are utilized for initialization and subsequent online fine-tuning. However, existing methods suffer from instability and low sample efficiency compared to pure online learning. This paper identifies these limitations stemming from direct policy initialization using offline-trained policy models. We propose Continual Policy Revitalization (CPR) as a novel efficient, stable fine-tuning method. CPR incorporates a periodic policy revitalization technique, restoring the overtrained policy network to full learning capacity while ensuring stable initial performance. This approach enables fine-tuning without being adversely affected by low-quality pre-trained policies. In contrast to previous research, CPR initializes the new policy with an adaptive policy constraint in policy optimization. Such optimization keeps the new policy close to behavior policy constructed from historical policies. This contributes to stable policy improvement and optimal converged performance. Practically, CPR can seamlessly integrate into existing offline RL algorithms with minimal modification. We empirically validate the effectiveness of our method through extensive experiments, demonstrating substantial improvements in learning stability and efficiency compared to previous approaches. Our code is available at https: //github. com/LAMDA-RL/CPR.

PDF Details DOI

AAAI Conference 2024 Short Paper

Generalizable Policy Improvement via Reinforcement Sampling (Student Abstract)

Rui Kong
Chenyang Wu
Zongzhang Zhang

Current policy gradient techniques excel in refining policies over sampled states but falter when generalizing to unseen states. To address this, we introduce Reinforcement Sampling (RS), a novel method leveraging a generalizable action value function to sample improved decisions. RS is able to improve the decision quality whenever the action value estimation is accurate. It works by improving the agent's decision on the fly on the states the agent is visiting. Compared with the historically experienced states in which conventional policy gradient methods improve the policy, the currently visited states are more relevant to the agent. Our method sufficiently exploits the generalizability of the value function on unseen states and sheds new light on the future development of generalizable reinforcement learning.

PDF Details DOI

IROS Conference 2024 Conference Paper

MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

Chenyang Wu
Yifan Duan
Xinran Zhang
Yu Sheng
Jianmin Ji
Yanyong Zhang

Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multimodal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we designed a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.

Details

NeurIPS Conference 2022 Conference Paper

Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning

Chenyang Wu
Tianci Li
Zongzhang Zhang
Yang Yu

Reinforcement learning (RL) is a general framework for modeling sequential decision making problems, at the core of which lies the dilemma of exploitation and exploration. An agent failing to explore systematically will inevitably fail to learn efficiently. Optimism in the face of uncertainty (OFU) is a conventionally successful strategy for efficient exploration. An agent following the OFU principle explores actively and efficiently. However, when applied to model-based RL, it involves specifying a confidence set of the underlying model and solving a series of nonlinear constrained optimization, which can be computationally intractable. This paper proposes an algorithm, Bayesian optimistic optimization (BOO), which adopts a dynamic weighting technique for enforcing the constraint rather than explicitly solving a constrained optimization problem. BOO is a general algorithm proved to be sample-efficient for models in a finite-dimensional reproducing kernel Hilbert space. We also develop techniques for effective optimization and show through some simulation experiments that BOO is competitive with the existing algorithms.

PDF Details

NeurIPS Conference 2021 Conference Paper

Adaptive Online Packing-guided Search for POMDPs

Chenyang Wu
Guoyu Yang
Zongzhang Zhang
Yang Yu
Dong Li
Wulong Liu
Jianye Hao

The partially observable Markov decision process (POMDP) provides a general framework for modeling an agent's decision process with state uncertainty, and online planning plays a pivotal role in solving it. A belief is a distribution of states representing state uncertainty. Methods for large-scale POMDP problems rely on the same idea of sampling both states and observations. That is, instead of exact belief updating, a collection of sampled states is used to approximate the belief; instead of considering all possible observations, only a set of sampled observations are considered. Inspired by this, we take one step further and propose an online planning algorithm, Adaptive Online Packing-guided Search (AdaOPS), to better approximate beliefs with adaptive particle filter technique and balance estimation bias and variance by fusing similar observation branches. Theoretically, our algorithm is guaranteed to find an $\epsilon$-optimal policy with a high probability given enough planning time under some mild assumptions. We evaluate our algorithm on several tricky POMDP domains, and it outperforms the state-of-the-art in all of them.

PDF Details

AAAI Conference 2021 Short Paper

LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)

Chenyang Wu
Rui Kong
Guoyu Yang
Xianghan Kong
Zongzhang Zhang
Yang Yu
Dong Li
Wulong Liu

Partially observable Markov decision process (POMDP) is an extension to MDP. It handles the state uncertainty by specifying the probability of getting a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs, which manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only K scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when the planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT to alleviate this issue, which utilizes the lower bound in selecting an action branch to expand. Empirically, our method has attained better performance than DESPOT and POMCP, which is another state-of-the-art, on several challenging POMDP benchmark tasks.

PDF Details