Arrow Research search

Author name cluster

Chenyang Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers

11

AAAI Conference 2026 Conference Paper

VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation

  • Chenyang Wu
  • Jiayi Fu
  • Chun-Le Guo
  • Shuhao Han
  • Chongyi Li

Due to large pixel movement and high computational cost, estimating the motion of high-resolution frames is challenging. Thus, most flow-based Video Frame Interpolation (VFI) methods first predict bidirectional flows at low resolution and then use high-magnification upsampling (e.g., bilinear) to obtain the high-resolution ones. However, this kind of upsampling strategy may cause blur or mosaic at the flows' edges. Additionally, the motion of fine pixels at high resolution cannot be adequately captured in motion estimation at low resolution, which leads to the misalignment of task-oriented flows. With such inaccurate flows, input frames are warped and combined pixel-by-pixel, resulting in ghosting and discontinuities in the interpolated frame. In this study, we propose a novel VFI pipeline, VTinker, which consists of two core components: guided flow upsampling (GFU) and Texture Mapping. After motion estimation at low resolution, GFU introduces input frames as guidance to alleviate the blurring details in bilinear upsampling flows, which makes flows' edges clearer. Subsequently, to avoid pixel-level ghosting and discontinuities, Texture Mapping generates an initial interpolated frame, referred to as the intermediate proxy. The proxy serves as a cue for selecting clear texture blocks from the input frames, which are then mapped onto the proxy to facilitate producing the final interpolated frame via a reconstruction module. Extensive experiments demonstrate that VTinker achieves state-of-the-art performance in VFI.

ICRA Conference 2025 Conference Paper

Large-Scale Gaussian Splatting SLAM

  • Zhe Xin
  • Chenyang Wu
  • Penghui Huang
  • Yanyong Zhang
  • Yinian Mao
  • Guoquan Huang 0003

The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encour-aging and impressive results for visual SLAM. However, most representative methods require RGBD sensors and are only available for indoor environments. The robustness of reconstruction in largescale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM with stereo cameras, termed LSG-SLAM. The proposed LSG-SLAM employs a multi-modality strategy to estimate prior poses under large view changes. In tracking, we introduce feature-alignment warping constraints to alleviate the adverse effects of appearance similarity in rendering losses. For the scalability of large-scale scenarios, we introduce continuous Gaussian Splatting submaps to tackle unbounded scenes with limited memory. Loops are detected between GS sub maps by place recognition and the relative pose between looped keyframes is optimized utilizing rendering and feature warping losses. After the global optimization of camera poses and Gaussian points, a structure refinement module enhances the reconstruction quality. With extensive evaluations on the EuRoc and KITTI datasets, LSG-SLAM achieves superior performance over existing Neural, 3DGS-based, and even traditional approaches. Project page: https://lsg-slam.github.io.

TMLR Journal 2025 Journal Article

On Representing Convex Quadratically Constrained Quadratic Programs via Graph Neural Networks

  • Chenyang Wu
  • Qian Chen
  • Akang Wang
  • Tian Ding
  • Ruoyu Sun
  • Wenguo Yang
  • Qingjiang Shi

Convex quadratically constrained quadratic programs (QCQPs) involve finding a solution within a convex feasible region defined by quadratic constraints while minimizing a convex quadratic objective function. These problems arise in various industrial applications, including power systems and signal processing. Traditional methods for solving convex QCQPs primarily rely on matrix factorization, which quickly becomes computationally prohibitive as the problem size increases. Recently, graph neural networks (GNNs) have gained attention for their potential in representing and solving various optimization problems such as linear programs and linearly constrained quadratic programs. In this work, we investigate the representation power of GNNs in the context of QCQP tasks. Specifically, we propose a new tripartite graph representation for general convex QCQPs and properly associate it with message-passing GNNs. We demonstrate that there exist GNNs capable of reliably representing key properties of convex QCQPs, including feasibility, optimal value, and optimal solution. Our result deepens the understanding of the connection between QCQPs and GNNs, paving the way for future machine learning approaches to efficiently solve QCQPs.

IJCAI Conference 2025 Conference Paper

Reinforced In-Context Black-Box Optimization

  • Lei Song
  • Chen-Xiao Gao
  • Ke Xue
  • Chenyang Wu
  • Dong Li
  • Jianye Hao
  • Zongzhang Zhang
  • Chao Qian

Black-Box Optimization (BBO) has found successful applications in many fields of science and engineering. Recently, there has been a growing interest in meta-learning particular components of BBO algorithms to speed up optimization and get rid of tedious hand-crafted heuristics. As an extension, learning the entire algorithm from data requires the least labor from experts and can provide the most flexibility. In this paper, we propose RIBBO, a method to reinforce-learn a BBO algorithm from offline data in an end-to-end fashion. RIBBO employs expressive sequence models to learn the optimization histories produced by multiple behavior algorithms and tasks, leveraging the in-context learning ability of large models to extract task information and make decisions accordingly. Central to our method is to augment the optimization histories with regret-to-go tokens, which are designed to represent the performance of an algorithm based on cumulative regret over the future part of the histories. The integration of regret-to-go tokens enables RIBBO to automatically generate sequences of query points that are positively correlated to the user-desired regret, verified by its universally good empirical performance on diverse problems, including BBO benchmark, hyper-parameter optimization, and robot control problems.

AAAI Conference 2024 Conference Paper

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

  • Chen-Xiao Gao
  • Chenyang Wu
  • Mingjun Cao
  • Rui Kong
  • Zongzhang Zhang
  • Yang Yu

Decision Transformer (DT), which employs expressive sequence modeling techniques to perform action generation, has emerged as a promising approach to offline policy optimization. However, DT generates actions conditioned on a desired future return, which is known to bear some weaknesses such as the susceptibility to environmental stochasticity. To overcome DT's weaknesses, we propose to empower DT with dynamic programming. Our method comprises three steps. First, we employ in-sample value iteration to obtain approximated value functions, which involves dynamic programming over the MDP structure. Second, we evaluate action quality in context with estimated advantages. We introduce two types of advantage estimators, IAE and GAE, which are suitable for different tasks. Third, we train an Advantage-Conditioned Transformer (ACT) to generate actions conditioned on the estimated advantages. Finally, during testing, ACT generates actions conditioned on a desired advantage. Our evaluation results validate that, by leveraging the power of dynamic programming, ACT demonstrates effective trajectory stitching and robust action generation in spite of the environmental stochasticity, outperforming baseline methods across various benchmarks. Additionally, we conduct an in-depth analysis of ACT's various design choices through ablation studies. Our code is available at https://github.com/LAMDA-RL/ACT.

IJCAI Conference 2024 Conference Paper

Efficient and Stable Offline-to-online Reinforcement Learning via Continual Policy Revitalization

  • Rui Kong
  • Chenyang Wu
  • Chen-Xiao Gao
  • Zongzhang Zhang
  • Ming Li

In offline Reinforcement Learning (RL), the pre-trained policies are utilized for initialization and subsequent online fine-tuning. However, existing methods suffer from instability and low sample efficiency compared to pure online learning. This paper identifies these limitations stemming from direct policy initialization using offline-trained policy models. We propose Continual Policy Revitalization (CPR) as a novel efficient, stable fine-tuning method. CPR incorporates a periodic policy revitalization technique, restoring the overtrained policy network to full learning capacity while ensuring stable initial performance. This approach enables fine-tuning without being adversely affected by low-quality pre-trained policies. In contrast to previous research, CPR initializes the new policy with an adaptive policy constraint in policy optimization. Such optimization keeps the new policy close to behavior policy constructed from historical policies. This contributes to stable policy improvement and optimal converged performance. Practically, CPR can seamlessly integrate into existing offline RL algorithms with minimal modification. We empirically validate the effectiveness of our method through extensive experiments, demonstrating substantial improvements in learning stability and efficiency compared to previous approaches. Our code is available at https: //github. com/LAMDA-RL/CPR.

AAAI Conference 2024 Short Paper

Generalizable Policy Improvement via Reinforcement Sampling (Student Abstract)

  • Rui Kong
  • Chenyang Wu
  • Zongzhang Zhang

Current policy gradient techniques excel in refining policies over sampled states but falter when generalizing to unseen states. To address this, we introduce Reinforcement Sampling (RS), a novel method leveraging a generalizable action value function to sample improved decisions. RS is able to improve the decision quality whenever the action value estimation is accurate. It works by improving the agent's decision on the fly on the states the agent is visiting. Compared with the historically experienced states in which conventional policy gradient methods improve the policy, the currently visited states are more relevant to the agent. Our method sufficiently exploits the generalizability of the value function on unseen states and sheds new light on the future development of generalizable reinforcement learning.

IROS Conference 2024 Conference Paper

MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

  • Chenyang Wu
  • Yifan Duan
  • Xinran Zhang
  • Yu Sheng
  • Jianmin Ji
  • Yanyong Zhang

Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multimodal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we designed a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.

NeurIPS Conference 2022 Conference Paper

Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning

  • Chenyang Wu
  • Tianci Li
  • Zongzhang Zhang
  • Yang Yu

Reinforcement learning (RL) is a general framework for modeling sequential decision making problems, at the core of which lies the dilemma of exploitation and exploration. An agent failing to explore systematically will inevitably fail to learn efficiently. Optimism in the face of uncertainty (OFU) is a conventionally successful strategy for efficient exploration. An agent following the OFU principle explores actively and efficiently. However, when applied to model-based RL, it involves specifying a confidence set of the underlying model and solving a series of nonlinear constrained optimization, which can be computationally intractable. This paper proposes an algorithm, Bayesian optimistic optimization (BOO), which adopts a dynamic weighting technique for enforcing the constraint rather than explicitly solving a constrained optimization problem. BOO is a general algorithm proved to be sample-efficient for models in a finite-dimensional reproducing kernel Hilbert space. We also develop techniques for effective optimization and show through some simulation experiments that BOO is competitive with the existing algorithms.

NeurIPS Conference 2021 Conference Paper

Adaptive Online Packing-guided Search for POMDPs

  • Chenyang Wu
  • Guoyu Yang
  • Zongzhang Zhang
  • Yang Yu
  • Dong Li
  • Wulong Liu
  • Jianye Hao

The partially observable Markov decision process (POMDP) provides a general framework for modeling an agent's decision process with state uncertainty, and online planning plays a pivotal role in solving it. A belief is a distribution of states representing state uncertainty. Methods for large-scale POMDP problems rely on the same idea of sampling both states and observations. That is, instead of exact belief updating, a collection of sampled states is used to approximate the belief; instead of considering all possible observations, only a set of sampled observations are considered. Inspired by this, we take one step further and propose an online planning algorithm, Adaptive Online Packing-guided Search (AdaOPS), to better approximate beliefs with adaptive particle filter technique and balance estimation bias and variance by fusing similar observation branches. Theoretically, our algorithm is guaranteed to find an $\epsilon$-optimal policy with a high probability given enough planning time under some mild assumptions. We evaluate our algorithm on several tricky POMDP domains, and it outperforms the state-of-the-art in all of them.

AAAI Conference 2021 Short Paper

LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)

  • Chenyang Wu
  • Rui Kong
  • Guoyu Yang
  • Xianghan Kong
  • Zongzhang Zhang
  • Yang Yu
  • Dong Li
  • Wulong Liu

Partially observable Markov decision process (POMDP) is an extension to MDP. It handles the state uncertainty by specifying the probability of getting a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs, which manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only K scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when the planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT to alleviate this issue, which utilizes the lower bound in selecting an action branch to expand. Empirically, our method has attained better performance than DESPOT and POMCP, which is another state-of-the-art, on several challenging POMDP benchmark tasks.