Arrow Research search

Author name cluster

Ze Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

11 papers
2 author rows

Possible papers (11)

EAAI Journal 2026 Journal Article

A global and local agent-based curriculum reinforcement learning approach for multi-end-effector robotic arm manipulation

  • Yichen Wang
  • Shuai Zheng
  • Ze Yang
  • Jingmin Guo
  • Zitong Yang
  • Jun Hong

Reinforcement learning is widely applied to robotic arm manipulation tasks. However, most of these tasks focus on a single, simple end effector. For heavy hoisting tasks, which are usually performed by robotic arms with multiple end effectors and more degrees of freedom, single-agent reinforcement learning methods are relatively ineffective. In this paper, we propose a multi-agent reinforcement learning approach for hoisting tasks performed by a robotic arm with multiple end effectors. The method decomposes the robotic arm into global and local agents based on its degrees of freedom, with one agent controlling global, coarse movement and the other controlling local, fine movement. In this way, the end effectors' spatial trajectories can be accurately controlled. Moreover, a four-level curriculum learning strategy is introduced in the training process, with a different reward function designed for each level, to improve training efficiency and effectiveness. We develop a simulation environment based on the Unity engine and perform several comparison experiments. The results demonstrate that the proposed approach outperforms conventional single-agent methods.
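A minimal sketch of the global/local decomposition the abstract describes, splitting the arm's degrees of freedom between two policies. All shapes and module names are illustrative assumptions; the paper's exact observation and action spaces are not given here.

```python
# Hypothetical sketch: two agents share an observation, each commanding a
# disjoint subset of the arm's joints (coarse vs. fine movement).
import torch
import torch.nn as nn

class AgentPolicy(nn.Module):
    """Small MLP policy; one instance per agent."""
    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, act_dim), nn.Tanh(),  # joint commands in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Split the degrees of freedom: global agent drives base/shoulder joints,
# local agent drives the end-effector joints.
global_policy = AgentPolicy(obs_dim=32, act_dim=4)
local_policy = AgentPolicy(obs_dim=32, act_dim=6)

obs = torch.randn(1, 32)
action = torch.cat([global_policy(obs), local_policy(obs)], dim=-1)
```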

AAAI Conference 2026 Conference Paper

Adaptive Piecewise Distillation for Efficient LiDAR Data Generation

  • Ruibo Li
  • Xiaofeng Yang
  • Ze Yang
  • Jiacheng Wei
  • Chunyan Miao
  • Guosheng Lin

LiDAR data generation has emerged as a promising solution to the high cost and limited scalability of real-world LiDAR sensing. Recent diffusion and rectified flow models have demonstrated strong capabilities in synthesizing realistic 3D point clouds; however, their iterative sampling procedures result in significant inference overhead. To address this, we focus on efficient few-step LiDAR generation for both unconditional and multi-modal conditional settings. Specifically, we propose an adaptive piecewise distillation strategy tailored for rectified flow-based LiDAR generation models, where the teacher model’s flow trajectory is adaptively segmented into consecutive intervals, and the student is trained only at the start of each interval to directly predict the velocity toward its endpoint. By sequentially sampling at the start timestep of each interval, our method enables fast few-step generation. Moreover, instead of uniform partitioning, we introduce an adaptive timestep selection strategy that chooses interval boundaries with minimal initial error, thereby reducing the complexity of distillation. Experimental results show that our method achieves comparable or superior performance to state-of-the-art methods in both unconditional and multi-modal conditional LiDAR generation, using only four sampling steps.
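The core training loop implied by the abstract can be sketched as follows: the teacher's flow trajectory is split into intervals, and the student is trained only at each interval's start to predict the average velocity toward its endpoint. The rollout helper, loss form, and Euler integration are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of piecewise velocity distillation for a rectified-flow
# model. `teacher` and `student` are callables mapping (x, t) -> velocity.
import torch

def teacher_rollout(teacher, x, t0, t1, n_steps=16):
    """Integrate the teacher's flow from t0 to t1 with small Euler steps."""
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        x = x + dt * teacher(x, t)
        t = t + dt
    return x

def piecewise_distill_loss(student, teacher, x_start, boundaries):
    """Train the student at each interval's start timestep to predict the
    straight-line velocity toward that interval's endpoint."""
    loss, x = 0.0, x_start
    for t0, t1 in zip(boundaries[:-1], boundaries[1:]):
        with torch.no_grad():
            x_end = teacher_rollout(teacher, x, t0, t1)
            target_v = (x_end - x) / (t1 - t0)  # average velocity over interval
        loss = loss + ((student(x, t0) - target_v) ** 2).mean()
        x = x_end  # continue along the teacher's trajectory
    return loss
```

Per the abstract, the boundary timesteps in `boundaries` are not spaced uniformly; they are chosen adaptively so that the initial error on each interval is minimal.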

AAAI Conference 2026 Conference Paper

RiemanLine: Riemannian Manifold Representation of 3D Lines for Factor Graph Optimization

  • Yan Li
  • Ze Yang
  • Keisuke Tateno
  • Federico Tombari
  • Liang Zhao
  • Gim Hee Lee

Minimal parametrization of 3D lines plays a critical role in camera localization and structural mapping. Existing representations in robotics and computer vision predominantly handle independent lines, overlooking structural regularities such as sets of parallel lines that are pervasive in man-made environments. This paper introduces RiemanLine, a unified minimal representation for 3D lines formulated on Riemannian manifolds that jointly accommodates both individual lines and parallel-line groups. Our key idea is to decouple each line landmark into global and local components: a shared vanishing direction optimized on the unit sphere, and scaled normal vectors constrained on orthogonal subspaces, enabling compact encoding of structural regularities. For n parallel lines, the proposed representation reduces the parameter space from 4n (orthonormal form) to 2n+2, naturally embedding parallelism without explicit constraints. We further integrate this parameterization into a factor graph framework, allowing global direction alignment and local reprojection optimization within a unified manifold-based bundle adjustment. Extensive experiments on ICL-NUIM, TartanAir, and synthetic benchmarks demonstrate that our method achieves significantly more accurate pose estimation and line reconstruction, while reducing parameter dimensionality and improving convergence stability.
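The parameter savings stated in the abstract are easy to verify: the orthonormal form costs 4 parameters per line, while RiemanLine shares one 2-parameter vanishing direction across a parallel group and keeps 2 parameters per line.

```python
# Back-of-the-envelope parameter count for n parallel lines, directly
# from the abstract's 4n vs. 2n+2 claim.
def orthonormal_params(n: int) -> int:
    return 4 * n        # 4 parameters per independent line

def riemanline_params(n: int) -> int:
    return 2 + 2 * n    # shared direction on the unit sphere (2) + 2 per line

for n in (1, 5, 20):
    print(n, orthonormal_params(n), riemanline_params(n))
# n=1: 4 vs 4; n=5: 20 vs 12; n=20: 80 vs 42
```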

ICML Conference 2025 Conference Paper

Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting

  • Zhining Liu 0002
  • Ze Yang
  • Xiao Lin 0016
  • Ruizhong Qiu
  • Tianxin Wei
  • Yada Zhu
  • Hendrik F. Hamann
  • Jingrui He

Time-series forecasting plays a critical role in many real-world applications. Although increasingly powerful models have been developed and achieved superior results on benchmark datasets, through a fine-grained sample-level inspection, we find that (i) no single model consistently outperforms others across different test samples, but instead (ii) each model excels in specific cases. These findings prompt us to explore how to adaptively leverage the distinct strengths of various forecasting models for different samples. We introduce TimeFuse, a framework for collective time-series forecasting with sample-level adaptive fusion of heterogeneous models. TimeFuse utilizes meta-features to characterize input time series and trains a learnable fusor to predict optimal model fusion weights for any given input. The fusor can leverage samples from diverse datasets for joint training, allowing it to adapt to a wide variety of temporal patterns and thus generalize to new inputs, even from unseen datasets. Extensive experiments demonstrate the effectiveness of TimeFuse in various long-/short-term forecasting tasks, achieving near-universal improvement over the state-of-the-art individual models. Code is available at https://github.com/ZhiningLiu1998/TimeFuse.
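A minimal sketch of sample-level fusion in the spirit of the abstract: a small network maps meta-features of an input series to softmax weights over frozen base models' forecasts. Layer sizes and the choice of meta-features are assumptions, not the paper's.

```python
# Hypothetical fusor: meta-features -> fusion weights over K model outputs.
import torch
import torch.nn as nn

class Fusor(nn.Module):
    def __init__(self, meta_dim: int, num_models: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(meta_dim, 64), nn.ReLU(),
            nn.Linear(64, num_models),
        )

    def forward(self, meta: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.net(meta), dim=-1)  # weights sum to 1

K, horizon = 4, 24
preds = torch.randn(8, K, horizon)      # forecasts from K frozen base models
meta = torch.randn(8, 16)               # e.g. trend/seasonality statistics
weights = Fusor(meta_dim=16, num_models=K)(meta)    # (8, K)
fused = (weights.unsqueeze(-1) * preds).sum(dim=1)  # (8, horizon)
```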

NeurIPS Conference 2025 Conference Paper

CLIMB: Class-imbalanced Learning Benchmark on Tabular Data

  • Zhining Liu
  • Zihao Li
  • Ze Yang
  • Tianxin Wei
  • Jian Kang
  • Yada Zhu
  • Hendrik Hamann
  • Jingrui He

Class-imbalanced learning (CIL) on tabular data is important in many real-world applications where the minority class holds the critical but rare outcomes. In this paper, we present CLIMB, a comprehensive benchmark for class-imbalanced learning on tabular data. CLIMB includes 73 real-world datasets across diverse domains and imbalance levels, along with unified implementations of 29 representative CIL algorithms. Built on a high-quality open-source Python package with unified API designs, detailed documentation, and rigorous code quality controls, CLIMB supports easy implementation and comparison between different CIL algorithms. Through extensive experiments, we provide practical insights on method accuracy and efficiency, highlighting the limitations of naive rebalancing, the effectiveness of ensembles, and the importance of data quality. Our code, documentation, and examples are available at https://github.com/ZhiningLiu1998/imbalanced-ensemble.
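A hedged illustration of the comparison the abstract draws between naive rebalancing and ensembles, using scikit-learn only; CLIMB's own API is not reproduced here.

```python
# Generic CIL comparison on a synthetic imbalanced dataset (not CLIMB's API).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# 95/5 class imbalance, mimicking a rare-outcome tabular task.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "naive rebalancing": LogisticRegression(class_weight="balanced", max_iter=1000),
    "ensemble": RandomForestClassifier(n_estimators=200, class_weight="balanced"),
}
for name, model in models.items():
    model.fit(Xtr, ytr)
    print(name, balanced_accuracy_score(yte, model.predict(Xte)))
```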

NeurIPS Conference 2025 Conference Paper

Flux4D: Flow-based Unsupervised 4D Reconstruction

  • Jingkang Wang
  • Henry Che
  • Yun Chen
  • Ze Yang
  • Lily Goli
  • Sivabalan Manivasagam
  • Raquel Urtasun

Reconstructing large-scale dynamic scenes from visual observations is a fundamental challenge in computer vision, with critical implications for robotics and autonomous systems. While recent differentiable rendering methods such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have achieved impressive photorealistic reconstruction, they suffer from scalability limitations and require annotations to decouple actor motion. Existing self-supervised methods attempt to eliminate explicit annotations by leveraging motion cues and geometric priors, yet they remain constrained by per-scene optimization and sensitivity to hyperparameter tuning. In this paper, we introduce Flux4D, a simple and scalable framework for 4D reconstruction of large-scale dynamic scenes. Flux4D directly predicts 3D Gaussians and their motion dynamics to reconstruct sensor observations in a fully unsupervised manner. By adopting only photometric losses and enforcing an "as static as possible" regularization, Flux4D learns to decompose dynamic elements directly from raw data without requiring pre-trained supervised models or foundational priors, simply by training across many scenes. Our approach enables efficient reconstruction of dynamic scenes within seconds, scales effectively to large datasets, and generalizes well to unseen environments, including rare and unknown objects. Experiments on outdoor driving datasets show Flux4D significantly outperforms existing methods in scalability, generalization, and reconstruction quality.
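The objective the abstract describes reduces to two terms: a photometric reconstruction loss and a penalty that prefers zero motion. A loose sketch follows; the rendering function, tensor shapes, and weighting are hypothetical.

```python
# Sketch of a photometric loss plus "as static as possible" regularizer.
import torch

def flux4d_style_loss(rendered, target, motion, static_weight=0.01):
    """rendered/target: (B, H, W, 3) images from a differentiable renderer;
    motion: (N, 3) per-Gaussian displacement predicted by the network."""
    photometric = (rendered - target).abs().mean()  # L1 photometric loss
    as_static = motion.norm(dim=-1).mean()          # penalize any motion
    return photometric + static_weight * as_static
```

Because motion is penalized everywhere, only elements whose movement is actually needed to explain the observations end up dynamic, which is one plausible reading of how the static/dynamic decomposition emerges without labels.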

JBHI Journal 2024 Journal Article

cbPPGGAN: A Generic Enhancement Framework for Unpaired Pulse Waveforms in Camera-Based Photoplethysmography

  • Ze Yang
  • Haofei Wang
  • Bo Liu
  • Feng Lu

Camera-based photoplethysmography (cbPPG) is a non-contact technique that measures cardiac-related blood volume alterations in skin surface vessels through the analysis of facial videos. While traditional approaches can estimate heart rate (HR) under different illuminations, their accuracy can be affected by motion artifacts, leading to poor waveform fidelity and hindering further analysis of heart rate variability (HRV); deep learning-based approaches reconstruct high-quality pulse waveforms, yet their performance significantly degrades under illumination variations. In this work, we aim to leverage the strengths of these two methods and propose a framework that possesses favorable generalization capabilities while maintaining waveform fidelity. For this purpose, we propose cbPPGGAN, an enhancement framework for cbPPG that enables the flexible incorporation of both unpaired and paired data sources in the training process. Based on the waveforms extracted by traditional approaches, cbPPGGAN reconstructs high-quality waveforms that enable accurate HR estimation and HRV analysis. In addition, to address the lack of paired training data in real-world applications, we propose a cycle consistency loss that guarantees time-frequency consistency before and after mapping. The method enhances the waveform quality of the traditional POS approach under different illumination conditions (BH-rPPG) and in cross-dataset tests (UBFC-rPPG), with mean absolute error (MAE) values of 1.34 bpm and 1.65 bpm, and average beat-to-beat (AVBB) values of 27.46 ms and 45.28 ms, respectively. Experimental results demonstrate that cbPPGGAN enhances cbPPG signal quality and outperforms state-of-the-art approaches in HR estimation and HRV analysis. The proposed framework opens a new pathway toward accurate HR estimation in unconstrained environments.
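One way to read the proposed cycle consistency loss is as a round-trip penalty measured in both the time and frequency domains. The sketch below assumes two mapping networks `G` and `F` (forward and inverse); the paper's exact formulation may differ.

```python
# Hedged sketch: cycle consistency with an added frequency-domain term.
import torch

def cycle_tf_loss(G, F, x):
    """x: (B, T) raw pulse waveforms. Penalize time- and frequency-domain
    differences between x and its round-trip reconstruction F(G(x))."""
    x_cycle = F(G(x))
    time_loss = (x - x_cycle).abs().mean()
    spec = torch.fft.rfft(x, dim=-1).abs()          # magnitude spectrum
    spec_cycle = torch.fft.rfft(x_cycle, dim=-1).abs()
    freq_loss = (spec - spec_cycle).abs().mean()
    return time_loss + freq_loss
```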

NeurIPS Conference 2023 Conference Paper

Neural Lighting Simulation for Urban Scenes

  • Ava Pun
  • Gary Sun
  • Jingkang Wang
  • Yun Chen
  • Ze Yang
  • Sivabalan Manivasagam
  • Wei-Chiu Ma
  • Raquel Urtasun

Different outdoor illumination conditions drastically alter the appearance of urban scenes, and they can harm the performance of image-based robot perception systems if not seen during training. Camera simulation provides a cost-effective solution to create a large dataset of images captured under different lighting conditions. Towards this goal, we propose LightSim, a neural lighting camera simulation system that enables diverse, realistic, and controllable data generation. LightSim automatically builds lighting-aware digital twins at scale from collected raw sensor data and decomposes the scene into dynamic actors and static background with accurate geometry, appearance, and estimated scene lighting. These digital twins enable actor insertion, modification, removal, and rendering from a new viewpoint, all in a lighting-aware manner. LightSim then combines physically-based and learnable deferred rendering to perform realistic relighting of modified scenes, such as altering the sun location and modifying the shadows or changing the sun brightness, producing spatially- and temporally-consistent camera videos. Our experiments show that LightSim generates more realistic relighting results than prior work. Importantly, training perception models on data generated by LightSim can significantly improve their performance. Our project page is available at https://waabi.ai/lightsim/.
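To make "learnable deferred rendering" concrete, here is a loose sketch of a shading network that consumes per-pixel G-buffer channels plus a controllable sun direction. The real system combines this idea with physically based rendering; all shapes and the architecture are assumptions.

```python
# Hypothetical learnable deferred shader conditioned on sun direction.
import torch
import torch.nn as nn

class DeferredShader(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-pixel input: albedo (3) + normal (3) + sun direction (3).
        self.net = nn.Sequential(
            nn.Conv2d(9, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),  # shaded RGB
        )

    def forward(self, albedo, normal, sun_dir):
        B, _, H, W = albedo.shape
        sun = sun_dir.view(B, 3, 1, 1).expand(B, 3, H, W)  # broadcast per pixel
        return self.net(torch.cat([albedo, normal, sun], dim=1))
```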

AAAI Conference 2021 Conference Paper

Learning to Copy Coherent Knowledge for Response Generation

  • Jiaqi Bai
  • Ze Yang
  • Xinnian Liang
  • Wei Wang
  • Zhoujun Li

Knowledge-driven dialog has shown remarkable ability to alleviate the problem of generating uninformative responses in dialog systems. However, incorporating knowledge coherently and accurately into response generation is still far from solved. Previous works fall into the paradigm of non-goal-oriented knowledge-driven dialog and are prone to ignore the dialog goal, which has a potential impact on knowledge exploitation and response generation. To address this problem, this paper proposes a Goal-Oriented Knowledge Copy network, GOKC. Specifically, a goal-oriented knowledge discernment mechanism is designed to help the model discern the knowledge facts that are highly correlated with the dialog goal and the dialog context. Besides, a context manager is devised to copy facts not only from the discerned knowledge but also from the dialog goal and the dialog context, which allows the model to accurately restate the facts in the generated response. Empirical studies are conducted on two benchmarks of goal-oriented knowledge-driven dialog generation. The results show that our model significantly outperforms several state-of-the-art models in terms of both automatic evaluation and human judgments.
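A simplified pointer-style view of copying from three sources (knowledge, goal, context): a gate mixes a vocabulary distribution with copy distributions scattered over each source's token ids. This loosely follows the abstract; the actual GOKC architecture is more involved.

```python
# Simplified copy mixture over knowledge, goal, and context sources.
import torch

def copy_distribution(p_vocab, attn_know, attn_goal, attn_ctx,
                      ids_know, ids_goal, ids_ctx, gates):
    """p_vocab: (B, V) generation distribution; attn_*: (B, L*) attention
    over each source; ids_*: (B, L*) int64 token ids mapping source
    positions into the vocabulary; gates: (B, 4) softmaxed mixture weights."""
    out = gates[:, 0:1] * p_vocab
    for gate, attn, ids in ((gates[:, 1:2], attn_know, ids_know),
                            (gates[:, 2:3], attn_goal, ids_goal),
                            (gates[:, 3:4], attn_ctx, ids_ctx)):
        # Add each source's copy probability mass onto its vocabulary ids.
        out = out.scatter_add(1, ids, gate * attn)
    return out
```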

AAAI Conference 2021 Conference Paper

Open Domain Dialogue Generation with Latent Images

  • Ze Yang
  • Wei Wu
  • Huang Hu
  • Can Xu
  • Wei Wang
  • Zhoujun Li

We consider grounding open domain dialogues with images. Existing work assumes that both an image and a textual context are available, but image-grounded dialogues by nature are more difficult to obtain than textual dialogues. Thus, we propose learning a response generation model with both image-grounded dialogues and textual dialogues by assuming that the visual scene information at the time of a conversation can be represented by an image, and trying to recover the latent images of the textual dialogues through text-to-image generation techniques. The likelihood of the two types of dialogues is then formulated by a response generator and an image reconstructor that are learned within a conditional variational auto-encoding framework. Empirical studies are conducted in both image-grounded conversation and text-based conversation. In the first scenario, image-grounded dialogues, especially under a low-resource setting, can be effectively augmented by textual dialogues with latent images; while in the second scenario, latent images can enrich the content of responses and at the same time keep them relevant to contexts.
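The conditional variational auto-encoding framework described above suggests a training objective with a response likelihood, an image reconstruction term that applies only to paired (image-grounded) samples, and a KL regularizer. The sketch below is a rough reading of that setup; all module interfaces are placeholders.

```python
# Rough sketch of a CVAE-style objective for mixed paired/unpaired data.
import torch
import torch.nn.functional as Fn

def cvae_loss(resp_logits, resp_tgt, img_recon, img_tgt, mu, logvar, has_image):
    """resp_logits: (B, T, V); resp_tgt: (B, T) int64; img_recon/img_tgt:
    (B, C, H, W); mu/logvar: (B, Z); has_image: (B,) 0/1 mask for samples
    with a real (non-latent) image."""
    nll_resp = Fn.cross_entropy(resp_logits.transpose(1, 2), resp_tgt)
    nll_img = ((img_recon - img_tgt) ** 2).mean(dim=(1, 2, 3))
    nll_img = (nll_img * has_image).mean()  # only supervise paired samples
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    return nll_resp + nll_img + kl
```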

AAAI Conference 2020 Conference Paper

Context-Transformer: Tackling Object Confusion for Few-Shot Detection

  • Ze Yang
  • Yali Wang
  • Xianyu Chen
  • Jianzhuang Liu
  • Yu Qiao

Few-shot object detection is a challenging but realistic scenario, where only a few annotated training images are available for training detectors. A popular approach to this problem is transfer learning, i.e., fine-tuning a detector pretrained on a source-domain benchmark. However, such a transferred detector often fails to recognize new objects in the target domain, due to the low data diversity of the training samples. To tackle this problem, we propose a novel Context-Transformer within a concise deep transfer framework. Specifically, Context-Transformer can effectively leverage source-domain object knowledge as guidance and automatically exploit contexts from only a few training images in the target domain. Subsequently, it can adaptively integrate these relational clues to enhance the discriminative power of the detector, in order to reduce object confusion in few-shot scenarios. Moreover, Context-Transformer is flexibly embedded in popular SSD-style detectors, which makes it a plug-and-play module for end-to-end few-shot learning. Finally, we evaluate Context-Transformer in the challenging settings of few-shot detection and incremental few-shot detection. The experimental results show that our framework outperforms recent state-of-the-art approaches.
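A minimal sketch of context aggregation as a plug-in attention module, in the spirit of the abstract: object features attend to contextual features and fuse the result back through a residual connection. Dimensions and the attention formulation are illustrative assumptions.

```python
# Hypothetical plug-in context aggregator for per-prior detector features.
import torch
import torch.nn as nn

class ContextAggregator(nn.Module):
    """Each object feature attends to contextual features; the attended
    context is fused back residually to sharpen the representation."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.fuse = nn.Linear(dim, dim)

    def forward(self, obj_feats, ctx_feats):
        # obj_feats: (B, N, dim) per-prior features; ctx_feats: (B, M, dim).
        ctx, _ = self.attn(obj_feats, ctx_feats, ctx_feats)
        return obj_feats + self.fuse(ctx)  # residual context enhancement
```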