Author name cluster

Zeyuan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

AAAI 2026 Conference Paper

Maniflat3D: Learning 3D Geometry Through Planar Representations from Multi-Layer Unwrapping

  • Zijian Cao
  • Dayou Zhang
  • Zeyuan Liu
  • Zhicheng Liang
  • Fangxin Wang

Point-based geometric representations such as point clouds and Gaussian Splatting are fundamental for 3D understanding. However, the inherent irregularity and high-dimensional nature of point structures present significant challenges for direct 3D learning approaches, which often struggle with scalability and achieve suboptimal performance due to sparse data distributions. In contrast, 2D learning paradigms benefit from well-established architectures with superior optimization stability and efficiency. To bridge this gap, we propose Maniflat3D, a unified framework that systematically transforms volumetric point-based geometries into structured 2D representations through a two-stage process: multi-layer Ball-Pivoting reconstruction with adaptive density control, followed by Scalable Locally Injective Mapping (SLIM) to produce distortion-minimized, bijective UV parameterizations. Our approach explicitly encodes both geometric and attribute information into the flattened domain, enabling conventional 2D neural networks to effectively learn from complex 3D structures such as Gaussian Splatting. Experiments on the ShapeSplat dataset demonstrate that Maniflat3D achieves comparable performance while reducing the parameter count by 90% compared to native 3D baselines, and simultaneously attains a 21× compression ratio through neural encoding. These results establish a new paradigm for efficient geometric understanding, demonstrating successful transfer of planar learning advantages to challenging 3D manifold problems through dimensional reduction.
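To make the two-stage flattening concrete, here is a rough sketch using Open3D's Ball-Pivoting reconstruction for stage one; since SLIM is a full solver in its own right, a crude planar (PCA) projection stands in for the UV parameterization, so this illustrates the pipeline's shape rather than the paper's method. All function and parameter names are hypothetical:

```python
import numpy as np
import open3d as o3d

def flatten_to_image(points, attributes, radii=(0.02, 0.04), res=128):
    """Sketch of the point-cloud -> 2D image idea: reconstruct a surface with
    Ball-Pivoting, flatten it to a plane, and rasterize per-point attributes
    onto a regular UV grid. The PCA projection below is a crude stand-in for
    the SLIM parameterization used in the paper."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.estimate_normals()

    # Stage 1: Ball-Pivoting surface reconstruction (multi-radius).
    mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_ball_pivoting(
        pcd, o3d.utility.DoubleVector(list(radii)))
    verts = np.asarray(mesh.vertices)

    # Stand-in for stage 2 (SLIM): project onto the two principal axes.
    centered = verts - verts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    uv = centered @ vt[:2].T                             # (V, 2) coordinates
    uv = (uv - uv.min(0)) / (np.ptp(uv, axis=0) + 1e-8)  # normalize to [0, 1]

    # Rasterize: each vertex writes the attribute of its nearest input point.
    nn = np.argmin(((verts[:, None] - points[None]) ** 2).sum(-1), axis=1)
    image = np.zeros((res, res, attributes.shape[1]))
    ij = np.clip((uv * (res - 1)).astype(int), 0, res - 1)
    image[ij[:, 1], ij[:, 0]] = attributes[nn]
    return image
```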

NeurIPS 2025 Conference Paper

ADG: Ambient Diffusion-Guided Dataset Recovery for Corruption-Robust Offline Reinforcement Learning

  • Zeyuan Liu
  • Zhihe Yang
  • Jiawei Xu
  • Rui Yang
  • Jiafei Lyu
  • Baoxiang Wang
  • Yunjian Xu
  • Xiu Li

Real-world datasets collected from sensors or human inputs are prone to noise and errors, posing significant challenges for applying offline reinforcement learning (RL). While existing methods have made progress in addressing corrupted actions and rewards, they remain insufficient for handling corruption in high-dimensional state spaces and for cases where multiple elements in the dataset are corrupted simultaneously. Diffusion models, known for their strong denoising capabilities, offer a promising direction for this problem, but their tendency to overfit noisy samples limits their direct applicability. To overcome this, we propose Ambient Diffusion-Guided Dataset Recovery (ADG), a novel approach that pioneers the use of diffusion models to tackle data corruption in offline RL. First, we introduce Ambient Denoising Diffusion Probabilistic Models (DDPM) from approximated distributions, which enable learning on partially corrupted datasets with theoretical guarantees. Second, we use the noise-prediction property of Ambient DDPM to distinguish between clean and corrupted data, and then use the clean subset to train a standard DDPM. Third, we employ the trained standard DDPM to refine the previously identified corrupted data, enhancing data quality for subsequent offline RL training. A notable strength of ADG is its versatility: it can be seamlessly integrated with any offline RL algorithm. Experiments on a range of benchmarks, including MuJoCo, Kitchen, and Adroit, demonstrate that ADG effectively mitigates the impact of corrupted data and improves the robustness of offline RL under various noise settings, achieving state-of-the-art results.
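The clean/corrupted split in the second step follows from the DDPM training objective itself: a model trained to predict injected noise makes systematically larger prediction errors on samples it could not fit. Below is a minimal PyTorch sketch of that filtering idea, assuming a trained noise-prediction network; the names eps_model, alpha_bars, and flag_corrupted are illustrative, not from the paper:

```python
import torch

def flag_corrupted(eps_model, x0, alpha_bars, t=100, n_draws=8, quantile=0.9):
    """Score each dataset row by its noise-prediction residual under a
    trained DDPM; rows the model fits poorly are flagged as corrupted.

    eps_model  -- noise-prediction network eps_theta(x_t, t), assumed trained
    x0         -- (N, D) tensor of flattened transitions
    alpha_bars -- (T,) cumulative products of the noise schedule
    """
    a_bar = alpha_bars[t]
    residual = torch.zeros(x0.shape[0])
    for _ in range(n_draws):
        eps = torch.randn_like(x0)
        # Forward diffusion: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps
        x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
        t_batch = torch.full((x0.shape[0],), t, dtype=torch.long)
        with torch.no_grad():
            pred = eps_model(x_t, t_batch)
        residual += ((pred - eps) ** 2).mean(dim=-1)
    residual /= n_draws
    # Rows with unusually large residuals are treated as corrupted.
    return residual > residual.quantile(quantile)
```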

ICLR 2025 Conference Paper

Advancing LLM Reasoning Generalists with Preference Trees

  • Lifan Yuan
  • Ganqu Cui
  • Hanbin Wang
  • Ning Ding 0002
  • Xingyao Wang 0002
  • Boji Shan
  • Zeyuan Liu
  • Jia Deng

We introduce EURUS, a suite of large language models (LLMs) optimized for reasoning. Finetuned from Mistral-7B, Llama-3-8B, and Mixtral-8x22B, EURUS models achieve state-of-the-art results among open-source models on a diverse set of benchmarks covering mathematics, code generation, and logical reasoning problems. Notably, EURUX-8X22B outperforms GPT-3.5 Turbo in reasoning through comprehensive benchmarking across 12 test sets covering five tasks. The strong performance of EURUS can be primarily attributed to ULTRAINTERACT, our newly curated large-scale, high-quality training dataset specifically designed for complex reasoning tasks. ULTRAINTERACT can be used for supervised fine-tuning, preference learning, and reward modeling. It pairs each instruction with a preference tree consisting of (1) reasoning chains with diverse planning strategies in a unified format, (2) multi-turn interaction trajectories with the environment and the critique, and (3) pairwise positive and negative responses to facilitate preference learning. ULTRAINTERACT allows us to conduct an in-depth exploration of preference learning for reasoning tasks. Our investigation reveals that some well-established preference learning algorithms may be less suitable for reasoning tasks than they are for general conversations. The hypothesis is that in reasoning tasks, the space of correct answers is much smaller than that of incorrect ones, so it is necessary to explicitly increase the reward of chosen data. Therefore, in addition to increasing the reward margin as many preference learning algorithms do, the rewards of positive responses should themselves be positive and may serve as a proxy for performance. Inspired by this, we derive a novel reward modeling objective and empirically show that it leads to a stable reward modeling curve and better performance. Together with ULTRAINTERACT, we obtain a strong reward model.
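The point about absolute reward values can be made concrete: alongside the usual Bradley-Terry margin term, add a term that pushes chosen rewards above zero (and, symmetrically, rejected rewards below). The sketch below is one plausible instantiation of that idea in PyTorch, not necessarily the paper's exact objective:

```python
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    """Bradley-Terry margin term plus an absolute term that anchors chosen
    rewards above zero (one plausible reading of the abstract's objective).

    r_chosen, r_rejected -- (B,) reward-model scores for paired responses
    """
    # Increase the margin between chosen and rejected responses.
    margin = -F.logsigmoid(r_chosen - r_rejected).mean()
    # Anchor absolute values: chosen rewards positive, rejected negative.
    absolute = -F.logsigmoid(r_chosen).mean() - F.logsigmoid(-r_rejected).mean()
    return margin + absolute
```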

AAMAS 2025 Conference Paper

CDSA: Conservative Denoising Score-based Algorithm for Offline Reinforcement Learning

  • Zeyuan Liu
  • Kai Yang
  • Jiafei Lyu
  • Xiu Li

Distribution shift is a major obstacle in offline reinforcement learning (RL). While existing conservative offline RL algorithms perform well in learning in-distribution policies, they often fail to generalize to unseen actions. To address this issue, we propose leveraging knowledge derived from the gradient fields of the dataset's density to refine and adjust the original actions. Building on this, we introduce the Conservative Denoising Score-based Algorithm (CDSA), which utilizes score-based diffusion models to estimate the gradients of the dataset density and generates correction terms that refine the actions. This approach enables more accurate and efficient decision-making during the testing phase in Markov Decision Process (MDP) environments. By decoupling conservatism constraints from the policy, our method is broadly applicable to various offline RL algorithms. Experiments demonstrate that our approach significantly enhances baseline performance on D4RL datasets and exhibits plug-and-play compatibility with different pre-trained offline RL policies.
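Conceptually, the test-time correction amounts to a few steps of gradient ascent on the learned log-density of the dataset with respect to the action. A minimal sketch, assuming a trained network score_net approximating grad_a log p(s, a); the names are illustrative:

```python
import torch

def refine_action(score_net, state, action, step_size=0.05, n_steps=5):
    """Nudge a proposed action toward higher dataset density by ascending
    the learned score, i.e. the gradient of log p(state, action) w.r.t.
    the action (score_net is assumed trained; names are illustrative)."""
    action = action.clone()
    for _ in range(n_steps):
        with torch.no_grad():
            grad = score_net(state, action)   # estimate of grad_a log p(s, a)
        action = action + step_size * grad    # small conservative correction
    return action
```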

NeurIPS 2025 Conference Paper

FairNet: Dynamic Fairness Correction without Performance Loss via Contrastive Conditional LoRA

  • Songqi Zhou
  • Zeyuan Liu
  • Benben Jiang

Ensuring fairness in machine learning models is a critical challenge. Existing debiasing methods often compromise performance, rely on static correction strategies, and struggle with data sparsity, particularly within minority groups. Furthermore, their utilization of sensitive attributes is often suboptimal, either depending excessively on complete attribute labeling or disregarding these attributes entirely. To overcome these limitations, we propose FairNet, a novel framework for dynamic, instance-level fairness correction. FairNet integrates a bias detector with conditional low-rank adaptation (LoRA), which enables selective activation of the fairness correction mechanism exclusively for instances identified as biased, thereby preserving performance on unbiased instances. A key contribution is a new contrastive loss function for training the LoRA module, specifically designed to minimize intra-class representation disparities across different sensitive groups and effectively address underfitting in minority groups. The FairNet framework can flexibly handle scenarios with complete, partial, or entirely absent sensitive attribute labels. Theoretical analysis confirms that, under moderate true-positive and false-positive rates for the bias detector, FairNet can enhance the performance of the worst group without diminishing overall model performance, and may even yield slight performance improvements. Comprehensive empirical evaluations across diverse vision and language benchmarks validate the effectiveness of FairNet. Code is available at https://github.com/SongqiZhou/FairNet.
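The conditional-LoRA mechanism the abstract describes can be sketched compactly: a frozen base layer plus a low-rank correction that is added only when the bias detector flags an instance. The class below is an illustrative sketch, not the released implementation:

```python
import torch
import torch.nn as nn

class ConditionalLoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank correction gated
    per-instance by an external bias detector (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # keep pretrained weights intact
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x, biased_mask):
        """biased_mask -- (B,) output of the bias detector; 1 activates the
        fairness correction, 0 leaves the base model's behavior unchanged."""
        delta = (x @ self.lora_A.T) @ self.lora_B.T * self.scaling
        return self.base(x) + biased_mask.unsqueeze(-1).float() * delta
```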

AAMAS 2025 Conference Paper

Leveraging Score-based Models for Generating Penalization in Model-based Offline Reinforcement Learning

  • Zeyuan Liu
  • Zhirui Fang
  • Jiafei Lyu
  • Xiu Li

A core challenge in model-based offline reinforcement learning is constructing penalties over the state-action space of the offline dataset, which is typically high-dimensional. We define “cliffs” as regions in the state-action space where data density changes sharply, and our investigation shows that existing approaches struggle with accuracy near these cliffs. The formation of cliffs can be influenced by human-defined parameters and objective physical laws, often beyond the understanding of RL agents, and no established method addresses this issue directly. To overcome these limitations, we propose Score as a Penalty for Model-based Offline Reinforcement Learning (ScorePen-MORL). This approach generates penalties based on the gradient field of the dataset density in the state-action space. ScorePen-MORL is a plug-and-play solution that achieves strong results on its own while also enhancing the performance of baseline algorithms when combined with them. Our empirical findings demonstrate that cliff regions in the dataset are a significant bottleneck in offline model-based RL, and that ScorePen-MORL effectively addresses this issue by generating highly sensitive penalties for these cliff regions. On the D4RL and NeoRL benchmarks, our method outperforms recent strong model-based offline RL baselines.
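The penalty itself is simple to state: the magnitude of the estimated score of the dataset density, which spikes exactly where density changes sharply. A minimal sketch, assuming a trained score network; the names are illustrative:

```python
import torch

def score_penalty(score_net, state, action, lam=1.0):
    """Penalize model-generated transitions by the magnitude of the learned
    score of the dataset density: near a density "cliff" the gradient of
    log p(s, a) is large, so the penalty reacts exactly where the dynamics
    model is least trustworthy (score_net assumed trained)."""
    with torch.no_grad():
        grad = score_net(state, action)       # (B, D) estimated score
    return lam * grad.norm(dim=-1)            # subtract from model rewards
```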

NeurIPS 2024 Conference Paper

Multi-Agent Coordination via Multi-Level Communication

  • Ziluo Ding
  • Zeyuan Liu
  • Zhirui Fang
  • Kefan Su
  • Liwen Zhu
  • Zongqing Lu

The partial observability and stochasticity in multi-agent settings can be mitigated by accessing more information about others via communication. However, the coordination problem still exists, since agents cannot communicate their actual actions to one another simultaneously due to circular dependencies. In this paper, we propose a novel multi-level communication scheme, Sequential Communication (SeqComm). SeqComm treats agents asynchronously (the upper-level agents make decisions before the lower-level ones) and has two communication phases. In the negotiation phase, agents determine the priority of decision-making by communicating hidden states of observations and comparing the values of their intentions, which are obtained by modeling the environment dynamics. In the launching phase, the upper-level agents take the lead in making decisions and then communicate their actions to the lower-level agents. Theoretically, we prove that the policies learned by SeqComm improve monotonically and converge. Empirically, we show that SeqComm outperforms existing methods in a variety of cooperative multi-agent tasks.
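The two phases can be summarized in a few lines of Python: order agents by intention value (negotiation), then let them act sequentially with access to the actions already taken upstream (launching). This is an illustrative sketch with hypothetical agent and value interfaces:

```python
def seqcomm_step(agents, observations, intention_value):
    """One decision step mirroring SeqComm's two phases (illustrative sketch;
    the agent and value interfaces here are hypothetical).

    agents          -- list of policies exposing act(obs, upstream_actions)
    observations    -- per-agent observations
    intention_value -- callable scoring agent i's intention, e.g. from a
                       learned model of the environment dynamics
    """
    # Negotiation phase: decision priority follows the value of intentions.
    order = sorted(range(len(agents)),
                   key=lambda i: intention_value(i, observations[i]),
                   reverse=True)

    # Launching phase: upper-level agents decide first and communicate
    # their actions to the agents deciding after them.
    actions, upstream = {}, []
    for i in order:
        actions[i] = agents[i].act(observations[i], upstream)
        upstream.append(actions[i])
    return actions
```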