Arrow Research search

Author name cluster

Zixuan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

TMLR Journal 2026 Journal Article

Offline Model-Based Optimization: Comprehensive Review

  • Minsu Kim
  • Jiayao Gu
  • Ye Yuan
  • Taeyoung Yun
  • Zixuan Liu
  • Yoshua Bengio
  • Can Chen

Offline black-box optimization is a fundamental challenge in science and engineering, where the goal is to optimize black-box functions using only offline datasets. This setting is particularly relevant when querying the objective function is prohibitively expensive or infeasible, with applications spanning protein engineering, material discovery, neural architecture search, and beyond. The main difficulty lies in accurately estimating the objective landscape beyond the available data, where extrapolations are fraught with significant epistemic uncertainty. This uncertainty can lead to objective hacking (reward hacking)—exploiting model inaccuracies in unseen regions—or other spurious optimizations that yield misleadingly high performance estimates outside the offline distribution. Recent advances in model-based optimization (MBO) have harnessed the generalization capabilities of deep neural networks to develop offline-specific surrogate and generative models. Trained with carefully designed strategies, these models are more robust against out-of-distribution issues, facilitating the discovery of improved designs. Despite its growing impact in accelerating scientific discovery, the field lacks a comprehensive review. To bridge this gap, we present the first thorough review of offline MBO. We begin by formalizing the problem for both single-objective and multi-objective settings and by reviewing recent benchmarks and evaluation metrics. We then categorize existing approaches into two key areas: surrogate modeling, which emphasizes accurate function approximation in out-of-distribution regions, and generative modeling, which explores high-dimensional design spaces to identify high-performing designs. Finally, we examine the key challenges and propose promising directions for advancement in this rapidly evolving field, including the safe control of superintelligent systems.
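The gradient-ascent baseline that the surveyed surrogate methods improve upon can be sketched in a few lines. This is an illustrative toy only: the function `ascend` and the quadratic stand-in proxy are hypothetical, not code from the paper.

```python
def ascend(f_hat, x0, steps=100, lr=0.1, eps=1e-5):
    """Maximize a scalar proxy f_hat via finite-difference gradient ascent,
    starting from a design x0 (e.g. the best design in the offline dataset)."""
    x = x0
    for _ in range(steps):
        # Central finite difference approximates the proxy's gradient.
        grad = (f_hat(x + eps) - f_hat(x - eps)) / (2 * eps)
        x += lr * grad
    return x

# A quadratic stand-in proxy peaking at x = 2.0; a real proxy would be a
# neural network trained on the offline dataset, where ascending too far
# from the data is exactly where the out-of-distribution issue appears.
proxy = lambda x: -(x - 2.0) ** 2
best = ascend(proxy, x0=0.0)
```

The out-of-distribution failure mode discussed in the abstract arises because, unlike this toy quadratic, a learned proxy is only trustworthy near the offline data.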

AAAI Conference 2026 Conference Paper

Targeting Misalignment: A Conflict-Aware Framework for Reward-Model-based LLM Alignment

  • Zixuan Liu
  • Siavash H. Khajavi
  • Guangkai Jiang
  • Xinru Liu

Reward-model-based fine-tuning is a central paradigm in aligning Large Language Models with human preferences. However, such approaches critically rely on the assumption that proxy reward models accurately reflect intended supervision, a condition often violated due to annotation noise, bias, or limited coverage. This misalignment can lead to undesirable behaviors, where models optimize for flawed signals rather than true human values. In this paper, we investigate a novel framework to identify and mitigate such misalignment by treating the fine-tuning process as a form of knowledge integration. We focus on detecting instances of proxy-policy conflicts, cases where the base model strongly disagrees with the proxy. We argue that such conflicts often signify areas of shared ignorance, where neither the policy nor the reward model possesses sufficient knowledge, making them especially susceptible to misalignment. To this end, we propose two complementary metrics for identifying these conflicts: a localized Proxy-Policy Alignment Conflict Score (PACS) and a global Kendall-Tau Distance measure. Building on this insight, we design an algorithm named Selective Human-in-the-loop Feedback via Conflict-Aware Sampling (SHF-CAS) that targets high-conflict QA pairs for additional feedback, refining both the reward model and policy efficiently. Experiments on two alignment tasks demonstrate that our approach enhances general alignment performance, even when trained with a biased proxy reward. Our work provides a new lens for interpreting alignment failures and offers a principled pathway for targeted refinement in LLM training.
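The global measure mentioned above compares how the proxy reward model and the policy rank the same candidates. A generic, unweighted Kendall-Tau distance can be computed as follows (the function name and normalization are illustrative; the paper's exact definition may differ):

```python
def kendall_tau_distance(scores_a, scores_b):
    """Normalized count of pairwise order disagreements between two score
    lists over the same candidates: 0 means identical rankings, 1 means
    fully reversed rankings. Ties are not counted as discordant here."""
    n = len(scores_a)
    total = n * (n - 1) // 2
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            da = scores_a[i] - scores_a[j]
            db = scores_b[i] - scores_b[j]
            if da * db < 0:  # the two scorers order this pair oppositely
                discordant += 1
    return discordant / total
```

A large distance between reward-model scores and policy log-probabilities over a candidate set would flag a high-conflict region of the kind the abstract targets for extra human feedback.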

YNIMG Journal 2025 Journal Article

A Test-Retest Study of Single- and Multi-Delay pCASL for Choroid Plexus Perfusion Imaging in Healthy Subjects Aged 19 to 87 Years

  • Zixuan Liu
  • Qinyang Shou
  • Kay Jann
  • Chenyang Zhao
  • Danny JJ Wang
  • Xingfeng Shao

There is a growing interest in the choroid plexus (ChP) due to its critical role in cerebrospinal fluid (CSF) production and its involvement in neurodegenerative and cerebrovascular diseases. However, comprehensive studies comparing the accuracy and reliability of single- and multi-PLD (post-labeling delay) arterial spin labeling (ASL) techniques, specifically in relation to the ChP, remain limited. This study systematically evaluated the test-retest reliability and quantification accuracy of cerebral blood flow (CBF) measurements, focusing on the ChP, using single-delay and multi-delay 3D gradient-and-spin echo (GRASE) pseudo-continuous ASL (pCASL) on 28 subjects (aged 19 to 87 years, 14 males/14 females) at 3.0 tesla. Both single-delay (2 s) and 5-PLD (0.5–2.5 s) pCASL scans were repeated approximately one week apart with a spatial resolution of 2.5 × 2.5 × 3 mm³. Voxel-wise and regional CBF and arterial transit time (ATT) measurements were compared to assess test-retest reliability, with a particular focus on ChP perfusion changes with age. In this study, 12.15% of ChP voxels exhibited ATTs longer than 2 s, potentially leading to a significant underestimation of CBF using single-delay ASL. Multi-delay ASL showed improved accuracy in estimating CBF values for the ChP compared to single-delay ASL when ATT > PLD. Additionally, ChP volume (mean ± std = 1.72 ± 0.85 mL) increased (p < 0.01) and ChP perfusion (43.07 ± 14.18 mL/100 g/min) decreased (p = 0.04) with age. These findings underscore the robustness of multi-delay ASL with model-fitting quantification in assessing ChP perfusion, making it the preferred method for accurate CBF and ATT estimation, particularly in regions with prolonged transit time such as the ChP.
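For context on the single-delay quantification discussed above, the widely used single-compartment pCASL CBF formula from the ASL consensus literature can be written as follows. The parameter defaults are typical textbook values, not necessarily those used in this study, and the helper name is illustrative:

```python
import math

def cbf_single_delay(delta_m, m0, pld, tau, t1b=1.65, alpha=0.85, lam=0.9):
    """Single-compartment pCASL CBF estimate in mL/100 g/min.

    delta_m : control-label difference signal
    m0      : equilibrium magnetization of blood (same units as delta_m)
    pld     : post-labeling delay in seconds
    tau     : label duration in seconds
    t1b     : longitudinal relaxation time of blood (s) at 3 T
    alpha   : labeling efficiency
    lam     : blood-brain partition coefficient (mL/g)
    """
    return (6000 * lam * delta_m * math.exp(pld / t1b)) / (
        2 * alpha * t1b * m0 * (1 - math.exp(-tau / t1b))
    )

cbf = cbf_single_delay(delta_m=0.01, m0=1.0, pld=2.0, tau=1.8)
```

The model assumes the label has fully arrived by the PLD; when ATT > PLD, as in the long-transit ChP voxels reported above, this assumption fails and CBF is underestimated, which is why the multi-delay fit is preferred there.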

NeurIPS Conference 2025 Conference Paper

DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding

  • Zixuan Liu
  • Siavash H. Khajavi
  • Guangkai Jiang

Recent advances in multi-modal models have demonstrated strong performance in tasks such as image generation and reasoning. However, applying these models to the fire domain remains challenging due to the lack of publicly available datasets with high-quality fire domain annotations. To address this gap, we introduce $\textbf{DetectiumFire}$, a large-scale, multi-modal dataset comprising 22.5k high-resolution fire-related images and 2.5k real-world fire-related videos covering a wide range of fire types, environments, and risk levels. The data are annotated with both traditional computer vision labels (e.g., bounding boxes) and detailed textual prompts describing the scene, enabling applications such as synthetic data generation and fire risk reasoning. DetectiumFire offers clear advantages over existing benchmarks in scale, diversity, and data quality, significantly reducing redundancy and enhancing coverage of real-world scenarios. We validate the utility of DetectiumFire across multiple tasks, including object detection, diffusion-based image generation, and vision-language reasoning. Our results highlight the potential of this dataset to advance fire-related research and support the development of intelligent safety systems. We release DetectiumFire to promote broader exploration of fire understanding in the AI community.

NeurIPS Conference 2025 Conference Paper

VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models

  • Chongkai Gao
  • Zixuan Liu
  • Zhenghao Chi
  • Junshan Huang
  • Xin Fei
  • Yiwen Hou
  • Yuxuan Zhang
  • Yudi Lin

Recent studies on Vision-Language-Action (VLA) models have shifted from the end-to-end action-generation paradigm toward a pipeline involving task planning followed by action generation, demonstrating improved performance on various complex, long-horizon manipulation tasks. However, existing approaches vary significantly in terms of network architectures, planning paradigms, representations, and training data sources, making it challenging for researchers to identify the precise sources of performance gains and determine which component is more difficult to learn. To systematically investigate the impacts of different planning paradigms and representations in isolation from network architectures and training data, in this paper we introduce VLA-OS, a unified VLA architecture suite capable of supporting various task planning paradigms, and design a comprehensive suite of controlled experiments across diverse object categories (rigid and deformable), visual modalities (2D and 3D), environments (simulation and real-world), and end-effectors (grippers and dexterous hands). Our results demonstrate that: 1) visually grounded planning representations are generally better than language planning representations; 2) the Hierarchical-VLA paradigm generally outperforms the other paradigms, albeit at the cost of slower training and inference speeds.

TMLR Journal 2024 Journal Article

Robust Guided Diffusion for Offline Black-Box Optimization

  • Can Chen
  • Christopher Beckham
  • Zixuan Liu
  • Xue Liu
  • Christopher Pal

Offline black-box optimization aims to maximize a black-box function using an offline dataset of designs and their measured properties. Two main approaches have emerged: the forward approach, which learns a mapping from input to its value, thereby acting as a proxy to guide optimization, and the inverse approach, which learns a mapping from value to input for conditional generation. (a) Although proxy-free~(classifier-free) diffusion shows promise in robustly modeling the inverse mapping, it lacks explicit guidance from proxies, essential for generating high-performance samples beyond the training distribution. Therefore, we propose \textit{proxy-enhanced sampling} which utilizes the explicit guidance from a trained proxy to bolster proxy-free diffusion with enhanced sampling control. (b) Yet, the trained proxy is susceptible to out-of-distribution issues. To address this, we devise the module \textit{diffusion-based proxy refinement}, which seamlessly integrates insights from proxy-free diffusion back into the proxy for refinement. To sum up, we propose \textit{\textbf{R}obust \textbf{G}uided \textbf{D}iffusion for Offline Black-box Optimization}~(\textbf{RGD}), combining the advantages of proxy~(explicit guidance) and proxy-free diffusion~(robustness) for effective conditional generation. RGD achieves state-of-the-art results on various design-bench tasks, underscoring its efficacy. Our code is \href{https://github.com/GGchen1997/RGD}{here}.
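The proxy-enhanced sampling idea, alternating a proxy-free denoising move with an explicit proxy-gradient nudge, can be caricatured in one dimension. Everything here is a toy stand-in (the linear "denoiser", the quadratic proxy gradient, and all names are illustrative, not the paper's implementation):

```python
import random

def proxy_enhanced_sampling(denoise_step, proxy_grad, steps=50, scale=0.05, seed=0):
    """Sketch of guided generation: start from noise, then at every reverse
    step apply a proxy-free denoising move followed by an explicit
    proxy-gradient nudge toward higher predicted scores."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # initial noise sample
    for t in range(steps, 0, -1):
        x = denoise_step(x, t)        # proxy-free (classifier-free) move
        x += scale * proxy_grad(x)    # explicit guidance from the proxy
    return x

# Toy stand-ins: the "denoiser" contracts toward the data manifold at 0,
# while the proxy prefers designs near 1.5 (gradient of -(x - 1.5)^2 / 2).
denoise = lambda x, t: 0.9 * x
pgrad = lambda x: 1.5 - x
sample = proxy_enhanced_sampling(denoise, pgrad)
```

The final sample settles between the data manifold and the proxy's optimum, which is the intended compromise: the proxy pulls beyond the training distribution while the diffusion prior keeps the sample plausible.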

TMLR Journal 2023 Journal Article

Dynamics Adapted Imitation Learning

  • Zixuan Liu
  • Liu Liu
  • Bingzhe Wu
  • Lanqing Li
  • Xueqian Wang
  • Bo Yuan
  • Peilin Zhao

We consider Imitation Learning with dynamics variation between the expert demonstration (source domain) and the environment (target domain). Based on the popular framework of Adversarial Imitation Learning, we propose a novel algorithm – Dynamics Adapted Imitation Learning (DYNAIL), which incorporates the dynamics variation into the state-action occupancy measure matching as a regularization term. The dynamics variation is modeled by a pair of classifiers to distinguish between source dynamics and target dynamics. Theoretically, we provide an upper bound on the divergence between the learned policy and expert demonstrations in the source domain. Our error bound only depends on the expectation of the discrepancy between the source and target dynamics for the optimal policy in the target domain. The experimental evaluation validates that our method achieves superior results on high-dimensional continuous control tasks, compared to existing imitation learning methods.
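A pair of domain classifiers can estimate the dynamics log-ratio via the difference of their log-odds, a construction familiar from DARC-style off-dynamics RL. This is a sketch under that assumption; the paper's exact regularizer may differ:

```python
import math

def dynamics_regularizer(p_sas, p_sa):
    """Classifier-based estimate of log p_src(s'|s,a) - log p_tgt(s'|s,a).

    p_sas : probability, from a classifier on (s, a, s') triples, that the
            transition came from the source domain
    p_sa  : probability, from a classifier on (s, a) pairs alone, that the
            pair came from the source domain

    Subtracting the (s, a) log-odds cancels the marginal term, leaving an
    estimate of the conditional dynamics log-ratio.
    """
    logit = lambda p: math.log(p) - math.log(1 - p)
    return logit(p_sas) - logit(p_sa)
```

When both classifiers are at chance the penalty vanishes (no detectable dynamics gap); when the full-transition classifier is more confident the transition is source-like than the state-action classifier alone, the penalty is positive.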

NeurIPS Conference 2023 Conference Paper

Importance-aware Co-teaching for Offline Model-based Optimization

  • Ye Yuan
  • Can (Sam) Chen
  • Zixuan Liu
  • Willie Neiswanger
  • Xue (Steve) Liu

Offline model-based optimization aims to find a design that maximizes a property of interest using only an offline dataset, with applications in robot, protein, and molecule design, among others. A prevalent approach is gradient ascent, where a proxy model is trained on the offline dataset and then used to optimize the design. This method suffers from an out-of-distribution issue, where the proxy is not accurate for unseen designs. To mitigate this issue, we explore using a pseudo-labeler to generate valuable data for fine-tuning the proxy. Specifically, we propose $\textit{\textbf{I}mportance-aware \textbf{C}o-\textbf{T}eaching for Offline Model-based Optimization}~(\textbf{ICT})$. This method maintains three symmetric proxies with their mean ensemble as the final proxy, and comprises two steps. The first step is $\textit{pseudo-label-driven co-teaching}$. In this step, one proxy is iteratively selected as the pseudo-labeler for designs near the current optimization point, generating pseudo-labeled data. Subsequently, a co-teaching process identifies small-loss samples as valuable data and exchanges them between the other two proxies for fine-tuning, promoting knowledge transfer. This procedure is repeated three times, with a different proxy chosen as the pseudo-labeler each time, ultimately enhancing the ensemble performance. To further improve the accuracy of pseudo-labels, we perform a secondary step of $\textit{meta-learning-based sample reweighting}$, which assigns importance weights to samples in the pseudo-labeled dataset and updates them via meta-learning. ICT achieves state-of-the-art results across multiple design-bench tasks, achieving the best mean rank $3.1$ and median rank $2$ among $15$ methods. Our source code can be accessed here.
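The small-loss selection rule at the heart of the co-teaching step is simple to state. The helper below is an illustrative sketch, not the authors' code:

```python
def small_loss_selection(samples, losses, keep_ratio=0.5):
    """Co-teaching-style filter: treat the lowest-loss pseudo-labeled
    samples as the reliable ones to hand to a peer proxy for fine-tuning.

    samples    : list of pseudo-labeled examples
    losses     : a peer proxy's loss on each sample (same order)
    keep_ratio : fraction of samples to keep as "valuable"
    """
    k = max(1, int(len(samples) * keep_ratio))
    order = sorted(range(len(samples)), key=lambda i: losses[i])
    return [samples[i] for i in order[:k]]
```

The intuition, standard in co-teaching, is that samples a peer already fits well are less likely to carry noisy pseudo-labels, so exchanging only those limits error propagation between proxies.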

ICLR Conference 2023 Conference Paper

Over-Training with Mixup May Hurt Generalization

  • Zixuan Liu
  • Ziqiao Wang
  • Hongyu Guo
  • Yongyi Mao

Mixup, which creates synthetic training instances by linearly interpolating random sample pairs, is a simple and yet effective regularization technique to boost the performance of deep models trained with SGD. In this work, we report a previously unobserved phenomenon in Mixup training: on a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs, giving rise to a U-shaped generalization curve. This behavior is further aggravated when the size of the original dataset is reduced. To help understand such a behavior of Mixup, we show theoretically that Mixup training may introduce undesired data-dependent label noises to the synthesized data. Via analyzing a least-square regression problem with a random feature model, we explain why noisy labels may cause the U-shaped curve to occur: Mixup improves generalization through fitting the clean patterns at the early training stage, but as training progresses, Mixup begins to over-fit the noise in the synthetic data. Extensive experiments are performed on a variety of benchmark datasets, validating this explanation.
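The interpolation that generates Mixup's synthetic instances, and with it the data-dependent label noise analyzed above, fits in a few lines (the helper name is hypothetical; the Beta-distributed mixing coefficient is standard Mixup practice):

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Create one synthetic instance by linear interpolation (Mixup).

    Features and labels are mixed with the same coefficient lam drawn from
    Beta(alpha, alpha); mixing the labels identically to the features is
    exactly where the data-dependent label noise enters when the true
    target is nonlinear in the inputs.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = lam * y1 + (1 - lam) * y2
    return x, y

x, y = mixup_pair([0.0, 0.0], 0.0, [1.0, 1.0], 1.0, rng=random.Random(0))
```

With small alpha (e.g. 0.2), lam concentrates near 0 and 1, so most synthetic points stay close to real samples; larger alpha produces more aggressive mixing.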

NeurIPS Conference 2023 Conference Paper

Parallel-mentoring for Offline Model-based Optimization

  • Can (Sam) Chen
  • Christopher Beckham
  • Zixuan Liu
  • Xue (Steve) Liu
  • Chris Pal

We study offline model-based optimization to maximize a black-box objective function with a static dataset of designs and scores. These designs encompass a variety of domains, including materials, robots, DNA sequences, and proteins. A common approach trains a proxy on the static dataset and performs gradient ascent to obtain new designs. However, this often results in poor designs due to the proxy inaccuracies for out-of-distribution designs. Recent studies indicate that (a) gradient ascent with a mean ensemble of proxies generally outperforms simple gradient ascent, and (b) a trained proxy provides weak ranking supervision signals for design selection. Motivated by (a) and (b), we propose $\textit{parallel-mentoring}$ as an effective and novel method that facilitates mentoring among proxies, creating a more robust ensemble to mitigate the out-of-distribution issue. We focus on the three-proxy case in the main paper and our method consists of two modules. The first module, $\textit{voting-based pairwise supervision}$, operates on three parallel proxies and captures their ranking supervision signals as pairwise comparison labels. These labels are combined through majority voting to generate consensus labels, which incorporates ranking supervision signals from all proxies and enables mutual mentoring. Yet, label noise arises due to possible incorrect consensus. To alleviate this, we introduce an $\textit{adaptive soft-labeling}$ module with soft-labels initialized as consensus labels. Based on bi-level optimization, this module fine-tunes proxies in the inner level and learns more accurate labels in the outer level to adaptively mentor proxies, resulting in a more robust ensemble. Experiments validate the effectiveness of our method. Our code is available here.
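The voting-based pairwise supervision module reduces to majority voting over the proxies' pairwise comparisons. A toy sketch with illustrative names (the paper's label encoding and tie handling may differ):

```python
def consensus_pairwise_labels(proxy_scores, pairs):
    """Majority vote over proxies' pairwise rankings.

    proxy_scores : one score list per proxy, all over the same designs
    pairs        : (i, j) index pairs of designs to compare

    Returns one consensus label per pair: 1 if most proxies rank design i
    above design j, else 0. These consensus labels are what each proxy is
    then mentored toward (before the adaptive soft-labeling refinement).
    """
    labels = []
    for i, j in pairs:
        votes = sum(1 for s in proxy_scores if s[i] > s[j])
        labels.append(1 if 2 * votes > len(proxy_scores) else 0)
    return labels

# Three proxies scoring three designs; majority decides each comparison.
scores = [[3, 1, 2], [2, 1, 3], [3, 2, 1]]
labels = consensus_pairwise_labels(scores, [(0, 1), (1, 2)])
```

Using only ranking (pairwise order) rather than raw scores matches the abstract's observation that a trained proxy provides weak ranking supervision even where its absolute predictions are unreliable.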