Arrow Research search

Author name cluster

Chenyu Wang

Possible papers associated with this exact author name in Arrow. This page groups papers whose author names match exactly (case-insensitively); it is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs

  • Chun-Hsiao Yeh
  • Chenyu Wang
  • Shengbang Tong
  • Ta-Ying Cheng
  • Ruoyu Wang
  • Tianzhe Chu
  • Yuexiang Zhai
  • Yubei Chen

Multi-view understanding, the ability to reconcile visual information across diverse viewpoints for effective navigation, manipulation, and 3D scene comprehension, is a fundamental challenge for Multi-Modal Large Language Models (MLLMs) to serve as embodied agents. While recent MLLMs have shown impressive advances in high-level reasoning and planning, they frequently fall short when confronted with multi-view geometric consistency and cross-view correspondence. To comprehensively evaluate the challenges of MLLMs in multi-view scene reasoning, we introduce All-Angles Bench, a carefully human-annotated benchmark with over 2,100 question-answer pairs from 90 diverse, real-world scenes. Our broad evaluation across 38 general-purpose and 3D spatial reasoning MLLMs reveals a substantial performance gap compared to humans. More critically, our analysis identifies two root failure modes: (1) cross-view object mismatch—the inability to establish consistent object correspondence across views; and (2) cross-view spatial misalignment—the failure to infer accurate camera poses and spatial layouts. These findings underscore a lack of multi-view awareness in current MLLMs, calling for architectural innovations beyond prompt tuning alone. We believe that our benchmark offers valuable insights toward building spatially-intelligent MLLMs.

ECAI Conference 2025 Conference Paper

Deep Learning and Explainable AI: New Pathways to Genetic Insights

  • Chenyu Wang
  • Chaoying Zuo
  • Zihan Su
  • Yuhang Xing
  • Lu Li
  • Maojun Wang
  • Zeyu Zhang 0004

Deep learning-based AI models have been extensively applied in genomics, achieving remarkable success across diverse applications. As these models gain prominence, there exists an urgent need for interpretability methods to establish trustworthiness in model-driven decisions. For genetic researchers, interpretable insights derived from these models hold significant value in providing novel perspectives for understanding biological processes. Current interpretability analyses in genomics predominantly rely on intuition and experience rather than rigorous theoretical foundations. In this review, we categorize interpretability methods into input-based and model-based approaches, while critically evaluating their limitations through concrete biological application scenarios. Furthermore, we establish theoretical underpinnings to elucidate the origins of these constraints through formal mathematical demonstrations, aiming to assist genetic researchers in better understanding and designing models in the future. Finally, we provide feasible suggestions for future research on interpretability in the field of genetics.

NeurIPS Conference 2025 Conference Paper

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding

  • Xiner Li
  • Yulai Zhao
  • Chenyu Wang
  • Gabriele Scalia
  • Gokcen Eraslan
  • Surag Nair
  • Tommaso Biancalani
  • Shuiwang Ji

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require differentiable proxy models (e.g., classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (e.g., classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation.

IROS Conference 2025 Conference Paper

DL-Clip: Online D-Learning with Clipping Operation for Fast Model-Free Stabilizing Control

  • Jingxuan Liu
  • Chenyu Wang
  • Zhaolong Shen
  • Quan Quan

In this paper, we present DL-Clip, an innovative online learning approach for nonlinear stabilizing control that operates without prior knowledge of system dynamics or reward signals, while significantly improving training efficiency. DL-Clip introduces a novel integration of stabilizing control with efficient Reinforcement Learning (RL) training mechanisms. The algorithm uses Lyapunov functions to ensure system stability and employs clipping operations to optimize policy updates, achieving faster convergence. We evaluate the effectiveness of DL-Clip through experiments, including simulations of the inverted pendulum and the Image-Based Visual Servoing (IBVS) for multicopter position stabilization. In addition, we validate the approach through a real flight experiment based on the IBVS problem, demonstrating its practical applicability.

NeurIPS Conference 2025 Conference Paper

GLID$^2$E: A Gradient-Free Lightweight Fine-tune Approach for Discrete Biological Sequence Design

  • Hanqun Cao
  • Haosen Shi
  • Chenyu Wang
  • Sinno Pan
  • Pheng-Ann Heng

The design of biological sequences is essential for engineering functional biomolecules that contribute to advancements in human health and biotechnology. Recent advances in diffusion models, with their generative power and efficient conditional sampling, have made them a promising approach for sequence generation. To enhance model performance on limited data and enable multi-objective design and optimization, reinforcement learning (RL)-based fine-tuning has shown great potential. However, existing post-sampling and fine-tuning methods either lack stability in discrete optimization when avoiding gradients or incur high computational costs when employing gradient-based approaches, creating significant challenges for achieving both control and stability in the tuning process. To address these limitations, we propose GLID$^2$E, a gradient-free RL-based tuning approach for discrete diffusion models. Our method introduces a clipped likelihood constraint to regulate the exploration space and implements reward shaping to better align the generative process with design objectives, ensuring a more stable and efficient tuning process. By integrating these techniques, GLID$^2$E mitigates training instabilities commonly encountered in RL and diffusion-based frameworks, enabling robust optimization even in challenging biological design tasks. In the DNA sequence and protein sequence design systems, GLID$^2$E achieves competitive performance in function-based design while maintaining computational efficiency and a flexible tuning mechanism.

NeurIPS Conference 2025 Conference Paper

Learning Diffusion Models with Flexible Representation Guidance

  • Chenyu Wang
  • Cai Zhou
  • Sharut Gupta
  • Johnson Lin
  • Stefanie Jegelka
  • Stephen Bates
  • Tommi Jaakkola

Diffusion models can be improved with additional guidance towards more effective representations of input. Indeed, prior empirical work has already shown that aligning internal representations of the diffusion model with those of pre-trained models improves generation quality. In this paper, we present a systematic framework for incorporating representation guidance into diffusion models. We provide alternative decompositions of denoising models along with their associated training criteria, where the decompositions determine when and how the auxiliary representations are incorporated. Guided by our theoretical insights, we introduce two new strategies for enhancing representation alignment in diffusion models. First, we pair examples with target representations either derived from the examples themselves or arising from different synthetic modalities, and subsequently learn a joint model over the multimodal pairs. Second, we design an optimal training curriculum that balances representation learning and data generation. Our experiments across image, protein sequence, and molecule generation tasks demonstrate superior performance as well as accelerated training. In particular, on the class-conditional ImageNet $256\times 256$ benchmark, our guidance results in $23.3$ times faster training than the original SiT-XL as well as a four times speedup over the state-of-the-art method REPA.

NeurIPS Conference 2025 Conference Paper

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

  • Cai Zhou
  • Chenyu Wang
  • Dinghuai Zhang
  • Shangyuan Tong
  • Yifei Wang
  • Stephen Bates
  • Tommi Jaakkola

In this paper we introduce Hierarchical Diffusion Language Models (HDLM) -- a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantics. Taken together, HDLM provides a general time-varying next semantic scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO), and show that HDLM can be implemented in a flexible manner while including the existing MDLM as a special case. We also propose practical training techniques based on these insights. Extensive text generation experiments validate the effectiveness of HDLM, which demonstrates consistently lower validation and generative perplexity than baselines.

IROS Conference 2024 Conference Paper

A 6-DOF Double-layer Programmable Remote Center of Motion Robot for Vitreoretinal Surgery

  • Chenyu Wang
  • Seong Young Ko

During vitreoretinal surgery, surgeons are required to precisely manipulate surgical tools within the confined workspace of the eye, which is roughly spherical and about 2.5 cm in diameter. Because the surgical view can only be obtained by a microscope placed above the eyeball through the pupil, the eyeball needs to be moved or rotated during the operation to see a larger portion of the retina. In this situation, general Remote Center of Motion (RCM) mechanisms require additional actuators or manual modification. In contrast, a programmable RCM mechanism can reduce surgery time by eliminating the physical alignment procedure. This study introduces a novel six-degree-of-freedom (DoF) programmable RCM mechanism capable of generating the RCM at arbitrary positions in 3D space. Our approach combines two planar 5-bar linkage mechanisms placed in parallel, creating a double-layered configuration to establish the programmable RCM mechanism. We optimized the workspace of each planar mechanism to a customized workspace for a general eyeball model using genetic algorithms, focusing on maximizing the manipulability of the target workspace. A Phantom Omni device was utilized as a remote controller to operate the proposed mechanism on a transparent eyeball model with a diameter of 4 cm. Evaluation of the programmable RCM mechanism at various RCM points showed an overall error of less than 1 millimeter, and repeatability tests showed an accuracy of about 127 micrometers.

ICML Conference 2024 Conference Paper

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

  • Raunaq M. Bhirangi
  • Chenyu Wang
  • Venkatesh Pattabiraman
  • Carmel Majidi
  • Abhinav Gupta 0001
  • Tess Lee Hellebrekers
  • Lerrel Pinto

Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.

NeurIPS Conference 2024 Conference Paper

In-Context Symmetries: Self-Supervised Learning through Contextual World Models

  • Sharut Gupta
  • Chenyu Wang
  • Yifei Wang
  • Tommi Jaakkola
  • Stefanie Jegelka

At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context --- a memory module that tracks task-specific states, actions and future states. Here, the action is the transformation, while the current and future states respectively represent the input's representation before and after the transformation. Our proposed algorithm, Contextual Self-Supervised Learning (ContextSSL), learns equivariance to all transformations (as opposed to invariance). In this way, the model can learn to encode all relevant features as general representations while having the versatility to tailor itself to task-specific symmetries when given a few examples as the context. Empirically, we demonstrate significant performance gains over existing methods on equivariance-related tasks, supported by both qualitative and quantitative evaluations.

IROS Conference 2022 Conference Paper

Accurate Instance-Level CAD Model Retrieval in a Large-Scale Database

  • Jiaxin Wei 0001
  • Lan Hu
  • Chenyu Wang
  • Laurent Kneip

We present a new solution to the fine-grained retrieval of clean CAD models from a large-scale database in order to recover detailed object shape geometries for RGBD scans. Unlike previous work simply indexing into a moderately small database using an object shape descriptor and accepting the top retrieval result, we argue that in the case of a large-scale database a more accurate model may be found within a neighborhood of the descriptor. More importantly, we propose that the distinctiveness deficiency of shape descriptors at the instance level can be compensated by a geometry-based re-ranking of its neighborhood. Our approach first leverages the discriminative power of learned representations to distinguish between different categories of models and then uses a novel robust point set distance metric to re-rank the CAD neighborhood, enabling fine-grained retrieval in a large shape database. Evaluation on a real-world dataset shows that our geometry-based re-ranking is a conceptually simple but highly effective method that can lead to a significant improvement in retrieval accuracy compared to the state-of-the-art.

AAAI Conference 2022 Conference Paper

HAGEN: Homophily-Aware Graph Convolutional Recurrent Network for Crime Forecasting

  • Chenyu Wang
  • Zongyu Lin
  • Xiaochen Yang
  • Jiao Sun
  • Mingxuan Yue
  • Cyrus Shahabi

The goal of the crime forecasting problem is to predict different types of crimes for each geographical region (like a neighborhood or census tract) in the near future. Since nearby regions usually have similar socioeconomic characteristics which indicate similar crime patterns, recent state-of-the-art solutions constructed a distance-based region graph and utilized Graph Neural Network (GNN) techniques for crime forecasting, because the GNN techniques could effectively exploit the latent relationships between neighboring region nodes in the graph if the edges reveal high dependency or correlation. However, this distance-based pre-defined graph cannot fully capture crime correlation between regions that are far from each other but share similar crime patterns. Hence, to make a more accurate crime prediction, the main challenge is to learn a better graph that reveals the dependencies between regions in crime occurrences and meanwhile captures the temporal patterns from historical crime records. To address these challenges, we propose an end-to-end graph convolutional recurrent network called HAGEN with several novel designs for crime prediction. Specifically, our framework could jointly capture the crime correlation between regions and the temporal crime dynamics by combining an adaptive region graph learning module with the Diffusion Convolution Gated Recurrent Unit (DCGRU). Based on the homophily assumption of GNN (i.e., graph convolution works better where neighboring nodes share the same label), we propose a homophily-aware constraint to regularize the optimization of the region graph so that neighboring region nodes on the learned graph share similar crime patterns, thus fitting the mechanism of diffusion convolution. Empirical experiments and comprehensive analysis on two real-world datasets showcase the effectiveness of HAGEN.

JBHI Journal 2022 Journal Article

Multiple Sclerosis Lesion Analysis in Brain Magnetic Resonance Images: Techniques and Clinical Applications

  • Yang Ma
  • Chaoyi Zhang
  • Mariano Cabezas
  • Yang Song
  • Zihao Tang
  • Dongnan Liu
  • Weidong Cai
  • Michael Barnett

Multiple sclerosis (MS) is a chronic inflammatory and degenerative disease of the central nervous system, characterized by the appearance of focal lesions in the white and gray matter that topographically correlate with an individual patient's neurological symptoms and signs. Magnetic resonance imaging (MRI) provides detailed in-vivo structural information, permitting the quantification and categorization of MS lesions that critically inform disease management. Traditionally, MS lesions have been manually annotated on 2D MRI slices, a process that is inefficient and prone to inter-/intra-observer errors. Recently, automated statistical imaging analysis techniques have been proposed to detect and segment MS lesions based on MRI voxel intensity. However, their effectiveness is limited by the heterogeneity of both MRI data acquisition techniques and the appearance of MS lesions. By learning complex lesion representations directly from images, deep learning techniques have achieved remarkable breakthroughs in the MS lesion segmentation task. Here, we provide a comprehensive review of state-of-the-art automatic statistical and deep-learning MS segmentation methods and discuss current and future clinical applications. Further, we review technical strategies, such as domain adaptation, to enhance MS lesion segmentation in real-world clinical settings.