Arrow Research search

Author name cluster

Chenyu Wang

Possible papers associated with this exact author name in Arrow. This page groups papers whose author names match exactly (case-insensitively); it is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

Seeing from Another Perspective: Evaluating Multi-View Understanding in MLLMs

  • Chun-Hsiao Yeh
  • Chenyu Wang
  • Shengbang Tong
  • Ta-Ying Cheng
  • Ruoyu Wang
  • Tianzhe Chu
  • Yuexiang Zhai
  • Yubei Chen

Multi-view understanding, the ability to reconcile visual information across diverse viewpoints for effective navigation, manipulation, and 3D scene comprehension, is a fundamental challenge for Multi-Modal Large Language Models (MLLMs) to serve as embodied agents. While recent MLLMs have shown impressive advances in high-level reasoning and planning, they frequently fall short when confronted with multi-view geometric consistency and cross-view correspondence. To comprehensively evaluate the challenges of MLLMs in multi-view scene reasoning, we introduce All-Angles Bench, a carefully human-annotated benchmark with over 2,100 question-answer pairs from 90 diverse, real-world scenes. Our broad evaluation across 38 general-purpose and 3D spatial reasoning MLLMs reveals a substantial performance gap compared to humans. More critically, our analysis identifies two root failure modes: (1) cross-view object mismatch—the inability to establish consistent object correspondence across views; and (2) cross-view spatial misalignment—the failure to infer accurate camera poses and spatial layouts. These findings underscore a lack of multi-view awareness in current MLLMs, calling for architectural innovations beyond prompt tuning alone. We believe that our benchmark offers valuable insights toward building spatially-intelligent MLLMs.

ECAI Conference 2025 Conference Paper

Deep Learning and Explainable AI: New Pathways to Genetic Insights

  • Chenyu Wang
  • Chaoying Zuo
  • Zihan Su
  • Yuhang Xing
  • Lu Li
  • Maojun Wang
  • Zeyu Zhang 0004

Deep learning-based AI models have been extensively applied in genomics, achieving remarkable success across diverse applications. As these models gain prominence, there exists an urgent need for interpretability methods to establish trustworthiness in model-driven decisions. For genetic researchers, interpretable insights derived from these models hold significant value in providing novel perspectives for understanding biological processes. Current interpretability analyses in genomics predominantly rely on intuition and experience rather than rigorous theoretical foundations. In this review, we categorize interpretability methods into input-based and model-based approaches, while critically evaluating their limitations through concrete biological application scenarios. Furthermore, we establish theoretical underpinnings to elucidate the origins of these constraints through formal mathematical demonstrations, aiming to assist genetic researchers in better understanding and designing models in the future. Finally, we provide feasible suggestions for future research on interpretability in the field of genetics.

NeurIPS Conference 2025 Conference Paper

Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-based Decoding

  • Xiner Li
  • Yulai Zhao
  • Chenyu Wang
  • Gabriele Scalia
  • Gokcen Eraslan
  • Surag Nair
  • Tommaso Biancalani
  • Shuiwang Ji

Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require differentiable proxy models (e.g., classifier guidance or DPS) or involve computationally expensive fine-tuning of diffusion models (e.g., classifier-free guidance, RL-based fine-tuning). In our work, we propose a new method to address these challenges. Our algorithm is an iterative sampling method that integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future, into the standard inference procedure of pre-trained diffusion models. Notably, our approach avoids fine-tuning generative models and eliminates the need to construct differentiable models. This enables us to (1) directly utilize non-differentiable features/reward feedback, commonly used in many scientific domains, and (2) apply our method to recent discrete diffusion models in a principled way. Finally, we demonstrate the effectiveness of our algorithm across several domains, including image generation, molecule generation, and DNA/RNA sequence generation.

IROS Conference 2025 Conference Paper

DL-Clip: Online D-Learning with Clipping Operation for Fast Model-Free Stabilizing Control

  • Jingxuan Liu
  • Chenyu Wang
  • Zhaolong Shen
  • Quan Quan

In this paper, we present DL-Clip, an innovative online learning approach for nonlinear stabilizing control that operates without prior knowledge of system dynamics or reward signals, while significantly improving training efficiency. DL-Clip introduces a novel integration of stabilizing control with efficient Reinforcement Learning (RL) training mechanisms. The algorithm uses Lyapunov functions to ensure system stability and employs clipping operations to optimize policy updates, achieving faster convergence. We evaluate the effectiveness of DL-Clip through experiments, including simulations of the inverted pendulum and the Image-Based Visual Servoing (IBVS) for multicopter position stabilization. In addition, we validate the approach through a real flight experiment based on the IBVS problem, demonstrating its practical applicability.

NeurIPS Conference 2025 Conference Paper

GLID$^2$E: A Gradient-Free Lightweight Fine-tune Approach for Discrete Biological Sequence Design

  • Hanqun Cao
  • Haosen Shi
  • Chenyu Wang
  • Sinno Pan
  • Pheng-Ann Heng

The design of biological sequences is essential for engineering functional biomolecules that contribute to advancements in human health and biotechnology. Recent advances in diffusion models, with their generative power and efficient conditional sampling, have made them a promising approach for sequence generation. To enhance model performance on limited data and enable multi-objective design and optimization, reinforcement learning (RL)-based fine-tuning has shown great potential. However, existing post-sampling and fine-tuning methods either lack stability in discrete optimization when avoiding gradients or incur high computational costs when employing gradient-based approaches, creating significant challenges for achieving both control and stability in the tuning process. To address these limitations, we propose GLID$^2$E, a gradient-free RL-based tuning approach for discrete diffusion models. Our method introduces a clipped likelihood constraint to regulate the exploration space and implements reward shaping to better align the generative process with design objectives, ensuring a more stable and efficient tuning process. By integrating these techniques, GLID$^2$E mitigates training instabilities commonly encountered in RL and diffusion-based frameworks, enabling robust optimization even in challenging biological design tasks. In the DNA sequence and protein sequence design systems, GLID$^2$E achieves competitive performance in function-based design while maintaining computational efficiency and a flexible tuning mechanism.

NeurIPS Conference 2025 Conference Paper

Learning Diffusion Models with Flexible Representation Guidance

  • Chenyu Wang
  • Cai Zhou
  • Sharut Gupta
  • Johnson Lin
  • Stefanie Jegelka
  • Stephen Bates
  • Tommi Jaakkola

Diffusion models can be improved with additional guidance towards more effective representations of input. Indeed, prior empirical work has already shown that aligning internal representations of the diffusion model with those of pre-trained models improves generation quality. In this paper, we present a systematic framework for incorporating representation guidance into diffusion models. We provide alternative decompositions of denoising models along with their associated training criteria, where the decompositions determine when and how the auxiliary representations are incorporated. Guided by our theoretical insights, we introduce two new strategies for enhancing representation alignment in diffusion models. First, we pair examples with target representations either derived from the examples themselves or arising from different synthetic modalities, and subsequently learn a joint model over the multimodal pairs. Second, we design an optimal training curriculum that balances representation learning and data generation. Our experiments across image, protein sequence, and molecule generation tasks demonstrate superior performance as well as accelerated training. In particular, on the class-conditional ImageNet $256\times 256$ benchmark, our guidance results in $23.3$ times faster training than the original SiT-XL as well as a four times speedup over the state-of-the-art method REPA.

NeurIPS Conference 2025 Conference Paper

Next Semantic Scale Prediction via Hierarchical Diffusion Language Models

  • Cai Zhou
  • Chenyu Wang
  • Dinghuai Zhang
  • Shangyuan Tong
  • Yifei Wang
  • Stephen Bates
  • Tommi Jaakkola

In this paper we introduce Hierarchical Diffusion Language Models (HDLM) -- a novel family of discrete diffusion models for language modeling. HDLM builds on a hierarchical vocabulary where low-level tokens with detailed semantics are surjectively mapped to high-level tokens with coarse-grained meanings. In the forward process, each token is independently perturbed to its higher-level ancestor with more abstract semantics according to the scheduler, while in the reverse process the model progressively predicts the next, more detailed semantics. Taken together, HDLM provides a general time-varying next semantic scale prediction process for language modeling. We derive closed-form expressions for the diffusion Evidence Lower Bound (ELBO), and show that HDLM can be implemented in a flexible manner while including the existing MDLM as a special case. We also propose practical training techniques based on these insights. Extensive text generation experiments validate the effectiveness of HDLM, which demonstrates consistently lower validation and generative perplexity than baselines.

IROS Conference 2024 Conference Paper

A 6-DOF Double-layer Programmable Remote Center of Motion Robot for Vitreoretinal Surgery

  • Chenyu Wang
  • Seong Young Ko

During vitreoretinal surgery, surgeons are required to precisely manipulate surgical tools within the confined workspace of the eye, which is roughly spherical and about 2.5 cm in diameter. Because the surgical view can only be obtained by a microscope placed above the eyeball through the pupil, the eyeball needs to be moved or rotated during the operation to see a larger portion of the retina. In this situation, general Remote Center of Motion (RCM) mechanisms require additional actuators or manual modification. In contrast, a programmable RCM mechanism can reduce surgery time by eliminating the physical alignment procedure. This study introduces a novel six-degree-of-freedom (DoF) programmable RCM mechanism capable of generating the RCM at arbitrary positions in 3D space. Our approach combines two planar 5-bar linkage mechanisms placed in parallel, creating a double-layered configuration to establish the programmable RCM mechanism. We optimized the workspace of each planar mechanism to a customized workspace for a general eyeball model using genetic algorithms, focusing on maximizing the manipulability of the target workspace. A Phantom Omni device was utilized as a remote controller to operate the proposed mechanism on a transparent eyeball model with a diameter of 4 cm. Evaluation of the programmable RCM mechanism at various RCM points showed an overall error of less than 1 millimeter, and repeatability tests showed an accuracy of about 127 micrometers.

ICML Conference 2024 Conference Paper

Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

  • Raunaq M. Bhirangi
  • Chenyu Wang
  • Venkatesh Pattabiraman
  • Carmel Majidi
  • Abhinav Gupta 0001
  • Tess Lee Hellebrekers
  • Lerrel Pinto

Reasoning from sequences of raw sensory data is a ubiquitous problem across fields ranging from medical devices to robotics. These problems often involve using long sequences of raw sensor data (e.g. magnetometers, piezoresistors) to predict sequences of desirable physical quantities (e.g. force, inertial measurements). While classical approaches are powerful for locally-linear prediction problems, they often fall short when using real-world sensors. These sensors are typically non-linear, are affected by extraneous variables (e.g. vibration), and exhibit data-dependent drift. For many problems, the prediction task is exacerbated by small labeled datasets since obtaining ground-truth labels requires expensive equipment. In this work, we present Hierarchical State-Space models (HiSS), a conceptually simple, new technique for continuous sequential prediction. HiSS stacks structured state-space models on top of each other to create a temporal hierarchy. Across six real-world sensor datasets, from tactile-based state prediction to accelerometer-based inertial measurement, HiSS outperforms state-of-the-art sequence models such as causal Transformers, LSTMs, S4, and Mamba by at least 23% on MSE. Our experiments further indicate that HiSS demonstrates efficient scaling to smaller datasets and is compatible with existing data-filtering techniques. Code, datasets and videos can be found on https://hiss-csp.github.io.

NeurIPS Conference 2024 Conference Paper

In-Context Symmetries: Self-Supervised Learning through Contextual World Models

  • Sharut Gupta
  • Chenyu Wang
  • Yifei Wang
  • Tommi Jaakkola
  • Stefanie Jegelka

At the core of self-supervised learning for vision is the idea of learning invariant or equivariant representations with respect to a set of data transformations. This approach, however, introduces strong inductive biases, which can render the representations fragile in downstream tasks that do not conform to these symmetries. In this work, drawing insights from world models, we propose to instead learn a general representation that can adapt to be invariant or equivariant to different transformations by paying attention to context --- a memory module that tracks task-specific states, actions and future states. Here, the action is the transformation, while the current and future states respectively represent the input's representation before and after the transformation. Our proposed algorithm, Contextual Self-Supervised Learning (ContextSSL), learns equivariance to all transformations (as opposed to invariance). In this way, the model can learn to encode all relevant features as general representations while having the versatility to tailor itself to task-specific symmetries when given a few examples as the context. Empirically, we demonstrate significant performance gains over existing methods on equivariance-related tasks, supported by both qualitative and quantitative evaluations.

IROS Conference 2022 Conference Paper

Accurate Instance-Level CAD Model Retrieval in a Large-Scale Database

  • Jiaxin Wei 0001
  • Lan Hu
  • Chenyu Wang
  • Laurent Kneip

We present a new solution to the fine-grained retrieval of clean CAD models from a large-scale database in order to recover detailed object shape geometries for RGBD scans. Unlike previous work simply indexing into a moderately small database using an object shape descriptor and accepting the top retrieval result, we argue that in the case of a large-scale database a more accurate model may be found within a neighborhood of the descriptor. More importantly, we propose that the distinctiveness deficiency of shape descriptors at the instance level can be compensated by a geometry-based re-ranking of its neighborhood. Our approach first leverages the discriminative power of learned representations to distinguish between different categories of models and then uses a novel robust point set distance metric to re-rank the CAD neighborhood, enabling fine-grained retrieval in a large shape database. Evaluation on a real-world dataset shows that our geometry-based re-ranking is a conceptually simple but highly effective method that can lead to a significant improvement in retrieval accuracy compared to the state-of-the-art.

AAAI Conference 2022 Conference Paper

HAGEN: Homophily-Aware Graph Convolutional Recurrent Network for Crime Forecasting

  • Chenyu Wang
  • Zongyu Lin
  • Xiaochen Yang
  • Jiao Sun
  • Mingxuan Yue
  • Cyrus Shahabi

The goal of the crime forecasting problem is to predict different types of crimes for each geographical region (like a neighborhood or census tract) in the near future. Since nearby regions usually have similar socioeconomic characteristics which indicate similar crime patterns, recent state-of-the-art solutions constructed a distance-based region graph and utilized Graph Neural Network (GNN) techniques for crime forecasting, because the GNN techniques could effectively exploit the latent relationships between neighboring region nodes in the graph if the edges reveal high dependency or correlation. However, this distance-based pre-defined graph cannot fully capture crime correlation between regions that are far from each other but share similar crime patterns. Hence, to make a more accurate crime prediction, the main challenge is to learn a better graph that reveals the dependencies between regions in crime occurrences and meanwhile captures the temporal patterns from historical crime records. To address these challenges, we propose an end-to-end graph convolutional recurrent network called HAGEN with several novel designs for crime prediction. Specifically, our framework could jointly capture the crime correlation between regions and the temporal crime dynamics by combining an adaptive region graph learning module with the Diffusion Convolution Gated Recurrent Unit (DCGRU). Based on the homophily assumption of GNN (i.e., graph convolution works better where neighboring nodes share the same label), we propose a homophily-aware constraint to regularize the optimization of the region graph so that neighboring region nodes on the learned graph share similar crime patterns, thus fitting the mechanism of diffusion convolution. Empirical experiments and comprehensive analysis on two real-world datasets showcase the effectiveness of HAGEN.

JBHI Journal 2022 Journal Article

Multiple Sclerosis Lesion Analysis in Brain Magnetic Resonance Images: Techniques and Clinical Applications

  • Yang Ma
  • Chaoyi Zhang
  • Mariano Cabezas
  • Yang Song
  • Zihao Tang
  • Dongnan Liu
  • Weidong Cai
  • Michael Barnett

Multiple sclerosis (MS) is a chronic inflammatory and degenerative disease of the central nervous system, characterized by the appearance of focal lesions in the white and gray matter that topographically correlate with an individual patient's neurological symptoms and signs. Magnetic resonance imaging (MRI) provides detailed in-vivo structural information, permitting the quantification and categorization of MS lesions that critically inform disease management. Traditionally, MS lesions have been manually annotated on 2D MRI slices, a process that is inefficient and prone to inter-/intra-observer errors. Recently, automated statistical imaging analysis techniques have been proposed to detect and segment MS lesions based on MRI voxel intensity. However, their effectiveness is limited by the heterogeneity of both MRI data acquisition techniques and the appearance of MS lesions. By learning complex lesion representations directly from images, deep learning techniques have achieved remarkable breakthroughs in the MS lesion segmentation task. Here, we provide a comprehensive review of state-of-the-art automatic statistical and deep-learning MS segmentation methods and discuss current and future clinical applications. Further, we review technical strategies, such as domain adaptation, to enhance MS lesion segmentation in real-world clinical settings.