Arrow Research search

Author name cluster

Xiao Liang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

Anatomical Region-Guided Contrastive Decoding: A Plug-and-Play Strategy for Mitigating Hallucinations in Medical VLMs

  • Xiao Liang
  • Chenxi Liu
  • Zhi Ma
  • Di Wang
  • Bin Jing
  • Quan Wang
  • Yuanyuan Shi

Medical Vision-Language Models (MedVLMs) show immense promise for clinical application. However, their reliability is hindered by hallucinations, where models often fail to derive answers from visual evidence, instead relying on learned textual priors. Existing mitigation strategies for MedVLMs have distinct limitations: training-based methods rely on costly expert annotations, limiting scalability, while training-free interventions like contrastive decoding, though data-efficient, apply a global, untargeted correction whose effects in complex real-world clinical settings can be unreliable. To address these challenges, we introduce Anatomical Region-Guided Contrastive Decoding (ARCD), a plug-and-play strategy that mitigates hallucinations by providing targeted, region-specific guidance. Our module leverages an anatomical mask to direct a three-tiered contrastive decoding process. By dynamically re-weighting at the token, attention, and logits levels, it verifiably steers the model's focus onto specified regions, reinforcing anatomical understanding and suppressing factually incorrect outputs. Extensive experiments across diverse datasets, including chest X-ray, CT, brain MRI, and ocular ultrasound, demonstrate our method's effectiveness in improving regional understanding, reducing hallucinations, and enhancing overall diagnostic accuracy.
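The logits-level correction in generic contrastive decoding can be sketched as follows. This is a minimal toy illustration of the general technique, not the authors' ARCD implementation; the function name, the `alpha` parameter, and the two-pass setup (a region-conditioned pass vs. an unconditioned pass) are assumptions for illustration.

```python
import numpy as np

def contrastive_decode_logits(logits_cond, logits_uncond, alpha=1.0):
    """Generic logits-level contrastive decoding (simplified sketch):
    amplify tokens that the conditioned pass (e.g. region-grounded)
    supports more strongly than the unconditioned pass."""
    return (1 + alpha) * logits_cond - alpha * logits_uncond

# Toy example: token 0 is supported only when the region is attended to.
cond = np.array([2.0, 1.0, 0.5])
uncond = np.array([0.5, 1.0, 0.5])
adjusted = contrastive_decode_logits(cond, uncond, alpha=1.0)
```

With `alpha=0` the correction vanishes and the conditioned logits are returned unchanged; larger `alpha` sharpens the contrast between the two passes.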

JBHI Journal 2026 Journal Article

MRLF-DDI: A Multi-View Representation Learning Framework for Drug-Drug Interaction Event Prediction

  • Jian Zhong
  • Haochen Zhao
  • Xiao Liang
  • Qichang Zhao
  • Jianxin Wang

Accurately predicting drug-drug interaction events (DDIEs) is critical for improving medication safety and guiding clinical decision-making. However, existing graph neural network (GNN)-based methods often struggle to effectively integrate multi-view features and generalize to novel or understudied drugs. To address these limitations, we propose MRLF-DDI, a multi-view representation learning framework that jointly models information from individual drug features, local interaction contexts, and global interaction patterns. MRLF-DDI introduces the use of atom-level structural features enriched with bond angle information—marking the first incorporation of this geometry-aware feature in DDIE prediction. It further employs a multi-granularity GNN and a gated knowledge transfer strategy to enhance feature learning and cold-start generalization. Extensive experiments on benchmark datasets demonstrate that MRLF-DDI achieves superior performance in both warm-start and cold-start scenarios. Case studies and visualization analyses further highlight its practical utility in identifying clinically relevant DDIEs.

JBHI Journal 2026 Journal Article

Topology-aware Diffusion Schrödinger Bridge for Unpaired H&E-to-IHC Stain Translation

  • Chujie Zhang
  • Yangyang Xie
  • Jihong Hu
  • Xiaoyu Shi
  • Yinhao Li
  • Xiao Liang
  • Lanfen Lin
  • Yen-Wei Chen

Unpaired H&E-to-IHC Stain Translation aims to generate immunohistochemistry (IHC) staining from Hematoxylin and Eosin (H&E) staining. It offers clearer diagnostic insights and potentially expands access to advanced pathology services in resource-limited areas. This task faces two primary challenges: capturing target domain style characteristics and preserving topological features in histological images. Recently, Schrödinger Bridge (SB)-based methods have offered a solution for unpaired image-to-image translation, addressing the mode collapse and artifact issues in CycleGAN-based approaches, as well as the Gaussian prior assumption limitation in diffusion-based methods. While SB-based methods suffer from the curse of dimensionality with high-resolution images, the Unpaired Neural Schrödinger Bridge (UNSB) overcomes this challenge and achieves state-of-the-art (SOTA) performance on natural images. However, UNSB has two key issues in histological images: (1) loss of topological features and (2) IHC staining representation. UNSB focuses only on the optimal path from source to target domains, ignoring local structure paths. Convolutional neural networks (CNNs) do not perfectly preserve critical anatomical structures due to limitations like receptive field size or model capacity. To address these challenges, we introduce the Topology-aware Diffusion Schrödinger Bridge (TDSB), integrating a Topology Guidance (TG) module and Dual-Domain Adaptive Patch-based noise contrastive estimation (DDAP). Experiments on seven translation tasks across three datasets show that our method achieves SOTA performance in unpaired H&E-to-IHC stain translation. Clinical evaluation through pathologists' assessments further validates the effectiveness of our method.

NeurIPS Conference 2025 Conference Paper

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

  • Hongyuan Dong
  • Dingkang Yang
  • Xiao Liang
  • Ran Jiao

Learning rate is widely regarded as crucial for effective foundation model pretraining. Recent research explores and demonstrates the transferability of learning rate configurations across varying model and dataset sizes. Nevertheless, these approaches are constrained to specific training scenarios and typically necessitate extensive hyperparameter tuning on proxy models. In this work, we propose AdaLRS, a plug-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search via optimizing loss descent velocities. We provide theoretical and experimental analyses to show that foundation model pretraining loss and its descent velocity are both convex and share the same optimal learning rate. Relying solely on training loss dynamics, AdaLRS involves few extra computations to guide the search process, and its convergence is guaranteed via theoretical analysis. Experiments on both LLM and VLM pretraining show that AdaLRS adjusts suboptimal learning rates to the neighborhood of optimum with marked efficiency and effectiveness, with model performance improved accordingly. We also show the robust generalizability of AdaLRS across varying training scenarios, such as different model sizes, training paradigms, base learning rate scheduler choices, and hyperparameter settings.
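The core idea of searching over learning rates by comparing loss descent velocities can be sketched as a toy helper pair. This is a simplified stand-in under assumed behavior, not the AdaLRS algorithm itself; the window-averaged velocity, the probe-and-compare step, and the `factor` parameter are all illustrative assumptions.

```python
def descent_velocity(losses):
    """Average per-step loss decrease over a recorded window of losses."""
    return (losses[0] - losses[-1]) / (len(losses) - 1)

def adjust_lr(lr, v_current, v_probe, factor=1.5):
    """One toy search step: if a probed learning rate (e.g. lr * factor)
    yields a faster loss descent velocity, move toward it; otherwise
    shrink the learning rate. A sketch, not the AdaLRS update rule."""
    return lr * factor if v_probe > v_current else lr / factor
```

Because (by the paper's convexity argument) descent velocity peaks at the optimal learning rate, repeatedly probing and moving toward the faster-descending side walks the learning rate toward that optimum.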

JBHI Journal 2025 Journal Article

AGPred: An End-to-End Deep Learning Model for Predicting Drug Approvals in Clinical Trials Based on Molecular Features

  • Haochen Zhao
  • Xiao Liang
  • Chenliang Xie
  • Shaokai Wang

One of the major challenges in drug development is maintaining acceptable levels of efficacy and safety throughout the various stages of clinical trials and successfully bringing the drug to market. However, clinical trials are time-consuming and expensive. While there are computational methods designed to predict the likelihood of a drug passing clinical trials and reaching the market, these methods rely heavily on manual feature engineering and cannot automatically learn drug molecular representations, resulting in relatively low model performance. In this study, we propose AGPred, an attention-based deep Graph Neural Network (GNN) designed to accurately predict drug approval rates in clinical trials. Unlike the few existing studies on drug approval prediction, which only use predicted targets of compounds, our novel approach employs a GNN module to extract high-potential features of compounds based on their molecular graphs. Additionally, a cross-attention-based fusion module is utilized to learn molecular fingerprint features, enhancing the model's representation of chemical structures. Meanwhile, AGPred integrates the physicochemical properties of drugs to provide a comprehensive description of the molecules. Experimental results indicate that AGPred outperforms four state-of-the-art models on both benchmark and independent datasets. The study also includes several ablation experiments and visual analyses to demonstrate the effectiveness of our method in predicting drug approval during clinical trials.

ICRA Conference 2025 Conference Paper

AutoPeel: Adhesion-Aware Safe Peeling Trajectory Optimization for Robotic Wound Care

  • Xiao Liang
  • Youcheng Zhang
  • Fei Liu 0033
  • Florian Richter 0002
  • Michael C. Yip

Chronic wounds, including diabetic ulcers, pressure ulcers, and ulcers secondary to venous hypertension, affect more than 6.5 million patients and incur a yearly cost of more than $25 billion in the United States alone. Chronic wound treatment is currently a manual process, and we envision a future where robotics and automation will aid in this treatment to reduce cost and improve patient care. In this work, we present the development of the first robotic system for wound dressing removal, which is reported to be the worst aspect of living with chronic wounds. Our method leverages differentiable physics-based simulation to perform gradient-based trajectory optimization for peeling trajectory planning. By integrating the fracture mechanics of adhesion, we are able to model the peeling effect inherent to dressing adhesion. The system is further guided by carefully designed objective functions that promote both efficient and safe control, reducing the risk of tissue damage. We validated the efficacy of our approach through a series of experiments conducted on both synthetic skin phantoms and real human subjects. Our results demonstrate the system's ability to achieve precise and safe dressing removal trajectories, offering a promising solution for automating this essential healthcare procedure.

JBHI Journal 2025 Journal Article

Image-enhanced Multi-Modal Contrastive Transformer for subcellular spatial transcriptomics

  • Wanwan Shi
  • Ying Liu
  • Qiu Xiao
  • Yuting Bai
  • Xiao Liang
  • Xinling Zeng
  • Chee Keong Kwoh
  • Jiawei Luo

Recent advances in spatial molecular imaging technologies have enabled gene expression profiling alongside high-resolution imaging, providing unprecedented opportunities to resolve molecular heterogeneity at subcellular resolution. However, these technologies fail to fully capture cellular characteristics due to the limited number of genes they can detect, which hinders downstream analysis. Because spatial imaging data provide high-resolution, fine-grained morphology information, developing computational methods that effectively integrate image features with transcriptomic profiles is crucial for enabling comprehensive subcellular data analysis. In this study, we present SIMMT, an image-enhanced multi-modal contrastive transformer framework for identifying spatial domains and enhancing subcellular data. In the framework, we design a dual transformer architecture to learn multi-modal representations for cells by modeling transcriptomic profiles and morphological images, respectively. To fully capture modality interactions within spatial contexts, we introduce a contrastive learning module that enhances cell representation by aligning tissue morphology and gene expression at the cell level. We tested SIMMT on subcellular spatial transcriptomics datasets from human lung cancer tissue, mouse brain tissue, human colorectal cancer tissue, and human ovarian cancer tissue. The results demonstrated that SIMMT consistently outperformed state-of-the-art methods in spatial clustering and gene expression pattern analysis. Our method also effectively demonstrated its ability to identify tumor spatial heterogeneity and uncover potential gene biomarkers in the human bronchiolar adenoma (BA) dataset. The code and dataset of SIMMT can be downloaded from https://github.com/LWanzi/SIMMT.

ICLR Conference 2025 Conference Paper

Integrative Decoding: Improving Factuality via Implicit Self-consistency

  • Yi Cheng
  • Xiao Liang
  • Yeyun Gong
  • Wen Xiao
  • Song Wang
  • Yuji Zhang 0002
  • Wenjun Hou
  • Kaishuai Xu

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating all of their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
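One decoding step of this kind of aggregation can be sketched as follows. This is a simplified illustration of the aggregation idea (summing per-input log-probabilities and taking the argmax), not the authors' released implementation; the function name and the assumption that logits for each prepended-response input are already available are hypothetical.

```python
import math

def integrative_step(per_input_logits):
    """One ID-style decoding step (simplified sketch): given next-token
    logits from several inputs, each prepended with a previously sampled
    response, pick the token whose summed log-probability across all
    inputs is highest."""
    def log_softmax(row):
        m = max(row)
        z = math.log(sum(math.exp(x - m) for x in row)) + m
        return [x - z for x in row]

    # Sum log-probabilities token-wise across all inputs, then take argmax.
    agg = [sum(col) for col in zip(*(log_softmax(r) for r in per_input_logits))]
    return max(range(len(agg)), key=agg.__getitem__)
```

With a single input this reduces to ordinary greedy decoding; with several inputs, tokens that are consistently probable across the sampled responses win out, which is where the implicit self-consistency comes from.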

ICRA Conference 2025 Conference Paper

MEDiC: Autonomous Surgical Robotic Assistance to Maximizing Exposure for Dissection and Cautery

  • Xiao Liang
  • Chung-Pang Wang
  • Nikhil U. Shinde
  • Fei Liu 0033
  • Florian Richter 0002
  • Michael C. Yip

Surgical automation has the capability to improve the consistency of patient outcomes and broaden access to advanced surgical care in underprivileged communities. Shared autonomy, where the robot automates routine subtasks while the surgeon retains partial teleoperative control, offers great potential to make an impact. In this paper, we focus on one important skill within surgical shared autonomy: automating robotic assistance to maximize visual exposure and apply tissue tension for dissection and cautery. Ensuring consistent exposure to visualize the surgical site is crucial for both efficiency and patient safety. However, achieving this is highly challenging due to the complexities of manipulating deformable volumetric tissues that are prevalent in surgery. To address these challenges we propose MEDiC, a framework for autonomous surgical robotic assistance to Maximizing Exposure for Dissection and Cautery. We integrate a differentiable physics model with perceptual feedback to achieve our two key objectives: 1) maximizing tissue exposure and applying tension for a specified dissection site through visual-servoing control and 2) selecting optimal control positions for a dissection target based on deformable Jacobian analysis. We quantitatively assess our method through repeated real robot experiments on a tissue phantom. Our visual-servoing and optimal control position selection achieve success rates of 100% and 82%, respectively, in an ablation study. We also showcase our framework's capabilities through dissection experiments using shared autonomy on real animal tissue.

NeurIPS Conference 2025 Conference Paper

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

  • Xiao Liang
  • Zhong-Zhi Li
  • Yeyun Gong
  • Yang Wang
  • Hengyuan Zhang
  • Yelong Shen
  • Ying Nian Wu
  • Weizhu Chen

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and limited-verification answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model's capabilities, leading to low efficiency in generating useful questions. To mitigate these issues, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. Specifically, we define weaknesses as questions that the model consistently fails to learn through its iterative sampling during RL training. We then extract the core concepts from these failure cases and synthesize new problems to strengthen the model's weak areas in subsequent augmented training, enabling it to focus on and gradually overcome its weaknesses. Without relying on external knowledge distillation, our framework enables robust generalization by empowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10% and 7.7% on 7B and 32B models across eight mainstream reasoning benchmarks. Our code and data are available at https://anonymous.4open.science/r/SwS-E6F5/.

ICRA Conference 2024 Conference Paper

Achieving Autonomous Cloth Manipulation with Optimal Control via Differentiable Physics-Aware Regularization and Safety Constraints

  • Yutong Zhang
  • Fei Liu 0033
  • Xiao Liang
  • Michael C. Yip

Cloth manipulation is a category of deformable object manipulation of great interest to the robotics community, from applications of automated laundry-folding and home organizing to textiles and flexible manufacturing. Despite the desire for automated cloth manipulation, the thin-shell dynamics and under-actuation nature of cloth present significant challenges for robots to effectively interact with them. Many recent works omit explicit modeling in favor of learning-based methods that may yield control policies directly. However, these methods require large training sets that must be collected and curated. In this regard, we create a framework for differentiable modeling of cloth dynamics leveraging an Extended Position-based Dynamics (XPBD) algorithm. Together with the desired control objective, physics-aware regularization terms are designed for better results, including trajectory smoothness and elastic potential energy. In addition, safety constraints, such as avoiding obstacles, can be specified using signed distance functions (SDFs). We formulate the cloth manipulation task with safety constraints as a constrained optimization problem, which can be effectively solved by mainstream gradient-based optimizers thanks to the end-to-end differentiability of our framework. Finally, we assess the framework with various safety thresholds and demonstrate the feasibility of result trajectories on a surgical robot. The effects of the regularization terms are analyzed in an additional ablation study.

AAAI Conference 2024 Conference Paper

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding

  • Yaoyuan Liang
  • Xiao Liang
  • Yansong Tang
  • Zhao Yang
  • Ziran Li
  • Jingang Wang
  • Wenbo Ding
  • Shao-Lun Huang

This paper studies the spatio-temporal video grounding task, which aims to localize a spatio-temporal tube in an untrimmed video based on the given text description of an event. Existing one-stage approaches suffer from insufficient space-time interaction in two aspects: i) less precise prediction of event temporal boundaries, and ii) inconsistency in object prediction for the same event across adjacent frames. To address these issues, we propose a framework of Comprehensive Space-Time entAnglement (CoSTA) to densely entangle space-time multi-modal features for spatio-temporal localization. Specifically, we propose a space-time collaborative encoder to extract comprehensive video features and leverage Transformer to perform spatio-temporal multi-modal understanding. Our entangled decoder couples temporal boundary prediction and spatial localization via an entangled query, boasting an enhanced ability to capture object-event relationships. We conduct extensive experiments on the challenging benchmarks of HC-STVG and VidSTG, where CoSTA outperforms existing state-of-the-art methods, demonstrating its effectiveness for this task.

ICRA Conference 2024 Conference Paper

Real-to-Sim Deformable Object Manipulation: Optimizing Physics Models with Residual Mappings for Robotic Surgery

  • Xiao Liang
  • Fei Liu 0033
  • Yutong Zhang
  • Yuelei Li
  • Shan Lin
  • Michael C. Yip

Accurate deformable object manipulation (DOM) is essential for achieving autonomy in robotic surgery, where soft tissues are being displaced, stretched, and dissected. Many DOM methods can be powered by simulation, which ensures realistic deformation by adhering to the governing physical constraints and allowing for model prediction and control. However, real soft objects in robotic surgery, such as membranes and soft tissues, have complex, anisotropic physical parameters that a simulation with simple initialization from cameras may not fully capture. To use these simulation techniques in real surgical tasks, the real-to-sim gap needs to be properly compensated. In this work, we propose an online, adaptive parameter tuning approach for simulation optimization that (1) bridges the real-to-sim gap between a physics simulation and observations obtained from 3D perception by estimating a residual mapping and (2) optimizes its stiffness parameters online. Our method ensures a small residual gap between the simulation and observation and improves the simulation's predictive capabilities. The effectiveness of the proposed mechanism is evaluated in the manipulation of both a thin-shell and volumetric tissue, representative of most tissue scenarios. This work contributes to the advancement of simulation-based deformable tissue manipulation and holds potential for improving surgical autonomy.

NeurIPS Conference 2024 Conference Paper

Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation

  • Yaoyuan Liang
  • Zhuojun Cai
  • Jian Xu
  • Guanbo Huang
  • Yiran Wang
  • Xiao Liang
  • Jiahao Liu
  • Ziran Li

The Multi-modal Large Language Model (MLLM) based Referring Expression Generation (REG) task has gained increasing popularity, which aims to generate an unambiguous text description that applies to exactly one object or region in the image by leveraging foundation models. We empirically found that there exists a potential trade-off between the detailedness and the correctness of the descriptions for the referring objects. On the one hand, generating sentences with more details is usually required in order to provide more precise object descriptions. On the other hand, complicated sentences could easily increase the probability of hallucinations. To address this issue, we propose a training-free framework, named ``unleash-then-eliminate'', which first elicits the latent information in the intermediate layers, and then adopts a cycle-consistency-based decoding method to alleviate the production of hallucinations. Furthermore, to reduce the computational load of cycle-consistency-based decoding, we devise a Probing-based Importance Estimation method to statistically estimate the importance weights of intermediate layers within a subset. These importance weights are then incorporated into the decoding process over the entire dataset, intervening in the next token prediction from intermediate layers. Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics. Code will be made available at https://github.com/Glupayy/unleash-eliminate.

IROS Conference 2023 Conference Paper

Visual Localization Based on Multiple Maps

  • Yukai Lin
  • Liu Liu
  • Xiao Liang
  • Jiangwei Li

This paper proposes a multi-map based visual localization method for image sequences. Given multiple single-map based localization results, we combine them with SLAM to estimate robust and accurate camera poses under challenging conditions. Our method comprises three modules connected in a sequence. First, we reconstruct multiple reference maps using the Structure-from-Motion technique, one map for each reference sequence. A single-image-based localization pipeline is performed to estimate 6-DoF camera poses for each query image, one for each map. Second, a consensus set maximization module is proposed to select the best camera poses from multi-map poses, estimating one 6-DoF camera pose for each query image. Finally, a robust pose refinement module is proposed to optimize 6-DoF camera poses of query images, combining map-based localization and local SLAM information. Experiments show that the proposed pipeline achieves state-of-the-art performance on challenging map-based localization benchmarks. Demonstrating the broad applicability of our method, we obtained first place in the challenge of Map-Based Localization for Autonomous Driving at ECCV2022.
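The consensus-set maximization step described above can be illustrated with a toy one-dimensional stand-in for full 6-DoF poses. This is a minimal sketch of the general idea (pick the candidate pose agreeing with the largest number of other candidates within a tolerance), not the paper's module; the function name, the scalar pose representation, and the `tol` threshold are illustrative assumptions.

```python
def consensus_pose(poses, tol=0.1):
    """Toy consensus-set maximization over candidate camera poses,
    simplified to scalars: return the candidate supported by the most
    other candidates within the tolerance tol."""
    def support(p):
        # Count candidates (including p itself) within tol of p.
        return sum(abs(p - q) <= tol for q in poses)
    return max(poses, key=support)

# Three maps agree near 1.0; one outlier map reports 5.0.
best = consensus_pose([1.0, 1.05, 1.02, 5.0])
```

In the real 6-DoF setting the agreement test would compare rotation and translation distances rather than scalar differences, but the inlier-counting structure is the same.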

JBHI Journal 2022 Journal Article

A Refined Blood Pressure Estimation Model Based on Single Channel Photoplethysmography

  • Yiming Zhang
  • Xianglin Ren
  • Xiao Liang
  • Xuesong Ye
  • Congcong Zhou

This study proposed a refined BP prediction strategy that uses single-channel photoplethysmography (PPG) signals to stratify populations by cardiovascular status before BP estimation. Combining demographic characteristics (age, gender) and pulse wave morphological features, a random forest was applied to screen for two kinds of typical cardiovascular diseases (CVDs), with an accuracy of 92.2%. A deep learning model (BiLSTM-At) was proposed to estimate the long-term BP trend for different CVD groups. A transfer learning technique was used for personalized modeling to reduce computational complexity while improving performance. The method was validated on 255 patients with different CVDs. The mean absolute errors (MAEs) of systolic blood pressure (SBP) and diastolic blood pressure (DBP) estimation were 2.815 mmHg and 1.876 mmHg for normal subjects, 3.024 mmHg and 1.334 mmHg for AF subjects, and 4.444 mmHg and 2.549 mmHg for CA subjects. The results met the Association for the Advancement of Medical Instrumentation (AAMI) and British Hypertension Society (BHS) Class A criteria. This indicates that our strategy performs well and can realize long-term BP monitoring from a small batch of samples, with the potential to support real-time monitoring in health devices.

ICLR Conference 2021 Conference Paper

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

  • Shengyu Zhao
  • Jonathan Cui
  • Yilun Sheng
  • Yue Dong
  • Xiao Liang
  • Eric I-Chao Chang
  • Yan Xu 0001

Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation. Code is available at https://github.com/zsyzzsoft/co-mod-gan.
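The linear-separability idea behind P-IDS/U-IDS can be sketched in a few lines. This is a toy illustration of the concept (fit a linear separator between real and generated feature vectors, then report the fraction of generated samples that land on the "real" side), not the paper's metric: it uses a plain perceptron instead of the paper's classifier, and the function name and feature inputs are assumptions.

```python
import numpy as np

def linear_separability_score(real_feats, fake_feats, epochs=100):
    """Sketch of an IDS-style score: train a linear separator on
    real (+1) vs. generated (-1) feature vectors, then return the
    fraction of generated samples classified as real (i.e. that
    fool the separator). Toy perceptron, not the paper's method."""
    X = np.vstack([real_feats, fake_feats])
    y = np.array([1] * len(real_feats) + [-1] * len(fake_feats))
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):  # classic perceptron updates
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:
                w += yi * xi
                b += yi
    # Fraction of generated samples landing on the "real" side.
    return float(np.mean(fake_feats @ w + b > 0))
```

Perfectly separable features yield a score of 0 (no generated sample fools the separator), while indistinguishable feature distributions push the score toward 0.5, so higher scores indicate higher perceptual fidelity under this proxy.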