Arrow Research search

Author name cluster

Xiao Liang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

18 papers
2 author rows

Possible papers

18

AAAI Conference 2026 Conference Paper

Anatomical Region-Guided Contrastive Decoding: A Plug-and-Play Strategy for Mitigating Hallucinations in Medical VLMs

  • Xiao Liang
  • Chenxi Liu
  • Zhi Ma
  • Di Wang
  • Bin Jing
  • Quan Wang
  • Yuanyuan Shi

Medical Vision-Language Models (MedVLMs) show immense promise for clinical application. However, their reliability is hindered by hallucinations, where models often fail to derive answers from visual evidence, instead relying on learned textual priors. Existing mitigation strategies for MedVLMs have distinct limitations: training-based methods rely on costly expert annotations, limiting scalability, while training-free interventions like contrastive decoding, though data-efficient, apply a global, untargeted correction whose effects in complex real-world clinical settings can be unreliable. To address these challenges, we introduce Anatomical Region-Guided Contrastive Decoding (ARCD), a plug-and-play strategy that mitigates hallucinations by providing targeted, region-specific guidance. Our module leverages an anatomical mask to direct a three-tiered contrastive decoding process. By dynamically re-weighting at the token, attention, and logits levels, it verifiably steers the model's focus onto specified regions, reinforcing anatomical understanding and suppressing factually incorrect outputs. Extensive experiments across diverse datasets, including chest X-ray, CT, brain MRI, and ocular ultrasound, demonstrate our method's effectiveness in improving regional understanding, reducing hallucinations, and enhancing overall diagnostic accuracy.
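The logits-level correction in generic contrastive decoding can be sketched as follows. This is a minimal toy illustration of the general technique, not the authors' ARCD implementation; the function name, the `alpha` parameter, and the two-pass setup (a region-conditioned pass vs. an unconditioned pass) are assumptions for illustration.

```python
import numpy as np

def contrastive_decode_logits(logits_cond, logits_uncond, alpha=1.0):
    """Generic logits-level contrastive decoding (simplified sketch):
    amplify tokens that the conditioned pass (e.g. region-grounded)
    supports more strongly than the unconditioned pass."""
    return (1 + alpha) * logits_cond - alpha * logits_uncond

# Toy example: token 0 is supported only when the region is attended to.
cond = np.array([2.0, 1.0, 0.5])
uncond = np.array([0.5, 1.0, 0.5])
adjusted = contrastive_decode_logits(cond, uncond, alpha=1.0)
```

With `alpha=0` the correction vanishes and the conditioned logits are returned unchanged; larger `alpha` sharpens the contrast between the two passes.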

JBHI Journal 2026 Journal Article

MRLF-DDI: A Multi-View Representation Learning Framework for Drug-Drug Interaction Event Prediction

  • Jian Zhong
  • Haochen Zhao
  • Xiao Liang
  • Qichang Zhao
  • Jianxin Wang

Accurately predicting drug-drug interaction events (DDIEs) is critical for improving medication safety and guiding clinical decision-making. However, existing graph neural network (GNN)-based methods often struggle to effectively integrate multi-view features and generalize to novel or understudied drugs. To address these limitations, we propose MRLF-DDI, a multi-view representation learning framework that jointly models information from individual drug features, local interaction contexts, and global interaction patterns. MRLF-DDI introduces the use of atom-level structural features enriched with bond angle information—marking the first incorporation of this geometry-aware feature in DDIE prediction. It further employs a multi-granularity GNN and a gated knowledge transfer strategy to enhance feature learning and cold-start generalization. Extensive experiments on benchmark datasets demonstrate that MRLF-DDI achieves superior performance in both warm-start and cold-start scenarios. Case studies and visualization analyses further highlight its practical utility in identifying clinically relevant DDIEs.

JBHI Journal 2026 Journal Article

Topology-aware Diffusion Schrödinger Bridge for Unpaired H&E-to-IHC Stain Translation

  • Chujie Zhang
  • Yangyang Xie
  • Jihong Hu
  • Xiaoyu Shi
  • Yinhao Li
  • Xiao Liang
  • Lanfen Lin
  • Yen-Wei Chen

Unpaired H&E-to-IHC Stain Translation aims to generate immunohistochemistry (IHC) staining from Hematoxylin and Eosin (H&E) staining. It offers clearer diagnostic insights and potentially expands access to advanced pathology services in resource-limited areas. This task faces two primary challenges: capturing target domain style characteristics and preserving topological features in histological images. Recently, Schrödinger Bridge (SB)-based methods have offered a solution for unpaired image-to-image translation, addressing the mode collapse and artifact issues in CycleGAN-based approaches, as well as the Gaussian prior assumption limitation in diffusion-based methods. While SB-based methods suffer from the curse of dimensionality with high-resolution images, the Unpaired Neural Schrödinger Bridge (UNSB) overcomes this challenge and achieves state-of-the-art (SOTA) performance on natural images. However, UNSB has two key issues in histological images: (1) loss of topological features and (2) IHC staining representation. UNSB focuses only on the optimal path from source to target domains, ignoring local structure paths. Convolutional neural networks (CNNs) do not perfectly preserve critical anatomical structures due to limitations like receptive field size or model capacity. To address these challenges, we introduce the Topology-aware Diffusion Schrödinger Bridge (TDSB), integrating a Topology Guidance (TG) module and Dual-Domain Adaptive Patch-based noise contrastive estimation (DDAP). Experiments on seven translation tasks across three datasets show that our method achieves SOTA performance in unpaired H&E-to-IHC stain translation. Clinical evaluation through pathologists' assessments further validates the effectiveness of our method.

NeurIPS Conference 2025 Conference Paper

AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining

  • Hongyuan Dong
  • Dingkang Yang
  • Xiao Liang
  • Ran Jiao

Learning rate is widely regarded as crucial for effective foundation model pretraining. Recent research explores and demonstrates the transferability of learning rate configurations across varying model and dataset sizes. Nevertheless, these approaches are constrained to specific training scenarios and typically necessitate extensive hyperparameter tuning on proxy models. In this work, we propose AdaLRS, a plug-and-play adaptive learning rate search algorithm that conducts online optimal learning rate search via optimizing loss descent velocities. We provide theoretical and experimental analyses to show that foundation model pretraining loss and its descent velocity are both convex and share the same optimal learning rate. Relying solely on training loss dynamics, AdaLRS involves few extra computations to guide the search process, and its convergence is guaranteed via theoretical analysis. Experiments on both LLM and VLM pretraining show that AdaLRS adjusts suboptimal learning rates to the neighborhood of optimum with marked efficiency and effectiveness, with model performance improved accordingly. We also show the robust generalizability of AdaLRS across varying training scenarios, such as different model sizes, training paradigms, base learning rate scheduler choices, and hyperparameter settings.
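The core idea of searching over learning rates by comparing loss descent velocities can be sketched as a toy helper pair. This is a simplified stand-in under assumed behavior, not the AdaLRS algorithm itself; the window-averaged velocity, the probe-and-compare step, and the `factor` parameter are all illustrative assumptions.

```python
def descent_velocity(losses):
    """Average per-step loss decrease over a recorded window of losses."""
    return (losses[0] - losses[-1]) / (len(losses) - 1)

def adjust_lr(lr, v_current, v_probe, factor=1.5):
    """One toy search step: if a probed learning rate (e.g. lr * factor)
    yields a faster loss descent velocity, move toward it; otherwise
    shrink the learning rate. A sketch, not the AdaLRS update rule."""
    return lr * factor if v_probe > v_current else lr / factor
```

Because (by the paper's convexity argument) descent velocity peaks at the optimal learning rate, repeatedly probing and moving toward the faster-descending side walks the learning rate toward that optimum.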

JBHI Journal 2025 Journal Article

AGPred: An End-to-End Deep Learning Model for Predicting Drug Approvals in Clinical Trials Based on Molecular Features

  • Haochen Zhao
  • Xiao Liang
  • Chenliang Xie
  • Shaokai Wang

One of the major challenges in drug development is maintaining acceptable levels of efficacy and safety throughout the various stages of clinical trials and successfully bringing the drug to market. However, clinical trials are time-consuming and expensive. While there are computational methods designed to predict the likelihood of a drug passing clinical trials and reaching the market, these methods rely heavily on manual feature engineering and cannot automatically learn drug molecular representations, resulting in relatively low model performance. In this study, we propose AGPred, an attention-based deep Graph Neural Network (GNN) designed to accurately predict drug approval rates in clinical trials. Unlike the few existing studies on drug approval prediction, which only use predicted targets of compounds, our novel approach employs a GNN module to extract high-potential features of compounds based on their molecular graphs. Additionally, a cross-attention-based fusion module is utilized to learn molecular fingerprint features, enhancing the model's representation of chemical structures. Meanwhile, AGPred integrates the physicochemical properties of drugs to provide a comprehensive description of the molecules. Experimental results indicate that AGPred outperforms four state-of-the-art models on both benchmark and independent datasets. The study also includes several ablation experiments and visual analyses to demonstrate the effectiveness of our method in predicting drug approval during clinical trials.

ICRA Conference 2025 Conference Paper

AutoPeel: Adhesion-Aware Safe Peeling Trajectory Optimization for Robotic Wound Care

  • Xiao Liang
  • Youcheng Zhang
  • Fei Liu 0033
  • Florian Richter 0002
  • Michael C. Yip

Chronic wounds, including diabetic ulcers, pressure ulcers, and ulcers secondary to venous hypertension, affect more than 6.5 million patients and incur a yearly cost of more than $25 billion in the United States alone. Chronic wound treatment is currently a manual process, and we envision a future where robotics and automation will aid in this treatment to reduce cost and improve patient care. In this work, we present the development of the first robotic system for wound dressing removal, which is reported to be the worst aspect of living with chronic wounds. Our method leverages differentiable physics-based simulation to perform gradient-based trajectory optimization for peeling trajectory planning. By integrating the fracture mechanics of adhesion, we are able to model the peeling effect inherent to dressing adhesion. The system is further guided by carefully designed objective functions that promote both efficient and safe control, reducing the risk of tissue damage. We validated the efficacy of our approach through a series of experiments conducted on both synthetic skin phantoms and real human subjects. Our results demonstrate the system's ability to achieve precise and safe dressing removal trajectories, offering a promising solution for automating this essential healthcare procedure.

JBHI Journal 2025 Journal Article

Image-enhanced Multi-Modal Contrastive Transformer for subcellular spatial transcriptomics

  • Wanwan Shi
  • Ying Liu
  • Qiu Xiao
  • Yuting Bai
  • Xiao Liang
  • Xinling Zeng
  • Chee Keong Kwoh
  • Jiawei Luo

Recent advances in spatial molecular imaging technologies have enabled gene expression profiling alongside high-resolution imaging, providing unprecedented opportunities to resolve molecular heterogeneity at subcellular resolution. However, these technologies fail to fully capture cellular characteristics due to the limited number of genes they can detect, which hinders downstream analysis. Because spatial imaging data provide high-resolution, fine-grained morphology information, developing computational methods that effectively integrate image features with transcriptomic profiles is crucial for enabling comprehensive subcellular data analysis. In this study, we present SIMMT, an image-enhanced multi-modal contrastive transformer framework for identifying spatial domains and enhancing subcellular data. In the framework, we design a dual transformer architecture to learn multi-modal representations for cells by modeling transcriptomic profiles and morphological images, respectively. To fully capture modality interactions within spatial contexts, we introduce a contrastive learning module that enhances cell representation by aligning tissue morphology and gene expression at the cell level. We tested SIMMT on subcellular spatial transcriptomics datasets from human lung cancer tissue, mouse brain tissue, human colorectal cancer tissue, and human ovarian cancer tissue. The results demonstrated that SIMMT consistently outperformed state-of-the-art methods in spatial clustering and gene expression pattern analysis. Our method also effectively demonstrated its ability to identify tumor spatial heterogeneity and uncover potential gene biomarkers in the human bronchiolar adenoma (BA) dataset. The code and dataset of SIMMT can be downloaded from https://github.com/LWanzi/SIMMT.

ICLR Conference 2025 Conference Paper

Integrative Decoding: Improving Factuality via Implicit Self-consistency

  • Yi Cheng
  • Xiao Liang
  • Yeyun Gong
  • Wen Xiao
  • Song Wang
  • Yuji Zhang 0002
  • Wenjun Hou
  • Kaishuai Xu

Self-consistency-based approaches, which involve repeatedly sampling multiple outputs and selecting the most consistent one as the final response, prove to be remarkably effective in improving the factual accuracy of large language models. Nonetheless, existing methods usually have strict constraints on the task format, largely limiting their applicability. In this paper, we present Integrative Decoding (ID), to unlock the potential of self-consistency in open-ended generation tasks. ID operates by constructing a set of inputs, each prepended with a previously sampled response, and then processes them concurrently, with the next token being selected by aggregating all of their corresponding predictions at each decoding step. In essence, this simple approach implicitly incorporates self-consistency in the decoding objective. Extensive evaluation shows that ID consistently enhances factuality over a wide range of language models, with substantial improvements on the TruthfulQA (+11.2%), Biographies (+15.4%) and LongFact (+8.5%) benchmarks. The performance gains amplify progressively as the number of sampled responses increases, indicating the potential of ID to scale up with repeated sampling.
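One decoding step of this kind of aggregation can be sketched as follows. This is a simplified illustration of the aggregation idea (summing per-input log-probabilities and taking the argmax), not the authors' released implementation; the function name and the assumption that logits for each prepended-response input are already available are hypothetical.

```python
import math

def integrative_step(per_input_logits):
    """One ID-style decoding step (simplified sketch): given next-token
    logits from several inputs, each prepended with a previously sampled
    response, pick the token whose summed log-probability across all
    inputs is highest."""
    def log_softmax(row):
        m = max(row)
        z = math.log(sum(math.exp(x - m) for x in row)) + m
        return [x - z for x in row]

    # Sum log-probabilities token-wise across all inputs, then take argmax.
    agg = [sum(col) for col in zip(*(log_softmax(r) for r in per_input_logits))]
    return max(range(len(agg)), key=agg.__getitem__)
```

With a single input this reduces to ordinary greedy decoding; with several inputs, tokens that are consistently probable across the sampled responses win out, which is where the implicit self-consistency comes from.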

ICRA Conference 2025 Conference Paper

MEDiC: Autonomous Surgical Robotic Assistance to Maximizing Exposure for Dissection and Cautery

  • Xiao Liang
  • Chung-Pang Wang
  • Nikhil U. Shinde
  • Fei Liu 0033
  • Florian Richter 0002
  • Michael C. Yip

Surgical automation has the capability to improve the consistency of patient outcomes and broaden access to advanced surgical care in underprivileged communities. Shared autonomy, where the robot automates routine subtasks while the surgeon retains partial teleoperative control, offers great potential to make an impact. In this paper, we focus on one important skill within surgical shared autonomy: automating robotic assistance to maximize visual exposure and apply tissue tension for dissection and cautery. Ensuring consistent exposure to visualize the surgical site is crucial for both efficiency and patient safety. However, achieving this is highly challenging due to the complexities of manipulating deformable volumetric tissues that are prevalent in surgery. To address these challenges we propose MEDiC, a framework for autonomous surgical robotic assistance to Maximizing Exposure for Dissection and Cautery. We integrate a differentiable physics model with perceptual feedback to achieve our two key objectives: 1) maximizing tissue exposure and applying tension for a specified dissection site through visual-servoing control and 2) selecting optimal control positions for a dissection target based on deformable Jacobian analysis. We quantitatively assess our method through repeated real robot experiments on a tissue phantom. Our visual-servoing and optimal control position selection achieve success rates of 100% and 82%, respectively, in an ablation study. We also showcase our framework's capabilities through dissection experiments using shared autonomy on real animal tissue.

NeurIPS Conference 2025 Conference Paper

SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning

  • Xiao Liang
  • Zhong-Zhi Li
  • Yeyun Gong
  • Yang Wang
  • Hengyuan Zhang
  • Yelong Shen
  • Ying Nian Wu
  • Weizhu Chen

Reinforcement Learning with Verifiable Rewards (RLVR) has proven effective for training large language models (LLMs) on complex reasoning tasks, such as mathematical problem solving. A prerequisite for the scalability of RLVR is a high-quality problem set with precise and verifiable answers. However, the scarcity of well-crafted human-labeled math problems and limited-verification answers in existing distillation-oriented synthetic datasets limit their effectiveness in RL. Additionally, most problem synthesis strategies indiscriminately expand the problem set without considering the model's capabilities, leading to low efficiency in generating useful questions. To mitigate these issues, we introduce a Self-aware Weakness-driven problem Synthesis framework (SwS) that systematically identifies model deficiencies and leverages them for problem augmentation. Specifically, we define weaknesses as questions that the model consistently fails to learn through its iterative sampling during RL training. We then extract the core concepts from these failure cases and synthesize new problems to strengthen the model's weak areas in subsequent augmented training, enabling it to focus on and gradually overcome its weaknesses. Without relying on external knowledge distillation, our framework enables robust generalization by empowering the model to self-identify and address its weaknesses in RL, yielding average performance gains of 10% and 7.7% on 7B and 32B models across eight mainstream reasoning benchmarks. Our code and data are available at https://anonymous.4open.science/r/SwS-E6F5/.

ICRA Conference 2024 Conference Paper

Achieving Autonomous Cloth Manipulation with Optimal Control via Differentiable Physics-Aware Regularization and Safety Constraints

  • Yutong Zhang
  • Fei Liu 0033
  • Xiao Liang
  • Michael C. Yip

Cloth manipulation is a category of deformable object manipulation of great interest to the robotics community, from applications of automated laundry-folding and home organizing to textiles and flexible manufacturing. Despite the desire for automated cloth manipulation, the thin-shell dynamics and under-actuation nature of cloth present significant challenges for robots to effectively interact with them. Many recent works omit explicit modeling in favor of learning-based methods that may yield control policies directly. However, these methods require large training sets that must be collected and curated. In this regard, we create a framework for differentiable modeling of cloth dynamics leveraging an Extended Position-based Dynamics (XPBD) algorithm. Together with the desired control objective, physics-aware regularization terms are designed for better results, including trajectory smoothness and elastic potential energy. In addition, safety constraints, such as avoiding obstacles, can be specified using signed distance functions (SDFs). We formulate the cloth manipulation task with safety constraints as a constrained optimization problem, which can be effectively solved by mainstream gradient-based optimizers thanks to the end-to-end differentiability of our framework. Finally, we assess the framework with various safety thresholds and demonstrate the feasibility of result trajectories on a surgical robot. The effects of the regularization terms are analyzed in an additional ablation study.

AAAI Conference 2024 Conference Paper

CoSTA: End-to-End Comprehensive Space-Time Entanglement for Spatio-Temporal Video Grounding

  • Yaoyuan Liang
  • Xiao Liang
  • Yansong Tang
  • Zhao Yang
  • Ziran Li
  • Jingang Wang
  • Wenbo Ding
  • Shao-Lun Huang

This paper studies the spatio-temporal video grounding task, which aims to localize a spatio-temporal tube in an untrimmed video based on the given text description of an event. Existing one-stage approaches suffer from insufficient space-time interaction in two aspects: i) less precise prediction of event temporal boundaries, and ii) inconsistency in object prediction for the same event across adjacent frames. To address these issues, we propose a framework of Comprehensive Space-Time entAnglement (CoSTA) to densely entangle space-time multi-modal features for spatio-temporal localization. Specifically, we propose a space-time collaborative encoder to extract comprehensive video features and leverage Transformer to perform spatio-temporal multi-modal understanding. Our entangled decoder couples temporal boundary prediction and spatial localization via an entangled query, boasting an enhanced ability to capture object-event relationships. We conduct extensive experiments on the challenging benchmarks of HC-STVG and VidSTG, where CoSTA outperforms existing state-of-the-art methods, demonstrating its effectiveness for this task.

ICRA Conference 2024 Conference Paper

Real-to-Sim Deformable Object Manipulation: Optimizing Physics Models with Residual Mappings for Robotic Surgery

  • Xiao Liang
  • Fei Liu 0033
  • Yutong Zhang
  • Yuelei Li
  • Shan Lin
  • Michael C. Yip

Accurate deformable object manipulation (DOM) is essential for achieving autonomy in robotic surgery, where soft tissues are being displaced, stretched, and dissected. Many DOM methods can be powered by simulation, which ensures realistic deformation by adhering to the governing physical constraints and allowing for model prediction and control. However, real soft objects in robotic surgery, such as membranes and soft tissues, have complex, anisotropic physical parameters that a simulation with simple initialization from cameras may not fully capture. To use these simulation techniques in real surgical tasks, the real-to-sim gap needs to be properly compensated. In this work, we propose an online, adaptive parameter tuning approach for simulation optimization that (1) bridges the real-to-sim gap between a physics simulation and observations obtained from 3D perception by estimating a residual mapping and (2) optimizes its stiffness parameters online. Our method ensures a small residual gap between the simulation and observation and improves the simulation's predictive capabilities. The effectiveness of the proposed mechanism is evaluated in the manipulation of both a thin-shell and volumetric tissue, representative of most tissue scenarios. This work contributes to the advancement of simulation-based deformable tissue manipulation and holds potential for improving surgical autonomy.

NeurIPS Conference 2024 Conference Paper

Unleashing Region Understanding in Intermediate Layers for MLLM-based Referring Expression Generation

  • Yaoyuan Liang
  • Zhuojun Cai
  • Jian Xu
  • Guanbo Huang
  • Yiran Wang
  • Xiao Liang
  • Jiahao Liu
  • Ziran Li

The Multi-modal Large Language Model (MLLM) based Referring Expression Generation (REG) task has gained increasing popularity, which aims to generate an unambiguous text description that applies to exactly one object or region in the image by leveraging foundation models. We empirically found that there exists a potential trade-off between the detailedness and the correctness of the descriptions for the referring objects. On the one hand, generating sentences with more details is usually required in order to provide more precise object descriptions. On the other hand, complicated sentences could easily increase the probability of hallucinations. To address this issue, we propose a training-free framework, named ``unleash-then-eliminate'', which first elicits the latent information in the intermediate layers, and then adopts a cycle-consistency-based decoding method to alleviate the production of hallucinations. Furthermore, to reduce the computational load of cycle-consistency-based decoding, we devise a Probing-based Importance Estimation method to statistically estimate the importance weights of intermediate layers within a subset. These importance weights are then incorporated into the decoding process over the entire dataset, intervening in the next token prediction from intermediate layers. Extensive experiments conducted on the RefCOCOg and PHD benchmarks show that our proposed framework could outperform existing methods on both semantic and hallucination-related metrics. Code will be made available at https://github.com/Glupayy/unleash-eliminate.

IROS Conference 2023 Conference Paper

Visual Localization Based on Multiple Maps

  • Yukai Lin
  • Liu Liu
  • Xiao Liang
  • Jiangwei Li

This paper proposes a multi-map based visual localization method for image sequences. Given multiple single-map based localization results, we combine them with SLAM to estimate robust and accurate camera poses under challenging conditions. Our method comprises three modules connected in a sequence. First, we reconstruct multiple reference maps using the Structure-from-Motion technique, one map for each reference sequence. A single-image-based localization pipeline is performed to estimate 6-DoF camera poses for each query image, one for each map. Second, a consensus set maximization module is proposed to select the best camera poses from multi-map poses, estimating one 6-DoF camera pose for each query image. Finally, a robust pose refinement module is proposed to optimize 6-DoF camera poses of query images, combining map-based localization and local SLAM information. Experiments show that the proposed pipeline achieves state-of-the-art performance on challenging map-based localization benchmarks. Demonstrating the broad applicability of our method, we obtained first place in the challenge of Map-Based Localization for Autonomous Driving at ECCV2022.
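The consensus-set maximization step described above can be illustrated with a toy one-dimensional stand-in for full 6-DoF poses. This is a minimal sketch of the general idea (pick the candidate pose agreeing with the largest number of other candidates within a tolerance), not the paper's module; the function name, the scalar pose representation, and the `tol` threshold are illustrative assumptions.

```python
def consensus_pose(poses, tol=0.1):
    """Toy consensus-set maximization over candidate camera poses,
    simplified to scalars: return the candidate supported by the most
    other candidates within the tolerance tol."""
    def support(p):
        # Count candidates (including p itself) within tol of p.
        return sum(abs(p - q) <= tol for q in poses)
    return max(poses, key=support)

# Three maps agree near 1.0; one outlier map reports 5.0.
best = consensus_pose([1.0, 1.05, 1.02, 5.0])
```

In the real 6-DoF setting the agreement test would compare rotation and translation distances rather than scalar differences, but the inlier-counting structure is the same.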

JBHI Journal 2022 Journal Article

A Refined Blood Pressure Estimation Model Based on Single Channel Photoplethysmography

  • Yiming Zhang
  • Xianglin Ren
  • Xiao Liang
  • Xuesong Ye
  • Congcong Zhou

This study proposed a refined BP prediction strategy that uses single-channel photoplethysmography (PPG) signals to stratify populations by cardiovascular status before BP estimation. Combining demographic characteristics (age, gender) and pulse wave morphological features, a random forest was applied to screen for two kinds of typical cardiovascular diseases (CVDs), with an accuracy of 92.2%. A deep learning model (BiLSTM-At) was proposed to estimate the long-term BP trend for different CVD groups. A transfer learning technique was used for personalized modeling to reduce computational complexity while improving performance. The method was validated on 255 patients with different CVDs. The mean absolute errors (MAEs) of systolic blood pressure (SBP) and diastolic blood pressure (DBP) estimation were 2.815 mmHg and 1.876 mmHg for normal subjects, 3.024 mmHg and 1.334 mmHg for AF subjects, and 4.444 mmHg and 2.549 mmHg for CA subjects. The results met the Association for the Advancement of Medical Instrumentation (AAMI) and British Hypertension Society (BHS) Class A criteria. This indicates that our strategy performs well and can realize long-term BP monitoring from a small batch of samples, with the potential to support real-time monitoring in health devices.

ICLR Conference 2021 Conference Paper

Large Scale Image Completion via Co-Modulated Generative Adversarial Networks

  • Shengyu Zhao
  • Jonathan Cui
  • Yilun Sheng
  • Yue Dong
  • Xiao Liang
  • Eric I-Chao Chang
  • Yan Xu 0001

Numerous task-specific variants of conditional generative adversarial networks have been developed for image completion. Yet, a serious limitation remains that all existing algorithms tend to fail when handling large-scale missing regions. To overcome this challenge, we propose a generic new approach that bridges the gap between image-conditional and recent modulated unconditional generative architectures via co-modulation of both conditional and stochastic style representations. Also, due to the lack of good quantitative metrics for image completion, we propose the new Paired/Unpaired Inception Discriminative Score (P-IDS/U-IDS), which robustly measures the perceptual fidelity of inpainted images compared to real images via linear separability in a feature space. Experiments demonstrate superior performance in terms of both quality and diversity over state-of-the-art methods in free-form image completion and easy generalization to image-to-image translation. Code is available at https://github.com/zsyzzsoft/co-mod-gan.
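The linear-separability idea behind P-IDS/U-IDS can be sketched in a few lines. This is a toy illustration of the concept (fit a linear separator between real and generated feature vectors, then report the fraction of generated samples that land on the "real" side), not the paper's metric: it uses a plain perceptron instead of the paper's classifier, and the function name and feature inputs are assumptions.

```python
import numpy as np

def linear_separability_score(real_feats, fake_feats, epochs=100):
    """Sketch of an IDS-style score: train a linear separator on
    real (+1) vs. generated (-1) feature vectors, then return the
    fraction of generated samples classified as real (i.e. that
    fool the separator). Toy perceptron, not the paper's method."""
    X = np.vstack([real_feats, fake_feats])
    y = np.array([1] * len(real_feats) + [-1] * len(fake_feats))
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):  # classic perceptron updates
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:
                w += yi * xi
                b += yi
    # Fraction of generated samples landing on the "real" side.
    return float(np.mean(fake_feats @ w + b > 0))
```

Perfectly separable features yield a score of 0 (no generated sample fools the separator), while indistinguishable feature distributions push the score toward 0.5, so higher scores indicate higher perceptual fidelity under this proxy.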