Author name cluster

Xiao Cui

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers

1 author row

AAAI Conference 2026 Conference Paper

Rethinking Long-tailed Dataset Distillation: A Uni-Level Framework with Unbiased Recovery and Relabeling

Xiao Cui
Yulei Qin
Xinyue Li
Wengang Zhou
Hongsheng Li
Houqiang Li

Dataset distillation creates a small distilled set that enables efficient training by capturing key information from the full dataset. While existing dataset distillation methods perform well on balanced datasets, they struggle under long-tailed distributions, where imbalanced class frequencies induce biased model representations and corrupt statistical estimates such as Batch Normalization (BN) statistics. In this paper, we rethink long-tailed dataset distillation by revisiting the limitations of trajectory-based methods, and instead adopt the statistical alignment perspective to jointly mitigate model bias and restore fair supervision. To this end, we introduce three dedicated components that enable unbiased recovery of distilled images and soft relabeling: (1) enhancing expert models (an observer model for recovery and a teacher model for relabeling) to enable reliable statistics estimation and soft-label generation; (2) recalibrating BN statistics via a full forward pass with dynamically adjusted momentum to reduce representation skew; (3) initializing synthetic images by incrementally selecting high-confidence and diverse augmentations via a multi-round mechanism that promotes coverage and diversity. Extensive experiments on four long-tailed benchmarks show consistent improvements over state-of-the-art methods across varying degrees of class imbalance. Notably, our approach improves top-1 accuracy by 15.6% on CIFAR-100-LT and 11.8% on Tiny-ImageNet-LT under IPC=10 and IF=10.
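
As a rough illustration of component (2) above, the sketch below re-estimates BN-style running statistics in a single pass over the data while changing the momentum at every step. The 1/t momentum schedule and the function name recalibrate_bn_stats are assumptions made for this example only; the abstract does not say how the momentum is adjusted, so this should not be read as the paper's procedure.

```python
import numpy as np

def recalibrate_bn_stats(batches):
    """Re-estimate per-feature mean/variance with a step-dependent momentum.

    Assumption: momentum = 1 / t (a cumulative moving average), chosen only
    for illustration. With this schedule every batch gets equal weight.
    """
    running_mean = 0.0
    running_var = 0.0
    for t, x in enumerate(batches, start=1):
        momentum = 1.0 / t                 # dynamically adjusted momentum
        batch_mean = x.mean(axis=0)        # per-feature batch mean
        batch_var = x.var(axis=0)          # per-feature batch variance
        running_mean = (1.0 - momentum) * running_mean + momentum * batch_mean
        running_var = (1.0 - momentum) * running_var + momentum * batch_var
    return running_mean, running_var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(loc=2.0, scale=3.0, size=(400, 3))
    mean, var = recalibrate_bn_stats(np.split(data, 4))
    print(mean, var)
```

With equal-sized batches, the 1/t schedule makes the running mean equal to the mean over all samples, whereas a fixed momentum would weight the most recent batches more heavily.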

NeurIPS Conference 2025 Conference Paper

Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

Yulei Qin
Gang Li
Zongyi Li
Zihan Xu
Yuchen Shi
Zhekai Lin
Xiao Cui
Ke Li

Existing large language models (LLMs) face challenges in following complex instructions, especially when multiple constraints are present and organized in paralleling, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions. To this end, we propose RAIF, a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we stem from the decomposition of complex instructions under existing taxonomies and propose a reproducible data acquisition method. Second, we exploit reinforcement learning (RL) with verifiable rule-centric reward signals to cultivate reasoning specifically for instruction following. We address the shallow, non-essential nature of reasoning under complex instructions via sample-wise contrast for superior CoT enforcement. We also exploit behavior cloning of experts to facilitate steady distribution shift from fast-thinking LLMs to skillful reasoners. Extensive evaluations on seven comprehensive benchmarks confirm the validity of the proposed method, where a 1.5B LLM achieves 11.74% gains with performance comparable to an 8B LLM. Evaluation on OOD constraints also confirms the generalizability of our RAIF.

AAAI Conference 2025 Conference Paper

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models

Xiao Cui
Mo Zhu
Yulei Qin
Liang Xie
Wengang Zhou
Houqiang Li

Knowledge distillation (KD) has become a prevalent technique for compressing large language models (LLMs). Existing KD methods are constrained by the need for identical tokenizers (i.e., vocabularies) between teacher and student models, limiting their versatility in handling LLMs of different architecture families. In this paper, we introduce the Multi-Level Optimal Transport (MultiLevelOT), a novel approach that advances the optimal transport for universal cross-tokenizer knowledge distillation. Our method aligns the logit distributions of the teacher and the student at both token and sequence levels using diverse cost matrices, eliminating the need for dimensional or token-by-token correspondence. At the token level, MultiLevelOT integrates both global and local information by jointly optimizing all tokens within a sequence to enhance robustness. At the sequence level, we efficiently capture complex distribution structures of logits via the Sinkhorn distance, which approximates the Wasserstein distance for divergence measures. Extensive experiments on tasks such as extractive QA, generative QA, and summarization demonstrate that the MultiLevelOT outperforms state-of-the-art cross-tokenizer KD methods under various settings. Our approach is robust to different student and teacher models across model families, architectures, and parameter sizes.
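
For readers unfamiliar with the Sinkhorn distance mentioned above, the following is a minimal sketch of the standard Sinkhorn iteration for entropy-regularized optimal transport between two discrete distributions. It is a generic textbook version, not MultiLevelOT itself; the function name sinkhorn_distance, the regularization value reg, the iteration count, and the toy cost matrix are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn_distance(a, b, cost, reg=0.5, n_iters=500):
    """Entropy-regularized OT cost between histograms a and b.

    a, b : 1-D arrays of non-negative weights, each summing to 1.
    cost : 2-D array, cost[i, j] is the cost of moving mass from i to j.
    reg  : entropic regularization strength (illustrative default).
    """
    K = np.exp(-cost / reg)            # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)              # rescale to match column marginals
        u = a / (K @ v)                # rescale to match row marginals
    plan = u[:, None] * K * v[None, :] # transport plan
    return float(np.sum(plan * cost)), plan

if __name__ == "__main__":
    a = np.array([0.5, 0.3, 0.2])
    b = np.array([0.2, 0.3, 0.5])
    idx = np.arange(3)
    cost = np.abs(idx[:, None] - idx[None, :]).astype(float)
    dist, plan = sinkhorn_distance(a, b, cost)
    print(dist)
    print(plan)
```

Smaller values of reg bring the result closer to the unregularized Wasserstein cost but slow convergence and can underflow in the kernel; practical implementations usually work in the log domain.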

NeurIPS Conference 2025 Conference Paper

Optimizing Distributional Geometry Alignment with Optimal Transport for Generative Dataset Distillation

Xiao Cui
Yulei Qin
Wengang Zhou
Hongsheng Li
Houqiang Li

Dataset distillation seeks to synthesize a compact distilled dataset, enabling models trained on it to achieve performance comparable to models trained on the full dataset. Recent methods for large-scale datasets focus on matching global distributional statistics (e.g., mean and variance), but overlook critical instance-level characteristics and intraclass variations, leading to suboptimal generalization. We address this limitation by reformulating dataset distillation as an Optimal Transport (OT) distance minimization problem, enabling fine-grained alignment at both global and instance levels throughout the pipeline. OT offers a geometrically faithful framework for distribution matching. It effectively preserves local modes, intra-class patterns, and fine-grained variations that characterize the geometry of complex, high-dimensional distributions. Our method comprises three components tailored for preserving distributional geometry: (1) OT-guided diffusion sampling, which aligns latent distributions of real and distilled images; (2) label-image-aligned soft relabeling, which adapts label distributions based on the complexity of distilled image distributions; and (3) OT-based logit matching, which aligns the output of student models with soft-label distributions. Extensive experiments across diverse architectures and large-scale datasets demonstrate that our method consistently outperforms state-of-the-art approaches in an efficient manner, achieving at least 4% accuracy improvement under IPC=10 settings for each architecture on ImageNet-1K.

EAAI Journal 2024 Journal Article

A lightweight deep learning based bowel sounds segmentation algorithm for gastrointestinal (GI) monitoring

Mingyuan Zhang
Xiao Cui
Liuwei Zhao
Xinlei He
Yu Shi
Jianhong Yang
YuXin Leng

Segmentation of bowel sound (BS) events is a significant task in automatic BS monitoring. Recently, deep learning (DL) has been utilized to segment BS events. However, most researchers treat BS segmentation as a traditional classification problem, so the precise locations at which BS events occur cannot be obtained. Besides, the segmentation performance is easily affected by the thresholds applied to the model output in practical applications. To tackle these issues, a lightweight DL-based BS segmentation algorithm is proposed in this paper. One-dimensional convolution layers and bidirectional gated recurrent unit (GRU) layers are adopted to enhance feature extraction. A loss function named shape loss is proposed to reduce the sensitivity of the model to thresholds. Moreover, a portable BS monitoring device is developed for data acquisition, display, and algorithm deployment. Based on this device, experiments on humans and rats are conducted to verify the effectiveness of the proposed approach. Experimental results show that the proposed method outperforms the comparison methods, with higher F1-scores and lower sensitivity to thresholds.

JBHI Journal 2023 Journal Article

A Deep Learning Model for Automatic Segmentation of Intraparenchymal and Intraventricular Hemorrhage for Catheter Puncture Path Planning

Guoyu Tong
Xi Wang
Huiyan Jiang
Anhua Wu
Wen Cheng
Xiao Cui
Long Bao
Ruikai Cai

Intracerebral hemorrhage is the subtype of stroke with the highest mortality rate, especially when it also causes secondary intraventricular hemorrhage. The optimal surgical option for intracerebral hemorrhage remains one of the most controversial areas of neurosurgery. We aim to develop a deep learning model for the automatic segmentation of intraparenchymal and intraventricular hemorrhage for clinical catheter puncture path planning. First, we develop a 3D U-Net embedded with a multi-scale boundary aware module and a consistency loss for segmenting two types of hematoma in computed tomography images. The multi-scale boundary aware module can improve the model's ability to understand the two types of hematoma boundaries. The consistency loss can reduce the probability of classifying a pixel into two categories at the same time. Since different hematoma volumes and locations call for different treatments, we also measure hematoma volume, estimate centroid deviation, and compare with clinical methods. Finally, we plan the puncture path and conduct clinical validation. We collected a total of 351 cases, and the test set contained 103 cases. For intraparenchymal hematomas, the accuracy can reach 96% when the proposed method is applied for path planning. For intraventricular hematomas, the proposed model's segmentation efficiency and centroid prediction are superior to other comparable models. Experimental results and clinical practice show that the proposed model has potential for clinical application. In addition, our proposed method has no complicated modules and improves efficiency, with generalization ability.
