Author name cluster

Xiaonan Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

1 author row

AAAI Conference 2026 Conference Paper

Better Datasets Start from RefineLab: Automatic Optimization for High-Quality Dataset Refinement

Xiaonan Luo
Yue Huang
Ping He
Xiangliang Zhang

High‑quality Question–Answer (QA) datasets are foundational for reliable Large Language Model (LLM) evaluation, yet even expert‑crafted datasets exhibit persistent gaps in domain coverage, misaligned difficulty distributions, and factual inconsistencies. The recent surge in generative model-powered datasets has compounded these quality challenges. In this work, we introduce RefineLab, the first LLM‑driven framework that automatically refines raw QA textual data into high-quality datasets under a controllable token‑budget constraint. RefineLab takes a set of target quality attributes as refinement objectives and performs selective edits within a predefined token budget to ensure practicality and efficiency. In essence, RefineLab addresses a constrained optimization problem: improving the quality of QA samples as much as possible while respecting resource limitations. With a set of available refinement operations, RefineLab takes as input the original dataset, a specified set of target quality dimensions, and a token budget, and determines which refinement operations should be applied to each QA sample. This process is guided by an assignment module that selects optimal refinement strategies to maximize overall dataset quality while adhering to the budget constraint. Experiments demonstrate that RefineLab consistently narrows divergence from expert datasets across coverage, difficulty alignment, factual fidelity, and distractor quality. RefineLab pioneers a scalable, customizable path to reproducible dataset design, with broad implications for LLM evaluation.
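The budget-constrained assignment described in this abstract can be pictured as a knapsack-style selection. The following is a minimal sketch of a greedy heuristic, not RefineLab's actual assignment module: the `Candidate` fields, the gain and cost estimates, and the rule of at most one operation per sample are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical (sample, operation) pair with estimated gain and cost."""
    sample_id: int
    operation: str
    gain: float  # assumed estimate of quality improvement
    cost: int    # assumed estimate of tokens consumed


def assign_operations(candidates, token_budget):
    """Greedily pick at most one operation per sample, ranked by gain per token.

    Returns the chosen candidates keyed by sample id and the unused budget.
    This is a heuristic; it does not guarantee an optimal assignment.
    """
    useful = [c for c in candidates if c.cost > 0 and c.gain > 0]
    ranked = sorted(useful, key=lambda c: c.gain / c.cost, reverse=True)

    chosen = {}
    remaining = token_budget
    for candidate in ranked:
        if candidate.sample_id in chosen:
            continue  # this sample already has an operation
        if candidate.cost > remaining:
            continue  # not affordable with the budget left
        chosen[candidate.sample_id] = candidate
        remaining -= candidate.cost
    return chosen, remaining
```

Ranking by gain per token favors cheap, effective edits first. An exact solver (for example, dynamic programming over the budget) would be the natural replacement if optimality mattered more than simplicity.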

NeurIPS Conference 2025 Conference Paper

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

Xiangqi Wang
Yue Huang
Yanbo Wang
Xiaonan Luo
Kehan Guo
Yujun Zhou
Xiangliang Zhang

LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work “well enough” across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments showing fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.

NeurIPS Conference 2025 Conference Paper

ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions

Yue Huang
Zhengzhe Jiang
Xiaonan Luo
Kehan Guo
Haomin Zhuang
Yujun Zhou
Zhengqing Yuan
Xiaoqi Sun

Empowering large language models (LLMs) with chemical intelligence remains a challenge due to the scarcity of high-quality, domain-specific instruction-response datasets and the misalignment of existing synthetic data generation pipelines with the inherently hierarchical and rule-governed structure of chemical information. To address this, we propose ChemOrch, a framework that synthesizes chemically grounded instruction–response pairs through a two-stage process: task-controlled instruction generation and tool-aware response construction. ChemOrch enables controllable diversity and levels of difficulty for the generated tasks and ensures response precision through tool planning and distillation, and tool-based self-repair mechanisms. The effectiveness of ChemOrch is evaluated based on: 1) the high quality of generated instruction data, demonstrating superior diversity and strong alignment with chemical constraints; 2) the dynamic generation of evaluation tasks that more effectively reveal LLM weaknesses in chemistry; and 3) the significant improvement of LLM chemistry capabilities when the generated instruction data are used for fine-tuning. Our work thus represents a critical step toward scalable and verifiable chemical intelligence in LLMs. The code is available at https://anonymous.4open.science/r/ChemOrch-854A.

JBHI Journal 2025 Journal Article

Semi-Supervised Gland Segmentation via Feature-Enhanced Contrastive Learning and Dual-Consistency Strategy

Jiejiang Yu
Bingbing Li
Xipeng Pan
Zhenwei Shi
Huadeng Wang
Rushi Lan
Xiaonan Luo

In the field of gland segmentation in histopathology, deep-learning methods have made significant progress. However, most existing methods not only require a large amount of high-quality annotated data but also tend to confuse the interior of the gland with the background. To address this challenge, we propose a new semi-supervised method named DCCL-Seg for gland segmentation, which follows the teacher-student framework. Our approach can be divided into the following steps. First, we design a contrastive learning module to improve the ability of the student model's feature extractor to distinguish between gland and background features. Then, we introduce a Signed Distance Field (SDF) prediction task and employ a dual-consistency strategy (across tasks and models) to better reinforce the learning of the gland interior. Next, we propose a pseudo label filtering and reweighting mechanism, which filters and reweights the pseudo labels generated by the teacher model based on confidence. However, even after reweighting, the pseudo labels may still be influenced by unreliable pixels. Finally, we further design an assistant predictor to learn the reweighted pseudo labels, so that they do not interfere with the student model's predictor and the student model's predictions remain reliable. Experimental results on the publicly available GlaS and CRAG datasets demonstrate that our method outperforms other semi-supervised medical image segmentation methods.
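The confidence-based filtering and reweighting of teacher pseudo labels mentioned in this abstract can be illustrated roughly as follows. This is a sketch under stated assumptions, not the DCCL-Seg implementation: the binary foreground/background setting, the threshold value of 0.8, and the choice to use the confidence itself as the pixel weight are all hypothetical.

```python
def filter_and_reweight(teacher_probs, threshold=0.8):
    """Turn teacher foreground probabilities into pseudo labels and weights.

    teacher_probs: 2D list of foreground probabilities in [0, 1].
    Returns (labels, weights). A pixel whose confidence falls below the
    threshold gets weight 0.0 (filtered out); otherwise its weight equals
    its confidence (reweighting). Both rules are illustrative assumptions.
    """
    labels, weights = [], []
    for row in teacher_probs:
        label_row, weight_row = [], []
        for p in row:
            confidence = max(p, 1.0 - p)  # confidence of the predicted class
            label_row.append(1 if p >= 0.5 else 0)
            weight_row.append(confidence if confidence >= threshold else 0.0)
        labels.append(label_row)
        weights.append(weight_row)
    return labels, weights
```

The weights would typically multiply a per-pixel loss, so filtered pixels contribute nothing and confident pixels contribute most.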

AIIM Journal 2025 Journal Article

Weakly supervised nuclei segmentation based on pseudo label correction and uncertainty denoising

Xipeng Pan
Shilong Song
Zhenbing Liu
Huadeng Wang
Lingqiao Li
Haoxiang Lu
Rushi Lan
Xiaonan Luo

Nuclei segmentation plays a vital role in computer-aided histopathology image analysis. Numerous fully supervised learning approaches achieve impressive performance but rely on pathological images with precise annotations. However, accurate manual labeling of pathological images is difficult and time-consuming. Hence, this paper presents a two-stage weakly supervised model including coarse and fine phases, which can achieve nuclei segmentation on whole slide images using only point annotations. In the coarse segmentation step, Voronoi diagram and K-means cluster results are generated based on the point annotations to supervise the training network. In order to cope with the different imaging conditions, an image adaptive clustering pseudo label algorithm is proposed to adapt to the color distribution of different images. A Multi-scale Feature Fusion (MFF) module is designed in the decoder to better fuse the feature outputs. Additionally, to reduce the interference of erroneous cluster labels, an Exponential Moving Average for cluster label Correction (EMAC) strategy is proposed. After the first step, an uncertainty estimation pseudo label denoising strategy is introduced to denoise the Voronoi diagram and adaptive cluster labels. In the fine segmentation step, the optimized labels are used for training to obtain the final predicted probability map. Extensive experiments are performed on MoNuSeg and TNBC public benchmarks, which demonstrate that our proposed method is superior to other existing nuclei segmentation methods based on point labels. Codes are available at: https://github.com/SSL-droid/WNS-PLCUD.
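As a rough illustration of how a Voronoi diagram can be derived from point annotations, the sketch below assigns each pixel to its nearest annotated point and marks the boundaries between cells. It is a brute-force toy version, not the paper's code: the function names, the (row, column) point format, and the boundary rule are assumptions.

```python
def voronoi_partition(height, width, points):
    """Label each pixel with the index of its nearest annotated point.

    points: list of (row, col) tuples, one per annotated nucleus.
    Uses squared Euclidean distance; ties go to the lower point index.
    """
    cells = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - y) ** 2 + (points[i][1] - x) ** 2,
            )
            row.append(nearest)
        cells.append(row)
    return cells


def cell_boundaries(cells):
    """Mark pixels whose right or lower neighbour lies in a different cell."""
    height, width = len(cells), len(cells[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            right_differs = x + 1 < width and cells[y][x + 1] != cells[y][x]
            below_differs = y + 1 < height and cells[y + 1][x] != cells[y][x]
            edges[y][x] = right_differs or below_differs
    return edges
```

Each Voronoi cell contains exactly one annotated point, so the cell boundaries give a coarse separation between neighbouring nuclei. How those boundaries are turned into training labels is not specified in the abstract and is left out here.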

IJCAI Conference 2018 Conference Paper

Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition

Tianshui Chen
Liang Lin
Riquan Chen
Yang Wu
Xiaonan Luo

Humans can naturally understand an image in depth with the aid of rich knowledge accumulated from daily lives or professions. For example, achieving fine-grained image recognition (e.g., categorizing hundreds of subordinate categories of birds) usually requires a comprehensive visual concept organization including category labels and part-level attributes. In this work, we investigate how to unify rich professional knowledge with deep neural network architectures and propose a Knowledge-Embedded Representation Learning (KERL) framework for handling the problem of fine-grained image recognition. Specifically, we organize the rich visual concepts in the form of a knowledge graph and employ a Gated Graph Neural Network to propagate node messages through the graph for generating the knowledge representation. By introducing a novel gated mechanism, our KERL framework incorporates this knowledge representation into the discriminative image feature learning, i.e., implicitly associating the specific attributes with the feature maps. Compared with existing methods of fine-grained image classification, our KERL framework has several appealing properties: i) The embedded high-level knowledge enhances the feature representation, thus facilitating distinguishing the subtle differences among subordinate categories. ii) Our framework can learn feature maps with a meaningful configuration in which the highlighted regions closely correspond to the nodes (specific attributes) of the knowledge graph. Extensive experiments on the widely used Caltech-UCSD bird dataset demonstrate the superiority of our KERL framework over existing state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Learning a Wavelet-Like Auto-Encoder to Accelerate Deep Neural Networks

Tianshui Chen
Liang Lin
Wangmeng Zuo
Xiaonan Luo
Lei Zhang

Accelerating deep neural networks (DNNs) has been attracting increasing attention as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to have powerful visual recognition ability. A practical strategy toward this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and finetuning the amended network, leading to difficulty in balancing the trade-off between acceleration and maintaining recognition performance. In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training. The two decomposed channels, in particular, are encoded to carry the low-frequency information (e.g., image profiles) and high-frequency information (e.g., image details or noise), respectively, and enable reconstructing the original input image through the decoding process. Then, we feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse with the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is tolerant to any existing convolutional neural networks for classification without amending their structures; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.
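The idea of splitting an input into a half-resolution low-frequency channel and a half-resolution high-frequency channel that together allow reconstruction can be illustrated with a fixed Haar-style transform. The WAE in this abstract is a learned auto-encoder operating on images, so the sketch below is only an analogy: it uses a fixed, one-dimensional transform, and the function names are hypothetical.

```python
def decompose(signal):
    """Split an even-length signal into low- and high-frequency halves.

    For each adjacent pair (a, b): low = (a + b) / 2 captures the local
    average, high = (a - b) / 2 captures the local detail.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def reconstruct(low, high):
    """Invert decompose: a = low + high, b = low - high for each pair."""
    signal = []
    for average, detail in zip(low, high):
        signal.extend([average + detail, average - detail])
    return signal
```

For the input `[4, 2, 5, 9]`, `decompose` returns `([3.0, 7.0], [1.0, -2.0])`, and `reconstruct` recovers the original values. Each channel has half the original length, which mirrors why feeding only the low-frequency channel to the main classifier reduces computation.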
