Arrow Research search

Author name cluster

Xiaonan Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
1 author row

Possible papers


AAAI Conference 2026 Conference Paper

Better Datasets Start from RefineLab: Automatic Optimization for High-Quality Dataset Refinement

  • Xiaonan Luo
  • Yue Huang
  • Ping He
  • Xiangliang Zhang

High‑quality Question–Answer (QA) datasets are foundational for reliable Large Language Model (LLM) evaluation, yet even expert‑crafted datasets exhibit persistent gaps in domain coverage, misaligned difficulty distributions, and factual inconsistencies. The recent surge in generative model-powered datasets has compounded these quality challenges. In this work, we introduce RefineLab, the first LLM‑driven framework that automatically refines raw QA textual data into high-quality datasets under a controllable token‑budget constraint. RefineLab takes a set of target quality attributes as refinement objectives and performs selective edits within a predefined token budget to ensure practicality and efficiency. In essence, RefineLab addresses a constrained optimization problem: improving the quality of QA samples as much as possible while respecting resource limitations. With a set of available refinement operations, RefineLab takes as input the original dataset, a specified set of target quality dimensions, and a token budget, and determines which refinement operations should be applied to each QA sample. This process is guided by an assignment module that selects optimal refinement strategies to maximize overall dataset quality while adhering to the budget constraint. Experiments demonstrate that RefineLab consistently narrows divergence from expert datasets across coverage, difficulty alignment, factual fidelity, and distractor quality. RefineLab pioneers a scalable, customizable path to reproducible dataset design, with broad implications for LLM evaluation.
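The budgeted selection described in this abstract can be illustrated with a small sketch. It is not RefineLab's assignment module: the greedy gain-per-token rule is a stand-in heuristic, and the operation names, gain estimates, and token costs are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sample_id: int   # index of the QA sample (hypothetical)
    operation: str   # hypothetical refinement operation name
    gain: float      # assumed estimate of the quality improvement
    cost: int        # assumed token cost of applying the operation


def assign_operations(candidates, token_budget):
    """Greedily pick at most one operation per sample, best gain per token first.

    Returns the chosen operations keyed by sample id and the tokens spent.
    """
    usable = [c for c in candidates if c.cost > 0 and c.gain > 0]
    ranked = sorted(usable, key=lambda c: c.gain / c.cost, reverse=True)
    chosen = {}
    remaining = token_budget
    for cand in ranked:
        if cand.sample_id in chosen:
            continue  # this sample already has an operation assigned
        if cand.cost <= remaining:
            chosen[cand.sample_id] = cand
            remaining -= cand.cost
    return chosen, token_budget - remaining


candidates = [
    Candidate(0, "rewrite_distractors", 0.30, 120),
    Candidate(0, "fix_fact", 0.20, 40),
    Candidate(1, "adjust_difficulty", 0.25, 100),
    Candidate(2, "fix_fact", 0.10, 80),
]
chosen, used = assign_operations(candidates, token_budget=150)
for sample_id, cand in sorted(chosen.items()):
    print(sample_id, cand.operation, cand.cost)
print("tokens used:", used)
```

A greedy rule is easy to read but is not guaranteed to be optimal; an exact solver for this knapsack-style problem would be a natural substitute.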

NeurIPS Conference 2025 Conference Paper

AdaReasoner: Adaptive Reasoning Enables More Flexible Thinking

  • Xiangqi Wang
  • Yue Huang
  • Yanbo Wang
  • Xiaonan Luo
  • Kehan Guo
  • Yujun Zhou
  • Xiangliang Zhang

LLMs often need effective configurations, like temperature and reasoning steps, to handle tasks requiring sophisticated reasoning and problem-solving, ranging from joke generation to mathematical reasoning. Existing prompting approaches usually adopt general-purpose, fixed configurations that work “well enough” across tasks but seldom achieve task-specific optimality. To address this gap, we introduce AdaReasoner, an LLM-agnostic plugin designed for any LLM to automate adaptive reasoning configurations for tasks requiring different types of thinking. AdaReasoner is trained using a reinforcement learning (RL) framework, combining a factorized action space with a targeted exploration strategy, along with a pretrained reward model to optimize the policy model for reasoning configurations with only a few-shot guide. AdaReasoner is backed by theoretical guarantees and experiments demonstrating fast convergence and a sublinear policy gap. Across six different LLMs and a variety of reasoning tasks, it consistently outperforms standard baselines, preserves out-of-distribution robustness, and yields gains on knowledge-intensive tasks through tailored prompts.
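A factorized action space can be sketched as follows. This is a minimal illustration, not AdaReasoner's policy: the two factors, their option values, and the logits are assumptions. Each factor is sampled from its own categorical distribution, so the joint log-probability is the sum of the per-factor log-probabilities.

```python
import math
import random

# Hypothetical configuration factors and their discrete options.
FACTORS = {
    "temperature": [0.0, 0.5, 1.0],
    "reasoning_steps": [1, 3, 5, 8],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_config(logits_by_factor, rng):
    """Sample one option per factor independently and return (config, log_prob)."""
    config = {}
    log_prob = 0.0
    for name, options in FACTORS.items():
        probs = softmax(logits_by_factor[name])
        idx = rng.choices(range(len(options)), weights=probs, k=1)[0]
        config[name] = options[idx]
        log_prob += math.log(probs[idx])
    return config, log_prob


uniform_logits = {name: [0.0] * len(opts) for name, opts in FACTORS.items()}
config, log_prob = sample_config(uniform_logits, random.Random(0))
print(config, log_prob)
```

Factorizing keeps the number of policy outputs at the sum of the option counts (here 3 + 4) rather than their product (12).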

NeurIPS Conference 2025 Conference Paper

ChemOrch: Empowering LLMs with Chemical Intelligence via Groundbreaking Synthetic Instructions

  • Yue Huang
  • Zhengzhe Jiang
  • Xiaonan Luo
  • Kehan Guo
  • Haomin Zhuang
  • Yujun Zhou
  • Zhengqing Yuan
  • Xiaoqi Sun

Empowering large language models (LLMs) with chemical intelligence remains a challenge due to the scarcity of high-quality, domain-specific instruction-response datasets and the misalignment of existing synthetic data generation pipelines with the inherently hierarchical and rule-governed structure of chemical information. To address this, we propose ChemOrch, a framework that synthesizes chemically grounded instruction–response pairs through a two-stage process: task-controlled instruction generation and tool-aware response construction. ChemOrch enables controllable diversity and levels of difficulty for the generated tasks and ensures response precision through tool planning and distillation, and tool-based self-repair mechanisms. The effectiveness of ChemOrch is evaluated based on: 1) the high quality of generated instruction data, demonstrating superior diversity and strong alignment with chemical constraints; 2) the dynamic generation of evaluation tasks that more effectively reveal LLM weaknesses in chemistry; and 3) the significant improvement of LLM chemistry capabilities when the generated instruction data are used for fine-tuning. Our work thus represents a critical step toward scalable and verifiable chemical intelligence in LLMs. The code is available at https://anonymous.4open.science/r/ChemOrch-854A.

JBHI Journal 2025 Journal Article

Semi-Supervised Gland Segmentation via Feature-Enhanced Contrastive Learning and Dual-Consistency Strategy

  • Jiejiang Yu
  • Bingbing Li
  • Xipeng Pan
  • Zhenwei Shi
  • Huadeng Wang
  • Rushi Lan
  • Xiaonan Luo

In the field of gland segmentation in histopathology, deep-learning methods have made significant progress. However, most existing methods not only require a large amount of high-quality annotated data but also tend to confuse the gland interior with the background. To address this challenge, we propose a new semi-supervised method named DCCL-Seg for gland segmentation, which follows the teacher-student framework. Our approach consists of the following steps. First, we design a contrastive learning module to improve the ability of the student model's feature extractor to distinguish between gland and background features. Then, we introduce a Signed Distance Field (SDF) prediction task and employ a dual-consistency strategy (across tasks and models) to better reinforce the learning of the gland interior. Next, we propose a pseudo label filtering and reweighting mechanism, which filters and reweights the pseudo labels generated by the teacher model based on confidence. However, even after reweighting, the pseudo labels may still be influenced by unreliable pixels. Finally, we therefore design an assistant predictor to learn the reweighted pseudo labels, so that they do not interfere with the student model's predictor and the student model's predictions remain reliable. Experimental results on the publicly available GlaS and CRAG datasets demonstrate that our method outperforms other semi-supervised medical image segmentation methods.
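Confidence-based filtering and reweighting of pseudo labels can be sketched as below. This is an illustration under stated assumptions, not DCCL-Seg's mechanism: the 0.7 threshold, the use of max(p, 1 - p) as the confidence, and the rule of using the confidence itself as the weight are all hypothetical choices.

```python
def reweight_pseudo_labels(probs, threshold=0.7):
    """Turn teacher foreground probabilities into hard labels and per-pixel weights.

    probs: 2D list of foreground probabilities in [0, 1].
    Pixels whose confidence falls below `threshold` get weight 0.0 (filtered out);
    the rest are weighted by their confidence.
    """
    labels, weights = [], []
    for row in probs:
        label_row, weight_row = [], []
        for p in row:
            confidence = max(p, 1.0 - p)  # confidence of the predicted class
            label_row.append(1 if p >= 0.5 else 0)
            weight_row.append(confidence if confidence >= threshold else 0.0)
        labels.append(label_row)
        weights.append(weight_row)
    return labels, weights


teacher_probs = [[0.9, 0.55], [0.1, 0.4]]
labels, weights = reweight_pseudo_labels(teacher_probs)
print(labels)
print(weights)
```

The weights would typically multiply a per-pixel loss, so filtered pixels contribute nothing to the gradient.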

AIIM Journal 2025 Journal Article

Weakly supervised nuclei segmentation based on pseudo label correction and uncertainty denoising

  • Xipeng Pan
  • Shilong Song
  • Zhenbing Liu
  • Huadeng Wang
  • Lingqiao Li
  • Haoxiang Lu
  • Rushi Lan
  • Xiaonan Luo

Nuclei segmentation plays a vital role in computer-aided histopathology image analysis. Numerous fully supervised learning approaches achieve strong performance but rely on pathological images with precise annotations. However, accurate manual labeling of pathological images is difficult and time-consuming. Hence, this paper presents a two-stage weakly supervised model including coarse and fine phases, which can achieve nuclei segmentation on whole slide images using only point annotations. In the coarse segmentation step, Voronoi diagram and K-means cluster results are generated based on the point annotations to supervise the training network. To cope with different imaging conditions, an image-adaptive clustering pseudo label algorithm is proposed to adapt to the color distribution of different images. A Multi-scale Feature Fusion (MFF) module is designed in the decoder to better fuse the feature outputs. Additionally, to reduce the interference of erroneous cluster labels, an Exponential Moving Average for cluster label Correction (EMAC) strategy is proposed. After the first step, an uncertainty estimation pseudo label denoising strategy is introduced to denoise the Voronoi diagram and adaptive cluster labels. In the fine segmentation step, the optimized labels are used for training to obtain the final predicted probability map. Extensive experiments are performed on the MoNuSeg and TNBC public benchmarks, which demonstrate that our proposed method is superior to other existing nuclei segmentation methods based on point labels. Code is available at: https://github.com/SSL-droid/WNS-PLCUD.
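Deriving a Voronoi partition from point annotations can be sketched with a brute-force nearest-point search. This is a generic illustration, not the paper's pipeline: the (row, column) point format, the tiny grid, and the rule for marking cell boundaries are assumptions made for the example.

```python
def voronoi_partition(height, width, points):
    """Assign every pixel the index of its nearest annotated point.

    points: list of (row, col) tuples; ties go to the lower index.
    """
    cells = []
    for y in range(height):
        row = []
        for x in range(width):
            nearest = min(
                range(len(points)),
                key=lambda i: (points[i][0] - y) ** 2 + (points[i][1] - x) ** 2,
            )
            row.append(nearest)
        cells.append(row)
    return cells


def voronoi_edges(cells):
    """Mark pixels whose right or lower neighbour lies in a different cell."""
    height, width = len(cells), len(cells[0])
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if x + 1 < width and cells[y][x] != cells[y][x + 1]:
                edges[y][x] = True
            if y + 1 < height and cells[y][x] != cells[y + 1][x]:
                edges[y][x] = True
    return edges


points = [(0, 0), (0, 3)]  # hypothetical nucleus centre annotations
cells = voronoi_partition(1, 4, points)
print(cells)
print(voronoi_edges(cells))
```

The brute-force search costs one distance evaluation per pixel per point, which is fine for a sketch but slow on whole slide images; a distance transform or a spatial index would be the usual replacement.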

IJCAI Conference 2018 Conference Paper

Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition

  • Tianshui Chen
  • Liang Lin
  • Riquan Chen
  • Yang Wu
  • Xiaonan Luo

Humans can naturally understand an image in depth with the aid of rich knowledge accumulated from daily lives or professions. For example, achieving fine-grained image recognition (e.g., categorizing hundreds of subordinate categories of birds) usually requires a comprehensive visual concept organization including category labels and part-level attributes. In this work, we investigate how to unify rich professional knowledge with deep neural network architectures and propose a Knowledge-Embedded Representation Learning (KERL) framework for handling the problem of fine-grained image recognition. Specifically, we organize the rich visual concepts in the form of a knowledge graph and employ a Gated Graph Neural Network to propagate node messages through the graph for generating the knowledge representation. By introducing a novel gated mechanism, our KERL framework incorporates this knowledge representation into the discriminative image feature learning, i.e., implicitly associating the specific attributes with the feature maps. Compared with existing methods of fine-grained image classification, our KERL framework has several appealing properties: i) The embedded high-level knowledge enhances the feature representation, thus facilitating distinguishing the subtle differences among subordinate categories. ii) Our framework can learn feature maps with a meaningful configuration in which the highlighted regions finely accord with the nodes (specific attributes) of the knowledge graph. Extensive experiments on the widely used Caltech-UCSD bird dataset demonstrate the superiority of our KERL framework over existing state-of-the-art methods.

AAAI Conference 2018 Conference Paper

Learning a Wavelet-Like Auto-Encoder to Accelerate Deep Neural Networks

  • Tianshui Chen
  • Liang Lin
  • Wangmeng Zuo
  • Xiaonan Luo
  • Lei Zhang

Accelerating deep neural networks (DNNs) has been attracting increasing attention as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to have powerful visual recognition ability. A practical strategy toward this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and finetuning the amended network, which makes it difficult to balance the trade-off between acceleration and maintaining recognition performance. In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training. The two decomposed channels, in particular, are encoded to carry the low-frequency information (e.g., image profiles) and high-frequency information (e.g., image details or noises), respectively, and enable reconstructing the original input image through the decoding process. Then, we feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse with the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is compatible with any existing convolutional neural networks for classification without amending their structures; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.