Arrow Research

Author name cluster

Xiaotong Luo

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

6 papers
1 author row

Possible papers (6)

AAAI Conference 2026 · Conference Paper

Diffusion Once and Done: Degradation-Aware LoRA for All-in-One Image Restoration

  • Ni Tang
  • Xiaotong Luo
  • Zihan Cheng
  • Liangtai Zhou
  • Dongxiao Zhang
  • Yanyun Qu

Diffusion models have shown powerful potential in all-in-one image restoration (AiOIR), as they excel at generating rich texture details. Existing AiOIR methods either retrain a diffusion model or fine-tune a pretrained one with extra conditional guidance. However, they often suffer from high inference costs and limited adaptability to diverse degradation types. In this paper, we propose an efficient AiOIR method, Diffusion Once and Done (DOD), which aims to achieve superior restoration performance with only one-step sampling of Stable Diffusion (SD) models. Specifically, multi-degradation feature modulation is first introduced to capture different degradation prompts with a pretrained diffusion model. Then, parameter-efficient conditional low-rank adaptation integrates the prompts to fine-tune the SD model for different degradation types. In addition, a high-fidelity detail enhancement module is integrated into the SD decoder to improve structural and textural details. Experiments demonstrate that our method outperforms existing diffusion-based restoration approaches in both visual quality and inference efficiency.
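
A minimal PyTorch sketch may make the conditional low-rank adaptation concrete: a frozen linear layer is augmented with a LoRA-style low-rank update whose per-rank scale is gated by a degradation prompt embedding. Everything here (the class name, the sigmoid gate, the prompt dimension) is a hypothetical illustration, not the authors' implementation.

```python
# Sketch only: a LoRA update conditioned on a degradation prompt.
import torch
import torch.nn as nn

class ConditionalLoRALinear(nn.Module):
    def __init__(self, in_dim, out_dim, rank=8, prompt_dim=64):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim)
        for p in self.base.parameters():
            p.requires_grad_(False)                      # frozen pretrained weight
        self.down = nn.Linear(in_dim, rank, bias=False)  # LoRA "A"
        self.up = nn.Linear(rank, out_dim, bias=False)   # LoRA "B"
        nn.init.zeros_(self.up.weight)                   # update is a no-op at init
        # Hypothetical gate: maps a degradation prompt to per-rank scales.
        self.gate = nn.Sequential(nn.Linear(prompt_dim, rank), nn.Sigmoid())

    def forward(self, x, prompt):
        # x: (batch, tokens, in_dim); prompt: (batch, prompt_dim)
        scale = self.gate(prompt).unsqueeze(1)           # (batch, 1, rank)
        return self.base(x) + self.up(self.down(x) * scale)

layer = ConditionalLoRALinear(320, 320)
x = torch.randn(2, 77, 320)
prompt = torch.randn(2, 64)    # e.g. embeddings for "noise" vs. "rain"
print(layer(x, prompt).shape)  # torch.Size([2, 77, 320])
```

Zero-initializing the up-projection keeps the adapted layer identical to the pretrained one at the start of fine-tuning, the standard LoRA starting point.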

AAAI Conference 2026 · Conference Paper

SpikingIR: A Novel Converted Spiking Neural Network for Efficient Image Restoration

  • Yang Ouyang
  • Zihan Cheng
  • Xiaotong Luo
  • Guoqi Li
  • Yanyun Qu

Image restoration (IR) has made great progress with the rise of deep learning, but its energy consumption limits real-world applications. Spiking Neural Networks (SNNs) are seen as energy-efficient alternatives to Artificial Neural Networks (ANNs). However, applying SNNs to IR remains challenging, primarily due to the limited information capacity of spike-based signals. This limitation leads to quantization errors and information loss, while IR tasks are highly sensitive to output precision and error, so restoration performance suffers significantly. To address this challenge, we propose SpikingIR, an ANN-to-SNN conversion framework for IR that reduces information loss and quantization error. SpikingIR mainly consists of two components: Convolutional Pixel Mapping (CPM) and Membrane Potential Reuse Neuron (MPRN), designed to alleviate quantization errors and information loss in the output and intermediate layers, respectively. Specifically, CPM maps discrete outputs into a continuous space, better aligning with pixel-level details. From the perspective of information entropy, we show that the outputs of CPM contain more information than the original outputs. MPRN introduces a post-processing step with relaxed firing conditions to extract residual membrane potential, reducing information waste. Furthermore, we fine-tune the converted model to jointly optimize accuracy and energy efficiency. Experimental results demonstrate that SpikingIR achieves performance comparable to ANN counterparts across various IR benchmarks while reducing energy consumption by up to 50%.
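
The MPRN idea of extracting residual membrane potential can be illustrated with a toy integrate-and-fire neuron: standard rate decoding quantizes the output to multiples of threshold/T, and a relaxed final readout of the leftover potential recovers some of the sub-threshold information. This is a schematic sketch of the intuition, not the paper's neuron model.

```python
# Sketch only: IF neuron with a residual-potential readout after T steps.
import torch

def if_neuron_with_residual_readout(currents, threshold=1.0):
    # currents: (T, batch, features) input current per timestep
    T = currents.shape[0]
    v = torch.zeros_like(currents[0])      # membrane potential
    spike_sum = torch.zeros_like(v)
    for t in range(T):
        v = v + currents[t]                # integrate
        spikes = (v >= threshold).float()  # fire
        v = v - spikes * threshold         # soft reset
        spike_sum = spike_sum + spikes
    rate = spike_sum * threshold / T       # standard rate decoding
    return rate + v / T                    # relaxed readout of the residual

out = if_neuron_with_residual_readout(torch.rand(8, 2, 16))
print(out.shape)  # torch.Size([2, 16])
```

Without the final `v / T` term, any charge still below the threshold at the last timestep is simply discarded, which is exactly the kind of information waste the abstract describes.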

AAAI Conference 2024 · Conference Paper

AdaFormer: Efficient Transformer with Adaptive Token Sparsification for Image Super-resolution

  • Xiaotong Luo
  • Zekun Ai
  • Qiuyuan Liang
  • Ding Liu
  • Yuan Xie
  • Yanyun Qu
  • Yun Fu

Efficient transformer-based models have made remarkable progress in image super-resolution (SR). Most of these works design elaborate structures to accelerate transformer inference, where all feature tokens are propagated equally. However, they ignore an underlying characteristic of image content, namely that different regions vary in restoration difficulty, especially in large images (2K-8K), and thus fail to achieve adaptive inference. In this work, we propose an adaptive token sparsification transformer (AdaFormer) to speed up model inference for image SR. Specifically, a texture-relevant sparse attention block with parallel global and local branches is introduced, aiming to integrate informative tokens from a global view instead of only within fixed local windows. Then, an early-exit strategy is designed to progressively halt tokens according to their importance. To estimate the plausibility of each token, we adopt a lightweight confidence estimator, constrained by an uncertainty-guided loss, to obtain a binary halting mask over the tokens. Experiments on large images show that our method reduces latency by nearly 90% against SwinIR on Test8K while maintaining comparable performance.
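
The early-exit mechanism can be sketched as a per-stage confidence head that freezes tokens once they pass a threshold, so smooth regions stop computing before textured ones. Names, shapes, and the threshold rule below are assumptions for illustration; in the paper the estimator is constrained by an uncertainty-guided loss rather than left untrained.

```python
# Sketch only: progressive token halting with a lightweight confidence head.
import torch
import torch.nn as nn

class HaltingStage(nn.Module):
    def __init__(self, dim, tau=0.5):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.confidence = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.tau = tau

    def forward(self, tokens, active):
        # tokens: (B, N, C); active: (B, N) bool mask of not-yet-halted tokens
        out = self.block(tokens)
        tokens = torch.where(active.unsqueeze(-1), out, tokens)  # halted tokens frozen
        conf = self.confidence(tokens).squeeze(-1)               # (B, N)
        return tokens, active & (conf < self.tau)                # halt confident tokens

stages = nn.ModuleList(HaltingStage(64) for _ in range(4))
x = torch.randn(1, 256, 64)                   # 256 feature tokens
alive = torch.ones(1, 256, dtype=torch.bool)
for stage in stages:
    x, alive = stage(x, alive)
print(int(alive.sum()), "tokens still active after 4 stages")
```

For clarity this sketch masks halted tokens rather than removing them; an efficient implementation would gather only the active tokens before each block, which is where the latency savings actually come from.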

AAAI Conference 2024 · Conference Paper

SkipDiff: Adaptive Skip Diffusion Model for High-Fidelity Perceptual Image Super-resolution

  • Xiaotong Luo
  • Yuan Xie
  • Yanyun Qu
  • Yun Fu

It is well known that image quality assessment usually faces the perception-distortion (p-d) trade-off. Existing deep image super-resolution (SR) methods focus either on high fidelity with pixel-level objectives or on high perception with generative models. The emergence of diffusion models opens a fresh avenue for image restoration, with the potential to offer a brand-new solution to the p-d trade-off. We experimentally observed that perceptual quality and distortion move in opposite directions as the number of sampling steps increases. In light of this property, we propose an adaptive skip diffusion model (SkipDiff), which aims to achieve high-fidelity perceptual image SR with fewer sampling steps. Specifically, it decouples the sampling procedure into a coarse skip approximation stage and a fine skip refinement stage. A coarse-grained skip diffusion is first performed as a high-fidelity prior to obtain a latent approximation of the full diffusion. A fine-grained skip diffusion then further refines the latent sample to promote perception, where the fine time steps are adaptively learned by deep reinforcement learning. Meanwhile, this approach also enables faster diffusion sampling by skipping intermediate denoising steps, shortening the effective computation. Extensive experimental results show that SkipDiff achieves superior perceptual quality with plausible reconstruction accuracy and faster sampling.
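
A schematic of the two-stage sampling may clarify the control flow: a coarse pass strides through the timestep schedule for a high-fidelity approximation, then a fine pass revisits a few late steps to refine perception. The denoiser below is a stand-in, and the fine steps are hard-coded here, whereas the paper learns them with reinforcement learning.

```python
# Sketch only: coarse skip approximation followed by fine skip refinement.
import torch

def denoiser(x, t):
    # Stand-in for one trained denoising step x_t -> x_{t'}.
    return x - 0.01 * (t / 1000.0) * torch.randn_like(x)

def skip_sample(x_T, total_steps=1000, coarse_stride=100,
                fine_steps=(90, 60, 30, 10)):
    x = x_T
    for t in range(total_steps, 0, -coarse_stride):  # coarse: 1000, 900, ..., 100
        x = denoiser(x, t)
    for t in fine_steps:                             # fine: revisit late steps
        x = denoiser(x, t)
    return x

print(skip_sample(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 3, 64, 64])
```

The coarse pass trades some perceptual quality for fidelity and speed; the handful of fine steps buys the perception back, matching the observed opposite movement of the two metrics.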

NeurIPS Conference 2024 · Conference Paper

UniDSeg: Unified Cross-Domain 3D Semantic Segmentation via Visual Foundation Models Prior

  • Yao Wu
  • Mingwei Xing
  • Yachao Zhang
  • Xiaotong Luo
  • Yuan Xie
  • Yanyun Qu

3D semantic segmentation with a model trained on a source domain, with or without access to unlabeled target-domain data, is a fundamental task in computer vision that covers both domain adaptation and domain generalization. The essence of solving these cross-domain tasks simultaneously is to enhance the generalizability of the encoder. In light of this, we propose a groundbreaking universal method that harnesses off-the-shelf Visual Foundation Models (VFMs) to boost the adaptability and generalizability of cross-domain 3D semantic segmentation, dubbed UniDSeg. Our method explores VFM priors and how to harness them, aiming to inherit the recognition ability of VFMs. Specifically, it introduces layer-wise learnable blocks to the VFMs, which hinge on alternately learning two representations during training: (i) learning visual prompts, where the 3D-to-2D transitional prior and task-shared knowledge are captured from the prompt space; and (ii) learning deep queries, where spatial tunability is built into the representation of distinct instances, driven by prompts in the query space. Integrating these representations into a cross-modal learning framework, UniDSeg efficiently mitigates the domain gap between 2D and 3D modalities, achieving unified cross-domain 3D semantic segmentation. Extensive experiments demonstrate the effectiveness of our method across widely recognized tasks and datasets, all achieving superior performance over state-of-the-art methods. Remarkably, UniDSeg achieves 57.5%/54.4% mIoU on "A2D2/sKITTI" for the domain adaptive/generalized tasks. Code is available at https://github.com/Barcaaaa/UniDSeg.
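
The "layer-wise learnable blocks" build on the general mechanism of deep prompt tuning: a frozen transformer backbone with a separate set of learnable prompt tokens prepended at every block. The sketch below shows that generic mechanism under assumed shapes; it is not the UniDSeg code, which additionally learns deep queries and fuses 2D/3D modalities.

```python
# Sketch only: layer-wise prompt tokens on a frozen transformer backbone.
import torch
import torch.nn as nn

class PromptedEncoder(nn.Module):
    def __init__(self, dim=64, depth=4, n_prompts=8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )
        for p in self.blocks.parameters():
            p.requires_grad_(False)            # frozen VFM backbone
        # One learnable prompt set per layer -- the only trained parameters.
        self.prompts = nn.ParameterList(
            nn.Parameter(0.02 * torch.randn(n_prompts, dim)) for _ in range(depth)
        )

    def forward(self, tokens):                 # tokens: (B, N, C) patch tokens
        B, n = tokens.shape[0], self.prompts[0].shape[0]
        for blk, prompt in zip(self.blocks, self.prompts):
            x = torch.cat([prompt.unsqueeze(0).expand(B, -1, -1), tokens], dim=1)
            tokens = blk(x)[:, n:]             # drop prompts, keep patch tokens
        return tokens

enc = PromptedEncoder()
print(enc(torch.randn(2, 196, 64)).shape)  # torch.Size([2, 196, 64])
```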

NeurIPS Conference 2023 · Conference Paper

Learning Re-sampling Methods with Parameter Attribution for Image Super-resolution

  • Xiaotong Luo
  • Yuan Xie
  • Yanyun Qu

Single image super-resolution (SISR) has made significant breakthroughs, benefiting from the rise of deep neural networks and large-scale training samples. Mainstream deep SR models primarily focus on network architecture design and optimization schemes, while few pay attention to the training data. In fact, most existing SR methods train the model on patch pairs sampled uniformly from the whole image. However, uneven image content gives the training data an unbalanced distribution, i.e., easily reconstructed (smooth) regions occupy the majority of the data, while hard-to-reconstruct regions (edges or textures) have very few samples. Based on this phenomenon, we rethink the current paradigm of training SR models with uniform data sampling alone. In this paper, we propose a simple yet effective Bi-Sampling Parameter Attribution (BSPA) method for accurate image SR. Specifically, the bi-sampling consists of uniform sampling and inverse sampling, introduced to reconcile the inherent data imbalance. The former keeps the intrinsic data distribution, and the latter is designed to enhance the model's feature extraction on hard samples. Moreover, integrated gradients are introduced to attribute the contribution of each parameter in the models alternately trained on both sampling streams, so as to filter out trivial parameters for further dynamic refinement. By progressively decoupling the allocation of parameters, the SR model learns a more compact representation. Extensive experiments on publicly available datasets demonstrate that our proposal effectively boosts the performance of baseline methods from the data re-sampling view.
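
The bi-sampling idea can be illustrated with a simple difficulty-aware sampler: half of the patch locations are drawn uniformly, the other half with probability proportional to local gradient magnitude, so rare edge/texture regions are seen more often. The gradient-based difficulty score and grid layout are illustrative assumptions, not the paper's exact sampler, and the integrated-gradient attribution step is omitted.

```python
# Sketch only: uniform sampling plus inverse (difficulty-weighted) sampling.
import torch
import torch.nn.functional as F

def difficulty_map(img, patch):
    # img: (C, H, W); returns mean gradient magnitude per patch-sized cell.
    gray = img.mean(0, keepdim=True).unsqueeze(0)        # (1, 1, H, W)
    gx = F.pad((gray[..., :, 1:] - gray[..., :, :-1]).abs(), (0, 1))
    gy = F.pad((gray[..., 1:, :] - gray[..., :-1, :]).abs(), (0, 0, 0, 1))
    return F.avg_pool2d(gx + gy, patch).flatten()        # one score per cell

def bi_sample_coords(img, patch=48, n=16):
    C, H, W = img.shape
    gh, gw = H // patch, W // patch
    uni = torch.randint(0, gh * gw, (n // 2,))           # keeps intrinsic distribution
    probs = difficulty_map(img, patch) + 1e-6            # avoid zero-probability cells
    inv = torch.multinomial(probs, n // 2, replacement=True)  # favors hard cells
    cells = torch.cat([uni, inv])
    return [(int(c) // gw * patch, int(c) % gw * patch) for c in cells]

img = torch.rand(3, 480, 480)
print(bi_sample_coords(img)[:4])  # (top, left) corners of sampled patches
```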