Arrow Research search

Author name cluster

Xiaofeng Yang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Adaptive Piecewise Distillation for Efficient LiDAR Data Generation

  • Ruibo Li
  • Xiaofeng Yang
  • Ze Yang
  • Jiacheng Wei
  • Chunyan Miao
  • Guosheng Lin

LiDAR data generation has emerged as a promising solution to the high cost and limited scalability of real-world LiDAR sensing. Recent diffusion and rectified flow models have demonstrated strong capabilities in synthesizing realistic 3D point clouds; however, their iterative sampling procedures result in significant inference overhead. To address this, we focus on efficient few-step LiDAR generation for both unconditional and multi-modal conditional settings. Specifically, we propose an adaptive piecewise distillation strategy tailored for rectified flow-based LiDAR generation models, where the teacher model’s flow trajectory is adaptively segmented into consecutive intervals, and the student is trained only at the start of each interval to directly predict the velocity toward its endpoint. By sequentially sampling at the start timestep of each interval, our method enables fast few-step generation. Moreover, instead of uniform partitioning, we introduce an adaptive timestep selection strategy that chooses interval boundaries with minimal initial error, thereby reducing the complexity of distillation. Experimental results show that our method achieves comparable or superior performance to state-of-the-art methods in both unconditional and multi-modal conditional LiDAR generation, using only four sampling steps.
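The few-step sampling loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `piecewise_sample` and `toy_velocity`, the boundary values, and the convention that t = 0 is noise and t = 1 is data are all assumptions, and a constant toy velocity field stands in for the trained student network.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def piecewise_sample(
    x: Sequence[float],
    boundaries: Sequence[float],
    velocity: Callable[[Sequence[float], float], Sequence[float]],
) -> Vector:
    """Take one jump per interval [t_k, t_{k+1}], querying the model only at t_k."""
    state = list(x)
    for t_start, t_end in zip(boundaries[:-1], boundaries[1:]):
        v = velocity(state, t_start)  # predicted velocity toward the interval endpoint
        state = [s + (t_end - t_start) * vi for s, vi in zip(state, v)]
    return state


# Toy stand-in for a distilled student (hypothetical): a perfectly straight
# flow from `noise` to `target`, so the velocity is constant everywhere.
noise = [0.0, 0.0, 0.0]
target = [1.0, -2.0, 0.5]


def toy_velocity(state: Sequence[float], t: float) -> Vector:
    return [b - a for a, b in zip(noise, target)]


# Four non-uniform intervals; these boundary values are illustrative only.
boundaries = [0.0, 0.1, 0.35, 0.7, 1.0]
sample = piecewise_sample(noise, boundaries, toy_velocity)
```

With a real model the boundaries would come from the adaptive timestep selection the abstract mentions; here they are fixed by hand only to show that the number of network evaluations equals the number of intervals.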

AAAI Conference 2026 Conference Paper

EfficientFSL: Enhancing Few-Shot Classification via Query-Only Tuning In Vision Transformers

  • Wenwen Liao
  • Hang Ruan
  • Jianbo Yu
  • Bing Song
  • Yuansong Wang
  • Xiaofeng Yang

Large models such as Vision Transformers (ViTs) have demonstrated remarkable superiority over smaller architectures like ResNet in few-shot classification, owing to their powerful representational capacity. However, fine-tuning such large models demands extensive GPU memory and prolonged training time, making them impractical for many real-world low-resource scenarios. To bridge this gap, we propose EfficientFSL, a query-only fine-tuning framework tailored specifically for few-shot classification with ViT, which achieves competitive performance while significantly reducing computational overhead. EfficientFSL fully leverages the knowledge embedded in the pre-trained model and its strong comprehension ability, achieving high classification accuracy with an extremely small number of tunable parameters. Specifically, we introduce a lightweight trainable Forward Block to synthesize task-specific queries that extract informative features from the intermediate representations of the pre-trained model in a query-only manner. We further propose a Combine Block to fuse multi-layer outputs, enhancing the depth and robustness of feature representations. Finally, a Support-Query Attention Block mitigates distribution shift by adjusting prototypes to align with the query set distribution. With minimal trainable parameters, EfficientFSL achieves state-of-the-art performance on four in-domain few-shot datasets and six cross-domain datasets, demonstrating its effectiveness in real-world applications.
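For readers unfamiliar with prototypes in few-shot classification, the sketch below shows the generic nearest-prototype baseline: each class prototype is the mean of its support features, and a query is assigned to the closest prototype. It does not reproduce the Support-Query Attention Block or any other EfficientFSL component; the function names and the toy 2-D features are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def class_prototypes(
    features: Sequence[Sequence[float]], labels: Sequence[Hashable]
) -> Dict[Hashable, List[float]]:
    """Average the support features of each class into one prototype vector."""
    grouped = defaultdict(list)
    for feature, label in zip(features, labels):
        grouped[label].append(feature)
    return {
        label: [sum(column) / len(group) for column in zip(*group)]
        for label, group in grouped.items()
    }


def nearest_prototype(
    query: Sequence[float], prototypes: Dict[Hashable, List[float]]
) -> Hashable:
    """Return the label whose prototype has the smallest squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda label: sq_dist(query, prototypes[label]))


# Toy 2-way 2-shot episode with hand-made features (illustrative values only).
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = ["cat", "cat", "dog", "dog"]
protos = class_prototypes(support, support_labels)
prediction = nearest_prototype([1.0, 1.0], protos)
```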

ICML Conference 2025 Conference Paper

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization

  • Wenhao Shen
  • Wanqi Yin
  • Xiaofeng Yang
  • Cheng Chen
  • Chaoyue Song
  • Zhongang Cai
  • Lei Yang 0045
  • Hao Wang 0094

Human mesh recovery (HMR) from a single image is inherently ill-posed due to depth ambiguity and occlusions. Probabilistic methods have tried to solve this by generating numerous plausible 3D human mesh predictions, but they often exhibit misalignment with 2D image observations and weak robustness to in-the-wild images. To address these issues, we propose ADHMR, a framework that Aligns a Diffusion-based HMR model in a preference optimization manner. First, we train a human mesh prediction assessment model, HMR-Scorer, capable of evaluating predictions even for in-the-wild images without 3D annotations. We then use HMR-Scorer to create a preference dataset, where each input image has a pair of winner and loser mesh predictions. This dataset is used to finetune the base model using direct preference optimization. Moreover, HMR-Scorer also helps improve existing HMR models by data cleaning, even with fewer training samples. Extensive experiments show that ADHMR outperforms current state-of-the-art methods. Code is available at: https://github.com/shenwenhao01/ADHMR.
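The winner/loser objective mentioned in this abstract can be illustrated with the generic direct preference optimization loss for a single pair. This is a sketch of the standard log-likelihood form of DPO, not necessarily the diffusion-specific variant used in ADHMR; the function name, argument names, and the default `beta` are assumptions.

```python
import math


def dpo_loss(
    logp_winner: float,
    logp_loser: float,
    ref_logp_winner: float,
    ref_logp_loser: float,
    beta: float = 0.1,
) -> float:
    """Generic DPO loss for one preference pair: -log(sigmoid(margin))."""
    # How much more the policy favors the winner over the loser than the
    # frozen reference model does, scaled by beta.
    margin = beta * (
        (logp_winner - ref_logp_winner) - (logp_loser - ref_logp_loser)
    )
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Illustrative log-probabilities (made-up numbers).
loss_good = dpo_loss(-1.0, -3.0, -2.0, -2.0)  # policy prefers the winner
loss_bad = dpo_loss(-3.0, -1.0, -2.0, -2.0)   # policy prefers the loser
```

The loss is smaller when the policy ranks the winner above the loser by a wider margin than the reference model does, which is the behavior the fine-tuning step relies on.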

AAAI Conference 2025 Conference Paper

IPVTON: Image-based 3D Virtual Try-on with Image Prompt Adapter

  • Xiaojing Zhong
  • Zhonghua Wu
  • Xiaofeng Yang
  • Guosheng Lin
  • Qingyao Wu

Given a pair of images depicting a person and a garment separately, image-based 3D virtual try-on methods aim to reconstruct a 3D human model that realistically portrays the person wearing the desired garment. In this paper, we present IPVTON, a novel image-based 3D virtual try-on framework. IPVTON employs score distillation sampling with image prompts to optimize a hybrid 3D human representation, integrating target garment features into diffusion priors through an image prompt adapter. To avoid interference with non-target areas, we leverage mask-guided image prompt embeddings to focus the image features on the try-on regions. Moreover, we impose geometric constraints on the 3D model with a pseudo silhouette generated by ControlNet, ensuring that the clothed 3D human model retains the shape of the source identity while accurately wearing the target garments. Extensive qualitative and quantitative experiments demonstrate that IPVTON outperforms previous methods in image-based 3D virtual try-on tasks, excelling in both geometry and texture.

ECAI Conference 2025 Conference Paper

Overexposed Frame Reconstruction in Ultra-High-Speed Imaging via Event-Guided Diffusion Models

  • Han Wang
  • Sijia Liu
  • Juntao Wu
  • Xirui Zhang
  • Zhou Wang
  • Xiaofeng Yang
  • Yaoxiong Wang
  • Saiao Zhou

Ultra-high-speed cameras frequently suffer from severe overexposure in scenarios involving extreme brightness transitions, significantly degrading image quality and obscuring critical visual details. To address this issue, we propose a novel reconstruction method combining neuromorphic sensors with state-of-the-art diffusion models. Our approach leverages the asynchronous, high-temporal-resolution, and high-dynamic-range capabilities of neuromorphic sensors to capture rapid brightness variations, subsequently utilizing conditional diffusion models to reconstruct high-quality frames from sparse event data. We validated the proposed method through experiments conducted under three challenging lighting conditions. The results demonstrate that our approach effectively recovers detailed visual content in severely overexposed frames, significantly outperforming traditional frame-based imaging techniques.

JBHI Journal 2025 Journal Article

SyncLearnNet: Generalized Epileptic Seizure Detection Network Based on Brain Signals

  • Yuer Ma
  • Jialin Wang
  • Jiaoyang Wang
  • Wenxiong Kang
  • Xiaofeng Yang

Epilepsy is a prevalent neurological disorder with significant detrimental effects on health. Accurate seizure detection is crucial for the precise diagnosis and effective treatment of epilepsy. Brain signals are widely recognized as a reliable clinical tool for diagnosing seizures and evaluating their severity. Traditionally, medical researchers have relied on visual inspection to identify and locate seizures and epileptogenic areas. However, manual analysis of brain data is both subjective and time-consuming. In recent years, there has been a surge in studies focusing on automatic seizure detection algorithms based on brain signals, driven by advances in artificial intelligence and digital brain signal technology. Nevertheless, many of these studies have neglected to leverage the rich implicit information in samples to extract comprehensive feature representations that enhance model performance. To address this gap, we propose a generalized model called SyncLearnNet for seizure detection based on brain signals. SyncLearnNet incorporates VariaScan and BatchAttention modules designed to fully utilize both intra-sample and inter-sample information, thereby improving feature discrimination without requiring additional data. Furthermore, the introduction of CurriClassifier aims to enhance the model's generalization performance. Experiments conducted on a public human seizure dataset, CHB-MIT, and a self-built animal seizure dataset comprising data from five rats demonstrate that this method outperforms existing seizure detection methods in terms of generalization performance.

ICLR Conference 2025 Conference Paper

Text-to-Image Rectified Flow as Plug-and-Play Priors

  • Xiaofeng Yang
  • Cheng Chen
  • Xulei Yang
  • Fayao Liu
  • Guosheng Lin

Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified Flow, a novel class of generative models, has demonstrated superior performance across various domains, surpassing diffusion-based methods in generation quality and efficiency. In this work, we present theoretical and experimental evidence demonstrating that rectified flow based methods offer similar functionalities to diffusion models: they can also serve as effective priors. Beyond the generative capabilities shared with diffusion priors, a variant of our method, motivated by the unique time-symmetry properties of rectified flow models, can additionally perform image inversion. Experimentally, our rectified flow based priors outperform their diffusion counterparts, the SDS and VSD losses, in text-to-3D generation. Our method also displays competitive performance in image inversion and editing. Code is available at: https://github.com/yangxiaofeng/rectified_flow_prior.

AAAI Conference 2024 Conference Paper

Diverse and Stable 2D Diffusion Guided Text to 3D Generation with Noise Recalibration

  • Xiaofeng Yang
  • Fayao Liu
  • Yi Xu
  • Hanjing Su
  • Qingyao Wu
  • Guosheng Lin

In recent years, following the success of text guided image generation, text guided 3D generation has gained increasing attention among researchers. Dreamfusion is a notable approach that enhances generation quality by utilizing 2D text guided diffusion models and introducing the SDS loss, a technique for distilling 2D diffusion model information to train 3D models. However, the SDS loss has two major limitations that hinder its effectiveness. First, when given a text prompt, the SDS loss struggles to produce diverse content. Second, during training, the SDS loss may cause the generated content to overfit and collapse, limiting the model's ability to learn intricate texture details. To overcome these challenges, we propose a novel algorithm called Noise Recalibration. By incorporating this technique, we can generate 3D content with significantly greater diversity and stunning details. Our approach offers a promising solution to the limitations of the SDS loss.

AAAI Conference 2024 Conference Paper

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis

  • Yiwen Chen
  • Chi Zhang
  • Xiaofeng Yang
  • Zhongang Cai
  • Gang Yu
  • Lei Yang
  • Guosheng Lin

Recent strides in Text-to-3D techniques have been propelled by distilling knowledge from powerful large text-to-image diffusion models (LDMs). Nonetheless, existing Text-to-3D approaches often grapple with challenges such as over-saturation, inadequate detailing, and unrealistic outputs. This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues. Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images based on the renderings of coarse 3D models. Although the generated images mostly alleviate the aforementioned issues, challenges such as view inconsistency and significant content variance persist due to the inherent generative nature of large diffusion models, posing extensive difficulties in leveraging these images effectively. To overcome this hurdle, we advocate integrating a discriminator alongside a novel Diffusion-GAN dual training strategy to guide the training of 3D models. For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data. We conduct a comprehensive set of experiments that demonstrate the effectiveness of our method over baseline approaches.

IJCAI Conference 2024 Conference Paper

Visual Attention Prompted Prediction and Learning

  • Yifei Zhang
  • Bo Pan
  • Siyi Gu
  • Guangji Bai
  • Meikang Qiu
  • Xiaofeng Yang
  • Liang Zhao

Visual explanation (attention)-guided learning uses not only labels but also explanations to guide the model reasoning process. While visual attention-guided learning has shown promising results, it requires a large number of explanation annotations that are time-consuming to prepare. However, in many real-world situations, it is usually desired to prompt the model with visual attention without model retraining. For example, when doing AI-assisted cancer classification on a medical image, users (e.g., clinicians) can provide the AI model with visual attention prompts on which areas are indispensable and which are precluded. Despite its promising objectives, achieving visual attention-prompted prediction presents several major challenges: 1) How can the visual prompt be effectively integrated into the model's reasoning process? 2) How should the model handle samples that lack visual prompts? 3) What is the impact on the model's performance when a visual prompt is imperfect? This paper introduces a novel framework for visual attention prompted prediction and learning, utilizing visual prompts to steer the model's reasoning process. To improve performance in non-prompted situations and align it with prompted scenarios, we propose a co-training approach for both non-prompted and prompted models, ensuring they share similar parameters and activation. Additionally, for instances where the visual prompt does not encompass the entire input image, we have developed innovative attention prompt refinement methods. These methods interpolate the incomplete prompts while maintaining alignment with the model's explanations. Extensive experiments on four datasets demonstrate the effectiveness of our proposed framework in enhancing predictions for samples both with and without prompts.

EAAI Journal 2023 Journal Article

Fuzzy adaptive terminal sliding mode control based on recurrent neural network compensation for a maglev system

  • Xinyi Su
  • Xiaofeng Yang
  • Yunlang Xu

Reluctance motors offer high force density but suffer from strong uncertainties. This paper concentrates on the intelligent control of a reluctance-motor maglev system (RMMS) with limited system information. To this end, we propose a new fuzzy adaptive terminal sliding mode control (FATSMC) method based on a novel full-regulated recurrent neural network (RNN) compensator. The RNN approximates the lumped uncertainty of the RMMS, and the controller handles the residual error and external disturbances. The presented increasing–decreasing fuzzy adaptive law adjusts the switching gain using the dynamic information of the system, without prior knowledge of the upper bound of the uncertainty, which reduces overestimation of the switching gain. Moreover, the proposed method only uses the nominal parameters and model of the RMMS, thus avoiding complex modeling. This study provides strong support for the application of RMMS in high-precision situations. By constructing a Lyapunov function, the stability of the proposed method is analyzed, and the closed-loop system is shown to be uniformly ultimately bounded (UUB). In the experiment, the proposed method is compared with three existing methods in four cases. The results demonstrate that the presented method achieves nanoscale suspension accuracy, suppresses chattering significantly, and has strong robustness.

YNIMG Journal 2021 Journal Article

Characterizing the seizure onset zone and epileptic network using EEG-fMRI in a rat seizure model

  • Junling Wang
  • Bin Jing
  • Ru Liu
  • Donghong Li
  • Wei Wang
  • Jiaoyang Wang
  • Jianfeng Lei
  • Yue Xing

Accurate epileptogenic zone (EZ) or seizure onset zone (SOZ) localization is crucial for epilepsy surgery optimization. Previous animal and human studies on epilepsy have reported that changes in blood oxygen level-dependent (BOLD) signals induced by epileptic events could be used as diagnostic markers for EZ or SOZ localization. Simultaneous electroencephalography and functional magnetic resonance imaging (EEG-fMRI) recording is gaining interest as a non-invasive tool for preoperative epilepsy evaluation. However, EEG-fMRI studies have reported inconsistent and ambiguous findings. Therefore, it remains unclear whether BOLD responses can be used for accurate EZ or SOZ localization. In this study, we used simultaneous EEG-fMRI recording in a rat model of 4-aminopyridine-induced acute focal seizures to assess the spatial concordance between individual BOLD responses and the SOZ. This was to determine the optimal use of simultaneous EEG-fMRI recording in the SOZ localization. We observed a high spatial consistency between BOLD responses and the SOZ. Further, dynamic BOLD responses were consistent with the regions where the seizures were propagated. These results suggested that simultaneous EEG-fMRI recording could be used as a noninvasive clinical diagnostic technique for localizing the EZ or SOZ and could be an effective tool for mapping epileptic networks.

YNICL Journal 2019 Journal Article

Revealing hemodynamic heterogeneity of gliomas based on signal profile features of dynamic susceptibility contrast-enhanced MRI

  • Bing Ji
  • Silun Wang
  • Zhou Liu
  • Brent D. Weinberg
  • Xiaofeng Yang
  • Tianming Liu
  • Liya Wang
  • Hui Mao

Dynamic susceptibility contrast enhanced magnetic resonance imaging (DSC MRI) is widely used for studying blood perfusion in brain tumors. While the time-dependent change of MRI signals related to the concentration of the tracer is used to derive hemodynamic parameters such as regional blood volume and flow into tumors, the tissue-specific information associated with variations in profiles of the signal time course is often overlooked. We report a new approach that combines model-free independent component analysis (ICA) identification of specific signal profiles in DSC MRI time course data with extraction of features from those time course profiles to interrogate the time course data, followed by calculation of the region-specific blood volume based on selected individual time courses. Based on the retrospective analysis of DSC MRI data from 38 patients with pathology-confirmed low (n = 18) and high (n = 20) grade gliomas, the results reveal the spatially defined intra-tumoral hemodynamic heterogeneity of brain tumors based on features of time course profiles. The hemodynamic heterogeneity, as measured by the number of independent components of the time course data, is associated with the tumor grade. Using 8 selected signal profile features, a machine-learning algorithm (e.g., logistic regression) was able to differentiate pathology-confirmed low- and high-grade gliomas with an accuracy of 86.7%. Furthermore, the new method can potentially extract more tumor physiological information from DSC MRI compared to traditional model-based analysis and morphological analysis of tumor heterogeneity, and thus may improve the characterization of gliomas for better diagnosis and treatment decisions.