Arrow Research search

Author name cluster

Yan Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers (24)

AAAI Conference 2026 Conference Paper

ECD: Evidence-guided Contrastive Decoding in Retrieval-Augmented Generation with Accurate Knowledge Reference Adjustment

  • Yize Sui
  • Yan Xu
  • Kun Hu
  • Jing Ren
  • Wenjing Yang

Retrieval-Augmented Generation (RAG) enhances the quality of question answering by integrating external knowledge with internal knowledge. A robust RAG system needs to precisely regulate how much the response depends on each type of knowledge. The recently proposed context-aware contrastive decoding (CCD) method attempts to achieve this goal by adjusting knowledge reference weights based on differences in the output distributions of LLMs when they rely on different knowledge sources. However, such methods use probabilistic knowledge reference adjustment strategies (such as highest probability or entropy) that only consider the relative confidence of the output at each decoding step, without accounting for its absolute confidence, which can lead to misjudging how much external versus internal knowledge should be referenced during decoding. To this end, we propose a novel decoding method, Evidence-guided Contrastive Decoding (ECD), which performs evidence modeling by constructing a Dirichlet distribution and treating logits as evidence vectors, so as to regulate the reference degree of internal and external knowledge more accurately and ultimately improve the quality of generated responses. Extensive evaluations across four public benchmark datasets on three mainstream LLMs demonstrate the effectiveness and advantages of ECD.
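As a rough illustration of the evidential idea described in this abstract, the Python sketch below treats clipped logits as Dirichlet evidence, derives an absolute confidence from the resulting concentration parameters, and uses that confidence to weight a contrastive combination of with-context and context-free logits. The alpha = evidence + 1 recipe, the mixing rule, and all names are illustrative assumptions, not ECD's actual formulation.

import numpy as np

def evidential_confidence(logits):
    # Treat non-negative transformed logits as Dirichlet evidence and return an
    # absolute confidence in [0, 1]; alpha = evidence + 1 is a common
    # evidential-learning recipe, assumed here for illustration.
    evidence = np.maximum(logits, 0.0)      # evidence vector, e >= 0
    alpha = evidence + 1.0                  # Dirichlet concentration parameters
    uncertainty = alpha.size / alpha.sum()  # vacuity of the Dirichlet
    return 1.0 - uncertainty

def contrastive_decode_step(logits_with_ctx, logits_without_ctx):
    # One decoding step: weight the with-context logits by their evidential
    # confidence before contrasting with the context-free logits (illustrative).
    w = evidential_confidence(logits_with_ctx)
    mixed = (1.0 + w) * logits_with_ctx - w * logits_without_ctx
    probs = np.exp(mixed - mixed.max())
    probs /= probs.sum()
    return int(np.argmax(probs))

rng = np.random.default_rng(0)  # toy example with a 5-token vocabulary
print(contrastive_decode_step(rng.normal(size=5), rng.normal(size=5)))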

JBHI Journal 2026 Journal Article

Multidomain Selective Feature Fusion and Stacking Based Ensemble Framework for EEG-Based Neonatal Sleep Stratification

  • Muhammad Irfan
  • Laishuan Wang
  • Husnain Shahid
  • Yan Xu
  • Abdulhamit Subasi
  • Adnan Munawar
  • Noman Mustafa
  • Chen Chen

Employing a minimal array of electroencephalography (EEG) channels for neonatal sleep stage classification is essential for data acquisition in the Internet of Medical Things (IoMT), as single-channel and edge-based features can reduce data transfer and processing requirements, enhancing cost-effectiveness and practicality. In this paper, we evaluate the efficacy of a single channel and the viability of a binary classification scheme for discerning awake and sleep states and transitions to quiet sleep. For this, two datasets of EEG signals for neonate sleep analysis were recorded from Children's Hospital of Fudan University, Shanghai, comprising recordings from 64 and 19 neonates, respectively. From each epoch, a diverse ensemble of 490 features was extracted through a blend of discrete and continuous wavelet transforms (DWT, CWT), spectral statistics, and temporal features. In addition, we introduced an innovative hybrid univariate and ensemble feature selection approach with multidomain feature fusion, and a stacking-based ensemble classifier that outperforms existing work. We achieved 90.37%, 91.13%, and 94.88% accuracy for sleep/awake, quiet sleep/non-quiet sleep, and quiet sleep/awake, respectively. This was corroborated by significant Kappa values of 77.5%, 80.29%, and 89.76%. Using SelectPercentile, we devised three distinct feature selection mechanisms: one using DWT, one with CWT, and another incorporating both spectral and temporal features. Subsequently, SelectKBest was used to determine the most effective features. For our stacked model, we incorporated a trifecta of the ExtraTree model with variable estimators, a Random Forest, and an Artificial Neural Network (ANN) as base classifiers, and for the final prediction phase, ANN was implemented again. The model's performance was evaluated using K-fold and leave-one-subject-out cross-validation.
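The feature-selection-plus-stacking design described above maps naturally onto scikit-learn. The sketch below is a hedged approximation on synthetic data: the 490-feature input is simulated, and the percentile, k, and estimator settings are placeholders rather than the paper's configuration.

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# stand-in for the 490 wavelet/spectral/temporal features per epoch
X, y = make_classification(n_samples=600, n_features=490, n_informative=40, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("extratrees", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ann", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
    ],
    final_estimator=MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)

pipe = make_pipeline(
    StandardScaler(),
    SelectPercentile(f_classif, percentile=50),  # univariate pre-selection
    SelectKBest(f_classif, k=60),                # keep the strongest features
    stack,
)

print(cross_val_score(pipe, X, y, cv=5).mean())  # K-fold accuracy on toy data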

JBHI Journal 2026 Journal Article

Restore-RWKV: Efficient and Effective Medical Image Restoration With RWKV

  • Zhiwen Yang
  • Jiayin Li
  • Hui Zhang
  • Dan Zhao
  • Bingzheng Wei
  • Yan Xu

Transformers have revolutionized medical image restoration, but the quadratic complexity still poses limitations for their application to high-resolution medical images. The recent advent of the Receptance Weighted Key Value (RWKV) model in the natural language processing field has attracted much attention due to its ability to process long sequences efficiently. To leverage its advanced design, we propose Restore-RWKV, the first RWKV-based model for medical image restoration. Since the original RWKV model is designed for 1D sequences, we make two necessary modifications for modeling spatial relations in 2D medical images. First, we present a recurrent WKV (Re-WKV) attention mechanism that captures global dependencies with linear computational complexity. Re-WKV incorporates bidirectional attention as the basis for a global receptive field and recurrent attention to effectively model 2D dependencies from various scan directions. Second, we develop an omnidirectional token shift (Omni-Shift) layer that enhances local dependencies by shifting tokens from all directions and across a wide context range. These adaptations make the proposed Restore-RWKV an efficient and effective model for medical image restoration. Even a lightweight variant of Restore-RWKV, with only 1.16 million parameters, achieves comparable or even superior results compared to existing state-of-the-art (SOTA) methods. Extensive experiments demonstrate that the resulting Restore-RWKV achieves SOTA performance across a range of medical image restoration tasks, including PET image synthesis, CT image denoising, MRI image super-resolution, and all-in-one medical image restoration.
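For intuition, the PyTorch sketch below implements a simple directional token shift on a 2D feature map: channel groups are shifted one pixel up, down, left, and right so that every token mixes information from its neighbours. It is a simplified stand-in for the Omni-Shift layer described above, which additionally spans a wider context range.

import torch
import torch.nn.functional as F

def directional_token_shift(x: torch.Tensor) -> torch.Tensor:
    # x: (B, C, H, W). Split channels into four groups and shift each group by
    # one pixel in a different direction, using zero padding at the borders.
    b, c, h, w = x.shape
    g = c // 4
    pad = F.pad(x, (1, 1, 1, 1))                 # pad width and height by 1
    down  = pad[:, 0*g:1*g, 0:h,   1:w+1]        # take the neighbour above
    up    = pad[:, 1*g:2*g, 2:h+2, 1:w+1]        # take the neighbour below
    right = pad[:, 2*g:3*g, 1:h+1, 0:w]          # take the neighbour to the left
    left  = pad[:, 3*g:4*g, 1:h+1, 2:w+2]        # take the neighbour to the right
    rest  = x[:, 4*g:]                           # leftover channels unchanged
    return torch.cat([down, up, right, left, rest], dim=1)

print(directional_token_shift(torch.randn(1, 8, 16, 16)).shape)  # (1, 8, 16, 16)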

NeurIPS Conference 2025 Conference Paper

NopeRoomGS: Indoor 3D Gaussian Splatting Optimization without Camera Pose Input

  • Wenbo Li
  • Yan Xu
  • Mingde Yao
  • Fengjie Liang
  • Jiankai Sun
  • Menglu Wang
  • Guofeng Zhang
  • Linjiang Huang

Recent advances in 3D Gaussian Splatting (3DGS) have enabled real-time, high-fidelity view synthesis, but remain critically dependent on camera poses estimated by Structure-from-Motion (SfM), which is notoriously unreliable in textureless indoor environments. To eliminate this dependency, recent pose-free variants have been proposed, yet they often fail under abrupt camera motion due to unstable initialization and purely photometric objectives. In this work, we introduce Nope-RoomGS, an optimization framework that requires no camera pose input and effectively handles textureless regions and abrupt camera motion in indoor room environments through a local-to-global optimization paradigm for 3DGS reconstruction. In the local stage, we propose a lightweight local neural geometric representation to bootstrap a set of reliable local 3D Gaussians for separate short video clips, regularized by multi-frame tracking constraints and foundation-model depth priors. This enables reliable initialization even in textureless regions or under abrupt camera motion. In the global stage, we fuse local 3D Gaussians into a unified 3DGS representation through an alternating optimization strategy that jointly refines camera poses and Gaussian parameters, effectively mitigating gradient interference between them. Furthermore, we decompose camera pose optimization based on a piecewise planarity assumption, further enhancing robustness under abrupt camera motion. Extensive experiments on Replica, ScanNet, and Tanks & Temples demonstrate the state-of-the-art performance of our method in both camera pose estimation and novel view synthesis.

NeurIPS Conference 2025 Conference Paper

Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion

  • Yan Xu
  • Yixing Wang
  • Stella X. Yu

Given just a few glimpses of a scene, can you imagine the movie playing out as the camera glides through it? That’s the lens we take on sparse-input novel view synthesis, not only as filling spatial gaps between widely spaced views, but also as completing a natural video unfolding through space. We recast the task as test-time natural video completion, using powerful priors from pretrained video diffusion models to hallucinate plausible in-between views. Our zero-shot, generation-guided framework produces pseudo views at novel camera poses, modulated by an uncertainty-aware mechanism for spatial coherence. These synthesized frames densify supervision for 3D Gaussian Splatting (3D-GS) for scene reconstruction, especially in under-observed regions. An iterative feedback loop lets 3D geometry and 2D view synthesis inform each other, improving both the scene reconstruction and the generated views. The result is coherent, high-fidelity renderings from sparse inputs without any scene-specific training or fine-tuning. On LLFF, DTU, DL3DV, and MipNeRF-360, our method significantly outperforms strong 3D-GS baselines under extreme sparsity. Our project page is at https://decayale.github.io/project/SV2CGS.

NeurIPS Conference 2025 Conference Paper

QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

  • Wanlong Liu
  • Junxiao Xu
  • Fei Yu
  • Yukang Lin
  • Ke Ji
  • Wenyu Chen
  • Lifeng Shang
  • Yasheng Wang

Recent advancements in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but they suffer from overthinking, which generates redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that the Short CoT patterns offer concise reasoning efficiently, while the Long CoT patterns excel in challenging scenarios where the Short CoT patterns struggle. To enable models to leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to adaptively employ both reasoning patterns: it prioritizes the Short CoT patterns and activates the Long CoT patterns only when necessary. Experiments on various mathematical datasets demonstrate that QFFT reduces average response length by more than 50%, while achieving performance comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT exhibits superior performance compared to SFT in noisy, out-of-domain, and low-resource scenarios.
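Conceptually, QFFT only changes how the training examples are assembled. The snippet below contrasts a hypothetical question-free example with a standard SFT example; the field names and prompt format are assumptions for illustration, not the paper's data pipeline.

def to_qfft_example(sample: dict) -> dict:
    # Question-free: drop the question and supervise on the Long CoT response alone.
    return {"prompt": "", "completion": sample["long_cot_response"]}

def to_sft_example(sample: dict) -> dict:
    # Standard SFT keeps the question in the prompt.
    return {"prompt": sample["question"], "completion": sample["long_cot_response"]}

raw = {
    "question": "What is 17 * 24?",
    "long_cot_response": "17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408. The answer is 408.",
}
print(to_qfft_example(raw))
print(to_sft_example(raw))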

JBHI Journal 2025 Journal Article

TDFormer: Top-Down Token Generation for 3D Medical Image Segmentation

  • Hao Du
  • Qihua Dong
  • Yan Xu
  • Jing Liao

Accurate medical image segmentation is critical to effective treatment strategies. Existing transformer-based methods for image segmentation mostly split the input image into a fixed and regular grid and regard the cells in the grid as vision tokens. However, not all tokens are of equal importance in medical segmentation tasks; e.g., tokens in tumor areas must be processed at a higher resolution than background tokens, which can be easily predicted with fewer transformer layers. In this paper, we propose a simple yet efficient segmentation framework called Top-Down Transformer (TDFormer), which incorporates a spatially adaptive token generation scheme into the transformer. The proposed top-down token generation comprises three components: attentiveness calculation, token splitting, and token fusion, whose collaboration gradually fuses redundant background tokens and focuses only on the most critical areas. This allows more computation to be allocated to processing tokens containing delicate details at a finer resolution. Extensive experiments demonstrate the robustness and effectiveness of the proposed TDFormer, showing that our method is superior to other state-of-the-art methods on the following publicly accessible datasets: BTCV Challenge, LiTS, and BraTS 2020. We also dissect our method and evaluate the performance of each component.

NeurIPS Conference 2024 Conference Paper

Pedestrian-Centric 3D Pre-collision Pose and Shape Estimation from Dashcam Perspective

  • Meijun Wang
  • Yu Meng
  • Zhongwei Qiu
  • Chao Zheng
  • Yan Xu
  • Xiaorui Peng
  • Jian Gao

Pedestrian pre-collision pose is one of the key factors determining the degree of pedestrian injury in a pedestrian-vehicle collision. Human pose estimation is an effective way to estimate a pedestrian's emergency pose from accident video. However, pose estimation models trained on existing everyday human pose datasets are not robust to specific poses such as pedestrian pre-collision poses, and human pose data are difficult to obtain in the wild, especially scarce data such as pre-collision poses in traffic scenes. In this paper, we collect pedestrian-vehicle collision poses from the dashcam perspective and construct the first Pedestrian-Vehicle Collision Pose dataset (PVCP) in a semi-automatic way, including 40K+ accident frames and 20K+ pedestrian pre-collision pose annotations (2D, 3D, mesh). Further, we construct a Pedestrian Pre-collision Pose Estimation Network (PPSENet) to estimate the collision pose and shape sequence of pedestrians from pedestrian-vehicle accident videos. PPSENet first estimates the 2D pose from the image (Image to Pose, ITP) and then lifts the 2D pose to a 3D mesh (Pose to Mesh, PTM). Due to the small size of the dataset, we introduce a pre-trained model that learns a human pose prior from large-scale pose datasets, and use iterative regression to estimate the pre-collision pose and shape of pedestrians. Further, we classify the pre-collision pose sequence and introduce a pose class loss, achieving the best accuracy compared with existing relevant state-of-the-art methods. Code and data are available for research at https://github.com/wmj142326/PVCP.

NeurIPS Conference 2023 Conference Paper

Exploiting Contextual Objects and Relations for 3D Visual Grounding

  • Li Yang
  • Chunfeng Yuan
  • Ziqi Zhang
  • Zhongang Qi
  • Yan Xu
  • Wei Liu
  • Ying Shan
  • Bing Li

3D visual grounding, the task of identifying visual objects in 3D scenes based on natural language inputs, plays a critical role in enabling machines to understand and engage with the real-world environment. However, this task is challenging due to the necessity to capture 3D contextual information to distinguish target objects from complex 3D scenes. The absence of annotations for contextual objects and relations further exacerbates the difficulties. In this paper, we propose a novel model, CORE-3DVG, to address these challenges by explicitly learning about contextual objects and relations. Our method accomplishes 3D visual grounding via three sequential modular networks, including a text-guided object detection network, a relation matching network, and a target identification network. During training, we introduce a pseudo-label self-generation strategy and a weakly-supervised method to facilitate the learning of contextual objects and relations, respectively. The proposed techniques allow the networks to focus more effectively on referred objects within 3D scenes by understanding their context better. We validate our model on the challenging Nr3D, Sr3D, and ScanRefer datasets and demonstrate state-of-the-art performance. Our code will be public at https://github.com/yangli18/CORE-3DVG.

JBHI Journal 2022 Journal Article

Vision-Based Finger Tapping Test in Patients With Parkinson’s Disease via Spatial-Temporal 3D Hand Pose Estimation

  • Zhilin Guo
  • Weiqi Zeng
  • Taidong Yu
  • Yan Xu
  • Yang Xiao
  • Xuebing Cao
  • Zhiguo Cao

The finger tapping test is crucial for diagnosing Parkinson’s Disease (PD), but manual visual evaluations can result in score discrepancies due to clinicians’ subjectivity. Moreover, applying wearable sensors requires making physical contact and may hinder PD patients’ raw movement patterns. Accordingly, a novel computer-vision approach is proposed using a depth camera and spatial-temporal 3D hand pose estimation to capture and evaluate PD patients’ 3D hand movement. Within this approach, a temporal encoding module is leveraged to extend A2J’s deep learning framework to counter the pose jittering problem, and a pose refinement process is utilized to alleviate dependency on massive data. Additionally, the first vision-based 3D PD hand dataset of 112 hand samples from 48 PD patients and 11 control subjects is constructed, fully annotated by qualified physicians under clinical settings. Testing on this real-world data, this new model achieves 81.2% classification accuracy, even surpassing that of individual clinicians in comparison, fully demonstrating this proposition’s effectiveness. The demo video can be accessed at https://github.com/ZhilinGuo/ST-A2J.

AAAI Conference 2021 Conference Paper

CrossNER: Evaluating Cross-Domain Named Entity Recognition

  • Zihan Liu
  • Yan Xu
  • Tiezheng Yu
  • Wenliang Dai
  • Ziwei Ji
  • Samuel Cahyawijaya
  • Andrea Madotto
  • Pascale Fung

Cross-domain named entity recognition (NER) models are able to cope with the scarcity issue of NER samples in target domains. However, most of the existing NER benchmarks lack domain-specialized entity types or do not focus on a certain domain, leading to a less effective cross-domain evaluation. To address these obstacles, we introduce a cross-domain NER dataset (CrossNER), a fully-labeled collection of NER data spanning five diverse domains with specialized entity categories for different domains. Additionally, we provide a domain-related corpus, since using it to continue pre-training language models (domain-adaptive pre-training) is effective for domain adaptation. We then conduct comprehensive experiments to explore the effectiveness of leveraging different levels of the domain corpus and pre-training strategies to perform domain-adaptive pre-training for the cross-domain task. Results show that focusing on the fractional corpus containing domain-specialized entities and utilizing a more challenging pre-training strategy in domain-adaptive pre-training are beneficial for NER domain adaptation, and our proposed method can consistently outperform existing cross-domain NER baselines. Nevertheless, the experiments also illustrate the challenge of this cross-domain NER task. We hope that our dataset and baselines will catalyze research in the NER domain adaptation area. The code and data are available at https://github.com/zliucr/CrossNER.
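Domain-adaptive pre-training of the kind described above is commonly implemented as continued masked-language-model training on the domain corpus. The sketch below uses Hugging Face transformers with a toy two-sentence corpus; the base model, corpus, and hyperparameters are placeholders and do not reproduce the CrossNER setup.

from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

domain_corpus = [  # in practice: the entity-rich domain-related corpus
    "The midfielder was transferred to the club for a record fee.",
    "The striker scored twice in the opening fixture of the season.",
]
encodings = tokenizer(domain_corpus, truncation=True, max_length=128)
dataset = [{"input_ids": ids} for ids in encodings["input_ids"]]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-ckpt", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15),
)
trainer.train()  # the adapted encoder is then fine-tuned for NER on the target domain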

JBHI Journal 2020 Journal Article

Unsupervised 3D End-to-End Medical Image Registration With Volume Tweening Network

  • Shengyu Zhao
  • Tingfung Lau
  • Ji Luo
  • Eric I-Chao Chang
  • Yan Xu

3D medical image registration is of great clinical importance. However, supervised learning methods require a large amount of accurately annotated corresponding control points (or morphing), which are very difficult to obtain. Unsupervised learning methods ease the burden of manual annotation by exploiting unlabeled data without supervision. In this article, we propose a new unsupervised learning method using convolutional neural networks under an end-to-end framework, Volume Tweening Network (VTN), for 3D medical image registration. We propose three innovative technical components: (1) an end-to-end cascading scheme that resolves large displacement; (2) an efficient integration of an affine registration network; and (3) an additional invertibility loss that encourages backward consistency. Experiments demonstrate that our algorithm is 880x faster (or 3.3x faster without GPU acceleration) than traditional optimization-based methods and achieves state-of-the-art performance in medical image registration.
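The invertibility (backward-consistency) idea can be illustrated compactly: composing the forward and backward displacement fields should bring every voxel back to its starting point, so the composed residual is penalized. The 2D NumPy/SciPy sketch below is a simplified stand-in for VTN's 3D implementation.

import numpy as np
from scipy.ndimage import map_coordinates

def invertibility_loss(fwd, bwd):
    # fwd, bwd: displacement fields of shape (2, H, W) in pixel units.
    h, w = fwd.shape[1:]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    py, px = ys + fwd[0], xs + fwd[1]          # positions reached by the forward field
    bwd_at_fwd = np.stack([                    # backward field sampled at those positions
        map_coordinates(bwd[0], [py, px], order=1, mode="nearest"),
        map_coordinates(bwd[1], [py, px], order=1, mode="nearest"),
    ])
    residual = fwd + bwd_at_fwd                # zero when bwd exactly inverts fwd
    return float(np.mean(residual ** 2))

fwd = np.ones((2, 32, 32)) * 2.0               # a constant translation...
print(invertibility_loss(fwd, -fwd))           # ...and its exact inverse: loss ~ 0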

JBHI Journal 2019 Journal Article

Unsupervised Learning for Cell-Level Visual Representation in Histopathology Images With Generative Adversarial Networks

  • Bo Hu
  • Ye Tang
  • Eric I-Chao Chang
  • Yubo Fan
  • Maode Lai
  • Yan Xu

The visual attributes of cells, such as the nuclear morphology and chromatin openness, are critical for histopathology image analysis. By learning cell-level visual representation, we can obtain a rich mix of features that are highly reusable for various tasks, such as cell-level classification, nuclei segmentation, and cell counting. In this paper, we propose a unified generative adversarial network architecture with a new formulation of loss to perform robust cell-level visual representation learning in an unsupervised setting. Our model is not only label-free and easily trained but also capable of cell-level unsupervised classification with interpretable visualization, which achieves promising results in the unsupervised classification of bone marrow cellular components. Based on the proposed cell-level visual representation learning, we further develop a pipeline that exploits the varieties of cellular elements to perform histopathology image classification, the advantages of which are demonstrated on bone marrow datasets.

IJCAI Conference 2018 Conference Paper

3D-Aided Deep Pose-Invariant Face Recognition

  • Jian Zhao
  • Lin Xiong
  • Yu Cheng
  • Yi Cheng
  • Jianshu Li
  • Li Zhou
  • Yan Xu
  • Jayashree Karlekar

Learning from synthetic faces, though perhaps appealing for high data efficiency, may not bring satisfactory performance due to the distribution discrepancy between synthetic and real face images. To mitigate this gap, we propose a 3D-Aided Deep Pose-Invariant Face Recognition Model (3D-PIM), which automatically recovers realistic frontal faces from arbitrary poses through a 3D face model in a novel way. Specifically, 3D-PIM incorporates a simulator with the aid of a 3D Morphable Model (3DMM) to obtain shape and appearance priors for accelerating face normalization learning, requiring less training data. It further leverages a global-local Generative Adversarial Network (GAN) with multiple critical improvements as a refiner to enhance the realism of both global structures and local details of the face simulator’s output using unlabelled real data only, while preserving the identity information. Qualitative and quantitative experiments on both controlled and in-the-wild benchmarks clearly demonstrate the superiority of the proposed model over state-of-the-art methods.

AAAI Conference 2017 Conference Paper

Optimizing Quantiles in Preference-Based Markov Decision Processes

  • Hugo Gilbert
  • Paul Weng
  • Yan Xu

In the Markov decision process model, policies are usually evaluated by expected cumulative rewards. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally we experimentally evaluate our approach on random MDPs and on a data center control problem.
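To make the quantile criterion concrete, the short simulation below evaluates a fixed policy on a toy problem by the 0.25-quantile of its return distribution rather than by its expectation; the toy reward process is an assumption, and the paper's optimization algorithm is not reproduced here.

import numpy as np

rng = np.random.default_rng(0)
# Stand-in for Monte Carlo rollouts of a fixed policy: each step yields reward 3
# with probability 0.3 and 0 otherwise, over a 10-step horizon.
returns = rng.choice([0.0, 3.0], p=[0.7, 0.3], size=(5000, 10)).sum(axis=1)

print("expected return:", returns.mean())              # the usual criterion
print("0.25-quantile  :", np.quantile(returns, 0.25))  # a risk-sensitive quantile criterion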

ICRA Conference 2016 Conference Paper

Control and experimental validation of robot-assisted automatic measurement system for Multi-Stud Tensioning Machine (MSTM)

  • Meng Li
  • Xingguang Duan
  • Haoyuan Li
  • Tengfei Cui
  • Liang Gao
  • Yue Zhan
  • Yan Xu

The Multi-Stud Tensioning Machine (MSTM) is specialized equipment used to open and seal the cover of the Reactor Pressure Vessel (RPV) during nuclear power plant maintenance. The tensioning residual values of the 58 studs are monitored for procedure evaluation. It is time-consuming for human operators to place the measurement meters into working positions. In order to reduce labor intensity and eliminate radiation exposure time, we develop a robot-assisted automatic measurement system to achieve meter placement and real-time data monitoring. The Field Programmable Gate Array (FPGA)-based distributed control scheme realizes high-speed data acquisition and coordinated control of the 58 node robots. The control software performs data analysis and sends emergency stop signals to the MSTM control PLC. The proposed system is validated at the China Nuclear Power Technology Research Institute. Total operation time decreases from over 580 s to less than 120 s.

IROS Conference 2006 Conference Paper

Camera Calibration Based on the RBF Neural Network with Tunable Nodes for Visual Servoing in Robotics

  • Xiaoping Zong
  • Yan Xu
  • Lei Hao
  • Xiaoli Huai

In this paper, a new approach based on the radial basis function (RBF) network is proposed for solving the camera calibration problem in visual servoing robots. In this approach, an extended multi-input and multi-output orthogonal forward selection algorithm based on the leave-one-out criterion is applied to construct RBF networks with tunable nodes. This algorithm is computationally efficient and is capable of identifying parsimonious RBF networks that generalize well. Moreover, the proposed algorithm is fully automatic, and the user does not need to specify a termination criterion for the construction process. The constructed parsimonious multi-input and multi-output RBF network can complete camera calibration automatically and rapidly, and simulations have shown that the approach is feasible.
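As a rough sketch of the calibration-as-regression idea, the snippet below fits a radial basis function model that maps image coordinates back to world coordinates on synthetic data. SciPy's RBFInterpolator stands in for the paper's orthogonal-forward-selection network with tunable nodes, and the synthetic camera model is an assumption.

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(1)
world = rng.uniform(-1.0, 1.0, size=(200, 2))       # planar world points (X, Y)

def project(points):
    # synthetic "camera": an affine map plus a mild nonlinearity
    return points @ np.array([[320.0, 12.0], [-8.0, 310.0]]) + 50.0 * points**3

image = project(world) + rng.normal(scale=0.5, size=world.shape)  # noisy observations

model = RBFInterpolator(image, world, kernel="thin_plate_spline", smoothing=1e-3)

test_world = np.array([[0.25, -0.4]])
print(model(project(test_world)))                    # should be close to (0.25, -0.4)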