Arrow Research search

Author name cluster

Jun Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

36 papers
2 author rows

Possible papers (36)

JBHI Journal 2026 Journal Article

Hybrid Multi-View MRI Fusion for csPCa Diagnosis via Intra- and Inter-View Transformers

  • Yuchen Zhao
  • Danyan Li
  • Teng Zhang
  • Xiangxue Wang
  • Jun Xu
  • Kai Xuan

Accurate diagnosis of clinically significant prostate cancer (csPCa) from multi-view MRI scans (axial, sagittal, and coronal) is essential for effective treatment planning and improved outcomes. Although deep learning has advanced prostate MRI analysis, many existing approaches adopt late fusion strategies that aggregate one-dimensional feature vectors extracted independently from each view, resulting in loss of spatial information and anatomical correspondence across views, ultimately limiting diagnostic performance. While Vision Transformers offer flexibility in processing multi-view patches, their memory requirements scale quadratically with the number of patches, hindering efficient concurrent processing. In contrast, Swin Transformers efficiently capture local features but are typically restricted to single-view processing due to their reliance on regular-grid input constraints. To overcome these limitations, we propose a hybrid fusion framework that decomposes multi-view information integration into iterative intra-view and inter-view interactions across multiple resolutions. The framework preserves spatial coherence and enables fine-grained feature integration while maintaining computational efficiency. Specifically, the inter-view feature exchange module, based on the Vision Transformer, employs bridge tokens to summarize information from localized patch windows, reducing memory usage while preserving spatial relationships across views. The intra-view feature extraction module, built on the Swin Transformer, facilitates dynamic, attention-driven interactions among image patches and bridge tokens within each window. Moreover, shared positional embeddings are explicitly incorporated to enhance spatial correspondence across views. Extensive experiments on a public dataset demonstrate the superiority of our method in csPCa classification. 
Ablation studies highlight the contributions of different components, while attention map visualizations validate the integration of anatomical structures across views.

AAAI Conference 2026 Conference Paper

Thinker: Training LLMs in Hierarchical Thinking for Deep Search via Multi-Turn Interaction

  • Jun Xu
  • Xinkai Du
  • Yu Ao
  • Peilong Zhao
  • Yang Li
  • Ling Zhong
  • Lin Yuan
  • Zhongpu Bo

Efficient retrieval of external knowledge bases and web pages is crucial for enhancing the reasoning abilities of LLMs. Previous works on training LLMs to leverage external retrievers for solving complex problems have predominantly employed end-to-end reinforcement learning. However, these approaches neglect supervision over the reasoning process, making it difficult to guarantee logical coherence and rigor. To address these limitations, we propose Thinker, a hierarchical thinking model for deep search through multi-turn interaction, making the reasoning process supervisable and verifiable. It decomposes complex problems into independently solvable sub-problems, each dually represented in both natural language and an equivalent logical function to support knowledge base and web searches. Concurrently, dependencies between sub-problems are passed as parameters via these logical functions, enhancing the logical coherence of the problem-solving process. To avoid unnecessary external searches, we perform knowledge boundary determination to check if a sub-problem is within the LLM's intrinsic knowledge, allowing it to answer directly. Experimental results indicate that with as few as several hundred training samples, the performance of Thinker is competitive with established baselines. Furthermore, when scaled to the full training set, Thinker significantly outperforms these methods across various datasets and model sizes.

ECAI Conference 2025 Conference Paper

A Self-Adaptive Frequency Domain Network for Continuous Intraoperative Hypotension Prediction

  • Xian Zeng
  • Tianze Xu
  • Kai Yang
  • Jie Sun
  • Youran Wang
  • Jun Xu
  • Mucheng Ren

Intraoperative hypotension (IOH) is strongly associated with postoperative complications, including postoperative delirium and increased mortality, making its early prediction crucial in perioperative care. While several artificial intelligence-based models have been developed to provide IOH warnings, existing methods face limitations in incorporating both time and frequency domain information, capturing short- and long-term dependencies, and handling noise sensitivity in biosignal data. To address these challenges, we propose a novel Self-Adaptive Frequency Domain Network (SAFDNet). Specifically, SAFDNet integrates an adaptive spectral block, which leverages Fourier analysis to extract frequency-domain features and employs self-adaptive thresholding to mitigate noise. Additionally, an interactive attention block is introduced to capture both long-term and short-term dependencies in the data. Extensive internal and external validations on two large-scale real-world datasets demonstrate that SAFDNet achieves up to 97.3% AUROC in IOH early warning, outperforming state-of-the-art models. Furthermore, SAFDNet exhibits robust predictive performance and low sensitivity to noise, making it well-suited for practical clinical applications.
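The core idea of the adaptive spectral block (Fourier-domain feature extraction with thresholding to suppress noise) can be sketched in a few lines. This is a generic illustration using a fixed percentile threshold; SAFDNet learns its thresholds self-adaptively, and `keep_ratio` is an arbitrary value chosen for the example.

```python
import numpy as np

def spectral_threshold_denoise(signal, keep_ratio=0.1):
    """Suppress noise in a 1-D biosignal by zeroing weak frequency components.

    A simple stand-in for SAFDNet's adaptive spectral block: the threshold
    here is a fixed percentile of spectral magnitudes, not a learned,
    self-adaptive one.
    """
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    # Keep only the strongest `keep_ratio` fraction of coefficients.
    threshold = np.quantile(magnitude, 1.0 - keep_ratio)
    spectrum[magnitude < threshold] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

# A low-frequency physiological-style wave buried in broadband noise:
t = np.linspace(0, 1, 512, endpoint=False)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.3 * np.random.default_rng(0).standard_normal(512)
denoised = spectral_threshold_denoise(noisy)
```

Because a slow rhythm concentrates in a few Fourier coefficients while broadband noise spreads across all of them, thresholding removes most of the noise power while leaving the waveform largely intact.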

AAAI Conference 2025 Conference Paper

AdaO2B: Adaptive Online to Batch Conversion for Out-of-Distribution Generalization

  • Xiao Zhang
  • Sunhao Dai
  • Jun Xu
  • Yong Liu
  • Zhenhua Dong

Online to batch conversion involves constructing a new batch learner by utilizing a series of models generated by an existing online learning algorithm, to achieve generalization guarantees under the i.i.d. assumption. However, when applied to real-world streaming applications such as streaming recommender systems, the data stream may be sampled from time-varying distributions instead of persistently being i.i.d. This poses a challenge in terms of out-of-distribution (OOD) generalization. Existing approaches employ fixed conversion mechanisms that are unable to adapt to novel testing distributions, hindering the testing accuracy of the batch learner. To address these issues, we propose AdaO2B, an adaptive online to batch conversion approach under the bandit setting. AdaO2B is designed to be aware of the distribution shifts in the testing data and achieves OOD generalization guarantees. Specifically, AdaO2B can dynamically combine the sequence of models learned by a contextual bandit algorithm and determine appropriate combination weights using a context-aware weighting function. This innovative approach allows for the conversion of a sequence of models into a batch learner that facilitates OOD generalization. Theoretical analysis provides justification for why and how the learned adaptive batch learner can achieve OOD generalization error guarantees. Experimental results have demonstrated that AdaO2B significantly outperforms state-of-the-art baselines on both synthetic and real-world recommendation datasets.
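The central conversion step (turning a sequence of online-learned models into one batch predictor via a context-aware weighting function) can be sketched as follows. The linear models and the softmax weighting are illustrative assumptions, not AdaO2B's exact parameterization.

```python
import numpy as np

def adaptive_batch_predict(models, weight_params, context, x):
    """Combine a sequence of online-learned linear models into one batch
    prediction, with combination weights that depend on the test context.

    models        : list of weight vectors produced across online rounds
    weight_params : one parameter vector per model for the weighting function
    context       : features describing the current (possibly shifted)
                    test distribution
    """
    # Context-aware weighting function: a softmax over per-model scores.
    scores = np.array([w @ context for w in weight_params])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Batch learner's output: weighted combination of model predictions.
    preds = np.array([m @ x for m in models])
    return weights @ preds
```

With all weighting parameters at zero the combination degenerates to a uniform average, i.e. a fixed conversion mechanism of the kind the paper argues cannot adapt to novel testing distributions.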

NeurIPS Conference 2025 Conference Paper

GUI-G1: Understanding R1-Zero-Like Training for Visual Grounding in GUI Agents

  • Yuqi Zhou
  • Sunhao Dai
  • Shuai Wang
  • Kaiwen Zhou
  • Qinglin Jia
  • Jun Xu

Recent Graphical User Interface (GUI) agents replicate the R1-Zero paradigm, coupling online Reinforcement Learning (RL) with explicit chain-of-thought reasoning prior to object grounding and thereby achieving substantial performance gains. In this paper, we first conduct extensive analysis experiments on three key components of that training pipeline: input design, output evaluation, and policy update, each revealing distinct challenges arising from blindly applying general-purpose RL without adapting to GUI grounding tasks. Input design: Current templates encourage the model to generate chain-of-thought reasoning, but longer chains unexpectedly lead to worse grounding performance. Output evaluation: Reward functions based on hit signals or box area allow models to exploit box size, leading to reward hacking and poor localization quality. Policy update: Online RL tends to overfit easy examples due to biases in length and sample difficulty, leading to under-optimization on harder cases. To address these issues, we propose three targeted solutions. First, we adopt a $\textbf{Fast Thinking Template}$ that encourages direct answer generation, reducing excessive reasoning during training. Second, we incorporate a box size constraint into the reward function to mitigate reward hacking. Third, we revise the RL objective by adjusting length normalization and adding a difficulty-aware scaling factor, enabling better optimization on hard samples. Our $\textbf{GUI-G1-3B}$, trained on 17K public samples with Qwen2.5-VL-3B-Instruct, achieves $\textbf{90.3\%}$ accuracy on ScreenSpot and $\textbf{37.1\%}$ on ScreenSpot-Pro. This surpasses all prior models of similar size and even outperforms the larger UI-TARS-7B, establishing a new state-of-the-art in GUI agent grounding.
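The box-size constraint on the reward can be illustrated with a toy reward function: a hit-based reward is granted only when the predicted box also stays below an area cap, removing the incentive to inflate boxes. The 5% cap and the function shape are assumptions for illustration, not the paper's actual reward.

```python
def grounding_reward(pred_box, target_point, screen_area, max_area_ratio=0.05):
    """Hit reward with a box-size constraint (illustrative, not GUI-G1's exact reward).

    pred_box     : (x1, y1, x2, y2) predicted by the agent
    target_point : ground-truth click location
    Without the area cap, predicting a huge box that covers everything would
    always "hit" -- the reward-hacking failure mode described above.
    """
    x1, y1, x2, y2 = pred_box
    px, py = target_point
    hit = x1 <= px <= x2 and y1 <= py <= y2
    area_ok = (x2 - x1) * (y2 - y1) <= max_area_ratio * screen_area
    return 1.0 if (hit and area_ok) else 0.0
```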

JBHI Journal 2025 Journal Article

HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading

  • Qi Zhang
  • Cheng Chuang
  • Shunan Zhang
  • Ziqi Zhao
  • Kun Wang
  • Jun Xu
  • Jianqi Sun

Osteoporotic vertebral compression fractures (OVCFs) are prevalent in the elderly population, typically assessed on computed tomography (CT) scans by evaluating vertebral height loss. This assessment helps determine the fracture's impact on spinal stability and the need for surgical intervention. However, the absence of pre-fracture CT scans and standardized vertebral references leads to measurement errors and inter-observer variability, while irregular compression patterns further challenge the precise grading of fracture severity. While deep learning methods have shown promise in aiding OVCFs screening, they often lack interpretability and sufficient sensitivity, limiting their clinical applicability. To address these challenges, we introduce a novel vertebra synthesis-height loss quantification-OVCFs grading framework. Our proposed model, HealthiVert-GAN, utilizes a coarse-to-fine synthesis network designed to generate pseudo-healthy vertebral images that simulate the pre-fracture state of fractured vertebrae. This model integrates three auxiliary modules that leverage the morphology and height information of adjacent healthy vertebrae to ensure anatomical consistency. Additionally, we introduce the Relative Height Loss of Vertebrae (RHLV) as a quantification metric, which divides each vertebra into three sections to measure height loss between pre-fracture and post-fracture states, followed by fracture severity classification using a Support Vector Machine (SVM). Our approach achieves state-of-the-art classification performance on both the Verse2019 dataset and an in-house dataset, and it provides cross-sectional distribution maps of vertebral height loss. This practical tool enhances diagnostic accuracy in clinical settings and assists in surgical decision-making.

AAAI Conference 2025 Conference Paper

Improving Natural Language Understanding for LLMs via Large-Scale Instruction Synthesis

  • Lin Yuan
  • Jun Xu
  • Honghao Gui
  • Mengshu Sun
  • Zhiqiang Zhang
  • Lei Liang
  • Jun Zhou

High-quality, large-scale instructions are crucial for aligning large language models (LLMs); however, there is a severe shortage of instructions in the field of natural language understanding (NLU). Previous works on constructing NLU instructions mainly focus on information extraction (IE), neglecting tasks such as machine reading comprehension, question answering, and text classification. Furthermore, the lack of diversity in the data has led to a decreased generalization ability of trained LLMs in other NLU tasks and a noticeable decline in the fundamental model's general capabilities. To address this issue, we propose Hum, a large-scale, high-quality synthetic instruction corpus for NLU tasks, designed to enhance the NLU capabilities of LLMs. Specifically, Hum includes IE (either closed IE or open IE), machine reading comprehension, text classification, and instruction generalist tasks, thereby enriching task diversity. Additionally, we introduce a human-LLMs collaborative mechanism to synthesize instructions, which enriches instruction diversity by incorporating guidelines, preference rules, and format variants. We conduct extensive experiments on 5 NLU tasks and 28 general capability evaluation datasets for LLMs. Experimental results show that Hum enhances the NLU capabilities of six LLMs by an average of 3.1%, with no significant decline observed in other general capabilities.

AAAI Conference 2025 Conference Paper

ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis

  • Xinyu Geng
  • Jiaming Wang
  • Xiaolin Huang
  • Fanglin Chen
  • Jun Xu

Deep learning has excelled in medical image classification, but its clinical application is limited by poor interpretability. Capsule networks, known for encoding hierarchical relationships and spatial features, show potential in addressing this issue. Nevertheless, traditional capsule networks often underperform due to their shallow structures, and deeper variants lack hierarchical architectures, thereby compromising interpretability. This paper introduces a novel capsule network, ParseCaps, which utilizes the sparse axial attention routing and parse convolutional capsule layer to form a parse-tree-like structure, enhancing both depth and interpretability. Firstly, sparse axial attention routing optimizes connections between child and parent capsules, and emphasizes the weight distribution across instantiation parameters of parent capsules. Secondly, the parse convolutional capsule layer generates capsule predictions aligning with the parse tree. Finally, based on the loss design that is effective whether concept ground truth exists or not, ParseCaps advances interpretability by associating each dimension of the global capsule with a comprehensible concept, thereby facilitating clinician trust and understanding of the model's classification results. Experimental results on three medical datasets show that ParseCaps not only outperforms other capsule network variants in classification accuracy and robustness, but also provides interpretable explanations, regardless of the availability of concept labels.

AAAI Conference 2025 Conference Paper

Trigger3: Refining Query Correction via Adaptive Model Selector

  • Kepu Zhang
  • Zhongxiang Sun
  • Xiao Zhang
  • Xiaoxue Zang
  • Kai Zheng
  • Yang Song
  • Jun Xu

In search scenarios, user experience can be hindered by erroneous queries due to typos, voice errors, or knowledge gaps. Therefore, query correction is crucial for search engines. Current correction models, usually small models trained on specific data, often struggle with queries beyond their training scope or those requiring contextual understanding. While the advent of Large Language Models (LLMs) offers a potential solution, they are still limited by their pre-training data and inference cost, particularly for complex queries, making them not always effective for query correction. To tackle these issues, we propose Trigger3, a large-small model collaboration framework that integrates the traditional correction model and LLM for query correction, capable of adaptively choosing the appropriate correction method based on the query and the correction results from the traditional correction model and LLM. Trigger3 first employs a correction trigger to filter out correct queries. Incorrect queries are then corrected by the traditional correction model. If this fails, an LLM trigger is activated to call the LLM for correction. Finally, for queries that no model can correct, a fallback trigger decides to return the original query. Extensive experiments demonstrate that Trigger3 outperforms correction baselines while maintaining efficiency.
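The cascade described above can be sketched as plain control flow. The trigger predicates and correction models are passed in as callables since in Trigger3 they are learned components; re-checking each candidate with a single predicate is a simplification of the paper's separate trigger models.

```python
def trigger3_correct(query, looks_correct, small_model, llm):
    """Large-small model collaboration for query correction (sketch).

    Stage 1: correction trigger filters out already-correct queries.
    Stage 2: the traditional (small) correction model tries first.
    Stage 3: the LLM trigger escalates to the LLM if the small model fails.
    Stage 4: fallback trigger returns the original query if nothing works.
    """
    if looks_correct(query):
        return query
    candidate = small_model(query)
    if looks_correct(candidate):
        return candidate
    candidate = llm(query)
    if looks_correct(candidate):
        return candidate
    return query  # fallback: no model produced a plausible correction
```

The design point is cost: the cheap small model handles the bulk of typo-style errors, and the expensive LLM is only invoked for the residual hard cases.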

ECAI Conference 2024 Conference Paper

Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution

  • Tianyi Xu
  • Yijie Zhou
  • Xiaotao Hu
  • Kai Zhang
  • Anran Zhang
  • Xingye Qiu
  • Jun Xu

Arbitrary-scale super-resolution (ASSR) aims to learn a single model for image super-resolution at arbitrary magnifying scales. Existing ASSR networks typically comprise an off-the-shelf scale-agnostic feature extractor and an arbitrary scale upsampler. These feature extractors often use fixed network architectures to address different ASSR inference tasks, each of which is characterized by an input image and an upsampling scale. However, this overlooks the difficulty variance of super-resolution on different inference scenarios, where simple images or small SR scales could be resolved with less computational effort than difficult images or large SR scales. To tackle this difficulty variability, in this paper, we propose a Task-Aware Dynamic Transformer (TADT) as an input-adaptive feature extractor for efficient image ASSR. Our TADT consists of a multi-scale feature extraction backbone built upon groups of Multi-Scale Transformer Blocks (MSTBs) and a Task-Aware Routing Controller (TARC). The TARC predicts the inference paths within the feature extraction backbone, specifically selecting MSTBs based on the input images and SR scales. The prediction of the inference path is guided by a new loss function to trade off SR accuracy and efficiency. Experiments demonstrate that, when working with three popular arbitrary-scale upsamplers, our TADT achieves state-of-the-art ASSR performance when compared with mainstream feature extractors, at relatively lower computational cost. The code is available at https://github.com/Tillyhere/TADT.

IROS Conference 2023 Conference Paper

A Safety Filter for Realizing Safe Robot Navigation in Crowds

  • Kaijun Feng
  • Zetao Lu
  • Jun Xu
  • Haoyao Chen
  • Yunjiang Lou

It is challenging to realize the safe navigation of mobile robots in crowds. Most of the previous studies may lead to unsafe robot navigation in crowds, as they lack safety guarantees. To solve this problem, we devise a safety filter (SF) that enables realization of safe robot navigation in crowds, and provides safety guarantees by verifying whether the optimal action recommended by an unsafe method is safe and, if not, corrects the action. The three main processes performed by the SF applied to a given robot are (1) construction of the safe state constraints of the robot using a safe set; (2) construction of the safe action constraints of the robot based on discrete-time generalized velocity obstacles (DGVOs); and (3) determination of a feasible solution of the SF design problem, or, if none can be found, replacement of the above hard constraints with heuristic soft constraints. We used the SF with a reaction-based method and three learning-based methods in simulation experiments of random and non-random crowds, and the results showed that the SF decreases the collision rates and danger rates and thereby increases the success rates of these methods. We also deployed the SF with three learning-based methods on an mr1000 robot in real-world experiments, and the results showed that the SF enabled the robot using learning-based methods to navigate to its goal without colliding with humans.

NeurIPS Conference 2023 Conference Paper

Reward Imputation with Sketching for Contextual Batched Bandits

  • Xiao Zhang
  • Ninglu Shao
  • Zihua Si
  • Jun Xu
  • Wenhan Wang
  • Hanjing Su
  • Ji-Rong Wen

Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode, but the rewards of the non-executed actions are unobserved, resulting in partial-information feedback. Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information. In this paper, we propose an efficient approach called Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching, which approximates the full-information feedback. We formulate reward imputation as an imputation regularized ridge regression problem that captures the feedback mechanisms of both executed and non-executed actions. To reduce time complexity, we solve the regression problem using randomized sketching. We prove that our approach achieves an instantaneous regret with controllable bias and smaller variance than approaches without reward imputation. Furthermore, our approach enjoys a sublinear regret bound against the optimal policy. We also present two extensions, a rate-scheduled version and a version for nonlinear rewards, making our approach more practical. Experimental results show that SPUIR outperforms state-of-the-art baselines on synthetic, public benchmark, and real-world datasets.
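The reward-imputation step reduces to a ridge regression with a closed-form solution. This sketch omits SPUIR's randomized sketching (the part that lowers time complexity) and its imputation regularization details; the variable names are illustrative.

```python
import numpy as np

def impute_rewards(X_obs, r_obs, X_unobs, lam=1.0):
    """Impute rewards of non-executed actions via closed-form ridge regression.

    X_obs   : contexts of executed actions (with observed rewards r_obs)
    X_unobs : contexts of non-executed actions whose rewards were never seen
    Returns predicted rewards for the non-executed actions, approximating
    the full-information feedback the policy update would like to have.
    """
    d = X_obs.shape[1]
    # Ridge solution: theta = (X^T X + lam I)^-1 X^T r
    theta = np.linalg.solve(X_obs.T @ X_obs + lam * np.eye(d), X_obs.T @ r_obs)
    return X_unobs @ theta
```

SPUIR replaces the exact `X_obs.T @ X_obs` computation with a sketched approximation, trading a small, controllable bias for much lower per-round cost.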

AAAI Conference 2021 Conference Paper

Regret Bounds for Online Kernel Selection in Continuous Kernel Space

  • Xiao Zhang
  • Shizhong Liao
  • Jun Xu
  • Ji-Rong Wen

Regret bounds for online kernel selection in a finite kernel set have been well studied, with bounds of order at least O(√(NT)) after T rounds, where N is the number of candidate kernels. But it is still an unsolved problem to achieve sublinear regret bounds for online kernel selection in a continuous kernel space under different learning frameworks. In this paper, to represent different learning frameworks of online kernel selection, we divide online kernel selection approaches in a continuous kernel space into two categories according to the order of selection and training at each round. Then we construct a surrogate hypothesis space that contains all the candidate kernels with bounded norms and inner products, representing the continuously varying hypothesis space. Finally, we decompose the regrets of the proposed online kernel selection categories into different types of instantaneous regrets in the surrogate hypothesis space, and derive optimal regret bounds of order O(√T) under mild assumptions, independent of the cardinality of the continuous kernel space. Empirical studies verified the correctness of the theoretical regret analyses.

IJCAI Conference 2020 Conference Paper

Enhancing Dialog Coherence with Event Graph Grounded Content Planning

  • Jun Xu
  • Zeyang Lei
  • Haifeng Wang
  • Zheng-Yu Niu
  • Hua Wu
  • Wanxiang Che

How to generate informative, coherent and sustainable open-domain conversations is a non-trivial task. Previous work on knowledge-grounded conversation generation focuses on improving dialog informativeness, with little attention paid to dialog coherence. In this paper, to enhance multi-turn dialog coherence, we propose to leverage event chains to help determine a sketch of a multi-turn dialog. We first extract event chains from narrative texts and connect them as a graph. We then present a novel event graph grounded Reinforcement Learning (RL) framework. It conducts high-level response content (simply an event) planning by learning to walk over the graph, and then produces a response conditioned on the planned content. In particular, we devise a novel multi-policy decision making mechanism to foster a coherent dialog with both appropriate content ordering and high contextual relevance. Experimental results indicate the effectiveness of this framework in terms of dialog coherence and informativeness.

NeurIPS Conference 2020 Conference Paper

ICNet: Intra-saliency Correlation Network for Co-Saliency Detection

  • Wen-Da Jin
  • Jun Xu
  • Ming-Ming Cheng
  • Yi Zhang
  • Wei Guo

Intra-saliency and inter-saliency cues have been extensively studied for co-saliency detection (Co-SOD). Model-based methods produce coarse Co-SOD results due to hand-crafted intra- and inter-saliency features. Current data-driven models exploit inter-saliency cues, but undervalue the potential power of intra-saliency cues. In this paper, we propose an Intra-saliency Correlation Network (ICNet) to extract intra-saliency cues from the single image saliency maps (SISMs) predicted by any off-the-shelf SOD method, and obtain inter-saliency cues by correlation techniques. Specifically, we adopt normalized masked average pooling (NMAP) to extract latent intra-saliency categories from the SISMs and semantic features as intra cues. Then we employ a correlation fusion module (CFM) to obtain inter cues by exploiting correlations between the intra cues and single-image features. To improve Co-SOD performance, we propose a category-independent rearranged self-correlation feature (RSCF) strategy. Experiments on three benchmarks show that our ICNet outperforms previous state-of-the-art methods on Co-SOD. Ablation studies validate the effectiveness of our contributions. The PyTorch code is available at https://github.com/blanclist/ICNet.

AAAI Conference 2020 Conference Paper

Knowledge Graph Grounded Goal Planning for Open-Domain Conversation Generation

  • Jun Xu
  • Haifeng Wang
  • Zhengyu Niu
  • Hua Wu
  • Wanxiang Che

Previous neural models on open-domain conversation generation have no effective mechanisms to manage chatting topics, and tend to produce less coherent dialogs. Inspired by the strategies in human-human dialogs, we divide the task of multi-turn open-domain conversation generation into two sub-tasks: explicit goal (chatting about a topic) sequence planning and goal completion by topic elaboration. To this end, we propose a three-layer Knowledge aware Hierarchical Reinforcement Learning based Model (KnowHRL). Specifically, for the first sub-task, the upper-layer policy learns to traverse a knowledge graph (KG) in order to plan a high-level goal sequence towards a good balance between dialog coherence and topic consistency with user interests. For the second sub-task, the middle-layer policy and the lower-layer one work together to produce an in-depth multi-turn conversation about a single topic with a goal-driven generation mechanism. The capability of goal-sequence planning enables chatbots to conduct proactive open-domain conversations towards recommended topics, which has many practical applications. Experiments demonstrate that our model outperforms state-of-the-art baselines in terms of user-interest consistency, dialog coherence, and knowledge accuracy.

AAAI Conference 2019 Conference Paper

Differentiated Distribution Recovery for Neural Text Generation

  • Jianing Li
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

Neural language models based on recurrent neural networks (RNNLM) have significantly improved the performance of text generation, yet the quality of generated text, as represented by the Turing Test pass rate, is still far from satisfactory. Some researchers propose to use adversarial training or reinforcement learning to promote the quality; however, such methods usually introduce great challenges in the training and parameter tuning processes. Through our analysis, we find the problem of RNNLM comes from the usage of maximum likelihood estimation (MLE) as the objective function, which requires the generated distribution to precisely recover the true distribution. Such a requirement favors high generation diversity, which restricts the generation quality. This is not suitable when the overall quality is low, since high generation diversity usually indicates a lot of errors rather than diverse good samples. In this paper, we propose to achieve differentiated distribution recovery, DDR for short. The key idea is to make the optimal generation probability proportional to the β-th power of the true probability, where β > 1. In this way, the generation quality can be greatly improved by sacrificing diversity from noises and rare patterns. Experiments on synthetic data and two public text datasets show that our DDR method achieves a more flexible quality-diversity trade-off and a higher Turing Test pass rate, as compared with baseline methods including RNNLM, SeqGAN and LeakGAN.
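The key transformation (making the generation probability proportional to the β-th power of the true probability) is simple to state concretely; the toy distribution below is a made-up example.

```python
import numpy as np

def ddr_sharpen(p, beta=2.0):
    """Differentiated distribution recovery: q_i proportional to p_i ** beta,
    with beta > 1.

    Raising probabilities to a power > 1 and renormalizing shifts mass toward
    high-probability patterns and suppresses rare/noisy ones, trading
    diversity for quality as described above.
    """
    q = np.asarray(p, dtype=float) ** beta
    return q / q.sum()

p = np.array([0.5, 0.3, 0.2])   # toy "true" next-token distribution
q = ddr_sharpen(p, beta=2.0)    # mass concentrates on the likeliest token
```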

IJCAI Conference 2019 Conference Paper

Generating Multiple Diverse Responses with Multi-Mapping and Posterior Mapping Selection

  • Chaotao Chen
  • Jinhua Peng
  • Fan Wang
  • Jun Xu
  • Hua Wu

In human conversation an input post is open to multiple potential responses, which is typically regarded as a one-to-many problem. Promising approaches mainly incorporate multiple latent mechanisms to build the one-to-many relationship. However, without accurate selection of the latent mechanism corresponding to the target response during training, these methods suffer from a rough optimization of latent mechanisms. In this paper, we propose a multi-mapping mechanism to better capture the one-to-many relationship, where multiple mapping modules are employed as latent mechanisms to model the semantic mappings from an input post to its diverse responses. For accurate optimization of latent mechanisms, a posterior mapping selection module is designed to select the corresponding mapping module according to the target response for further optimization. We also introduce an auxiliary matching loss to facilitate the optimization of posterior mapping selection. Empirical results demonstrate the superiority of our model in generating multiple diverse and informative responses over the state-of-the-art methods.

AAAI Conference 2019 Conference Paper

HAS-QA: Hierarchical Answer Spans Model for Open-Domain Question Answering

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Lixin Su
  • Xueqi Cheng

This paper is concerned with open-domain question answering (i.e., OpenQA). Recently, some works have viewed this problem as a reading comprehension (RC) task, and directly applied successful RC models to it. However, the performance of such models is not as good as in the RC task. In our opinion, the RC perspective ignores three characteristics of the OpenQA task: 1) many paragraphs without the answer span are included in the data collection; 2) multiple answer spans may exist within one given paragraph; 3) the end position of an answer span depends on the start position. In this paper, we first propose a new probabilistic formulation of OpenQA, based on a three-level hierarchical structure, i.e., the question level, the paragraph level and the answer span level. Then a Hierarchical Answer Spans Model (HAS-QA) is designed to capture each probability. HAS-QA has the ability to tackle the above three problems, and experiments on public OpenQA datasets show that it significantly outperforms traditional RC baselines and recent OpenQA baselines.

IJCAI Conference 2019 Conference Paper

HorNet: A Hierarchical Offshoot Recurrent Network for Improving Person Re-ID via Image Captioning

  • Shiyang Yan
  • Jun Xu
  • Yuai Liu
  • Lin Xu

Person re-identification (re-ID) aims to recognize a person-of-interest across different cameras with notable appearance variance. Existing research has focused on the capability and robustness of visual representation. In this paper, instead, we propose a novel hierarchical offshoot recurrent network (HorNet) for improving person re-ID via image captioning. Image captions are semantically richer and more consistent than visual attributes, which could significantly alleviate the variance. We use the similarity preserving generative adversarial network (SPGAN) and an image captioner to fulfill domain transfer and language description generation. Then the proposed HorNet can learn the visual and language representation from both the images and captions jointly, and thus enhance the performance of person re-ID. Extensive experiments are conducted on several benchmark datasets with or without image captions, i.e., CUHK03, Market-1501, and DukeMTMC, demonstrating the superiority of the proposed method. Our method can generate and extract meaningful image captions while achieving state-of-the-art performance.

AAAI Conference 2019 Short Paper

Teaching Machines to Extract Main Content for Machine Reading Comprehension

  • Zhaohui Li
  • Yue Feng
  • Jun Xu
  • Jiafeng Guo
  • Yanyan Lan
  • Xueqi Cheng

Machine reading comprehension, whose goal is to find answers in candidate passages for a given question, has attracted substantial research effort in recent years. One of the key challenges in machine reading comprehension is how to identify the main content from a large, redundant, and overlapping set of candidate sentences. In this paper we propose to tackle the challenge with a Markov Decision Process, in which main content identification is formalized as sequential decision making and each action corresponds to selecting a sentence. Policy gradient is used to learn the model parameters. Experimental results on MS MARCO show that the proposed model, called MC-MDP, can select high-quality main content and significantly improves the performance of answer span prediction.
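The sequential decision process in the abstract can be sketched with a toy REINFORCE-style policy. The scoring function, sampling scheme, and reward mentioned in the final comment are stand-ins, not the paper's model.

```python
# Illustrative sequential sentence selection: each action samples one
# remaining sentence with probability proportional to exp(score).
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def select_sentences(scores, k, rng):
    """Sample k sentences without replacement, accumulating the
    log-probability of the whole action sequence."""
    remaining = list(range(len(scores)))
    chosen, logprob = [], 0.0
    for _ in range(k):
        probs = softmax([scores[i] for i in remaining])
        idx = rng.choices(range(len(remaining)), weights=probs)[0]
        logprob += math.log(probs[idx])
        chosen.append(remaining.pop(idx))
    return chosen, logprob

rng = random.Random(0)
chosen, logprob = select_sentences([0.1, 2.0, -1.0, 0.5], k=2, rng=rng)
# Policy gradient would then scale the gradient of logprob by a reward
# (e.g., downstream answer-span quality) to update the scoring model.
```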

AAAI Conference 2018 Short Paper

Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph

  • Yan Xiao
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

Approximate nearest neighbor search is a fundamental problem that has been studied for decades. Recently, graph-based indexing methods have demonstrated great efficiency; their main idea is to construct a neighborhood graph offline and perform a greedy search over the graph online, starting from some sampled points. Most existing graph-based methods focus on either the precise k-nearest neighbor (k-NN) graph, which has good exploitation ability, or the diverse graph, which has good exploration ability. In this paper, we propose the k-diverse nearest neighbor (k-DNN) graph, which balances the precision and diversity of the graph, yielding good exploitation and exploration abilities simultaneously. We introduce an efficient indexing algorithm for constructing the k-DNN graph, inspired by a well-known diverse ranking algorithm in information retrieval (IR). Experimental results show that our method can outperform both state-of-the-art precise graph and diverse graph methods.
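The online phase shared by the graph-based methods the abstract describes, greedy search over a neighborhood graph, can be sketched in a few lines. The toy graph and points below are invented for illustration and are not a k-DNN graph built by the paper's indexing algorithm.

```python
# Greedy routing on a neighborhood graph: walk to the neighbor closest
# to the query until no neighbor improves (a local minimum).

def greedy_search(graph, points, query, start, dist):
    current = start
    while True:
        best = min(graph[current], key=lambda n: dist(points[n], query))
        if dist(points[best], query) >= dist(points[current], query):
            return current  # no neighbor is closer; stop here
        current = best

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
dist = lambda p, q: (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
nearest = greedy_search(graph, points, query=(1.9, 1.9), start=0, dist=dist)
```

A purely precise k-NN graph can trap this walk in a local cluster (poor exploration), while an overly diverse graph routes coarsely (poor exploitation); balancing the two is the motivation the abstract states for the k-DNN graph.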

NeurIPS Conference 2018 Conference Paper

Multivariate Time Series Imputation with Generative Adversarial Networks

  • Yonghong Luo
  • Xiangrui Cai
  • Ying Zhang
  • Jun Xu
  • Xiaojie Yuan

Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods to multivariate time series data. Conventional approaches to the missing-value problem, including mean/zero imputation, case deletion, and matrix factorization-based imputation, are all incapable of modeling the temporal dependencies and the complex distributions of multivariate time series. In this paper, we treat missing value imputation as a data generation problem. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with a GAN, which is then used to generate the missing values for each sample. Unlike image data, time series data are usually incomplete due to the nature of the data recording process. A modified Gated Recurrent Unit is employed in the GAN to model the temporal irregularity of the incomplete time series. Experiments on two multivariate time series datasets show that the proposed model outperforms the baselines in terms of imputation accuracy. Experimental results also show that a simple model trained on the imputed data can achieve state-of-the-art results on prediction tasks, demonstrating the benefits of our model in downstream applications.

IJCAI Conference 2018 Conference Paper

Reinforcing Coherence for Sequence to Sequence Model in Dialogue Generation

  • Hainan Zhang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

The sequence to sequence (Seq2Seq) approach has gained great attention in the field of single-turn dialogue generation. However, one serious problem is that most existing Seq2Seq-based models tend to generate common responses lacking specific meaning. Our analysis shows that the underlying reason is that Seq2Seq is equivalent to optimizing Kullback–Leibler (KL) divergence and thus does not penalize the case where the generated probability is high while the true probability is low. However, the true probability is unknown, which makes this problem hard to tackle. Inspired by the fact that the coherence (i.e., similarity) between post and response is consistent with human evaluation, we hypothesize that the true probability of a response is proportional to its coherence degree. Coherence scores are then used as the reward function in a reinforcement learning framework to penalize the case where the generated probability is high while the true probability is low. Three types of coherence models are proposed in this paper: an unlearned similarity function, a pretrained semantic matching function, and an end-to-end dual learning architecture. Experimental results on both a Chinese Weibo dataset and an English Subtitle dataset show that the proposed models produce more specific and meaningful responses, outperforming Seq2Seq models in both metric-based and human evaluations.
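The simplest of the three coherence models in the abstract, an unlearned similarity function, can be sketched as a cosine similarity used as a reward. Bag-of-words vectors stand in here for whatever sentence representation the paper actually uses; the example posts and responses are invented.

```python
# Unlearned coherence reward: cosine similarity between bag-of-words
# vectors of the post and the response.
import math
from collections import Counter

def coherence_reward(post, response):
    a, b = Counter(post.split()), Counter(response.split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

specific = coherence_reward("where should I eat tonight",
                            "try the noodle place on main street tonight")
generic = coherence_reward("where should I eat tonight",
                           "sorry , no idea")
# A generic reply shares no content with the post, so it earns a lower
# reward and is penalized by the policy-gradient update.
```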

TIST Journal 2017 Journal Article

Directly Optimize Diversity Evaluation Measures

  • Jun Xu
  • Long Xia
  • Yanyan Lan
  • Jiafeng Guo
  • Xueqi Cheng

The queries issued to search engines are often ambiguous or multifaceted, which requires search engines to return diverse results that fulfill as many different information needs as possible; this is called search result diversification. Recently, the relational learning to rank model, which designs a learnable ranking function following the criterion of maximal marginal relevance, has shown effectiveness in search result diversification [Zhu et al. 2014]. The goodness of a diverse ranking model is usually evaluated with diversity evaluation measures such as α-NDCG [Clarke et al. 2008], ERR-IA [Chapelle et al. 2009], and D#-NDCG [Sakai and Song 2011]. Ideally, the learning algorithm would train a ranking model that directly optimizes the diversity evaluation measures on the training data. Existing relational learning to rank algorithms, however, train ranking models by optimizing loss functions that only loosely relate to the evaluation measures. To deal with this problem, we propose a general framework for learning relational ranking models by directly optimizing any diversity evaluation measure. During learning, a loss function that upper-bounds the basic loss function defined on a diversity evaluation measure is minimized. New diverse ranking algorithms can be derived under the framework, and several are created based on different upper bounds over the basic loss function. We compared the proposed algorithms with conventional diverse ranking methods on the TREC benchmark datasets. Experimental results show that the algorithms derived under the diverse learning to rank framework always significantly outperform the state-of-the-art baselines.
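One of the measures named above, α-NDCG, rewards covering new subtopics and discounts repeats. A minimal sketch, with documents represented as the sets of query subtopics they cover (toy data, not TREC judgments):

```python
# alpha-NDCG following Clarke et al. (2008): a subtopic already seen
# r times contributes (1 - alpha) ** r, discounted by log2 of the rank.
import math

def alpha_dcg(ranking, alpha):
    seen = {}
    score = 0.0
    for k, doc in enumerate(ranking, start=1):
        gain = sum((1 - alpha) ** seen.get(t, 0) for t in doc)
        for t in doc:
            seen[t] = seen.get(t, 0) + 1
        score += gain / math.log2(k + 1)
    return score

def alpha_ndcg(ranking, alpha=0.5):
    # The ideal ranking is built greedily, the standard practice,
    # since finding the exact optimum is intractable in general.
    remaining = list(ranking)
    ideal = []
    while remaining:
        best = max(remaining, key=lambda d: alpha_dcg(ideal + [d], alpha))
        ideal.append(best)
        remaining.remove(best)
    denom = alpha_dcg(ideal, alpha)
    return alpha_dcg(ranking, alpha) / denom if denom else 0.0

# Documents as sets of covered subtopics; a diverse order scores higher.
redundant = [{"a"}, {"a"}, {"b"}]
diverse = [{"a"}, {"b"}, {"a"}]
```

The discontinuous, rank-dependent form of this measure is exactly why the paper minimizes a continuous upper bound on the measure-based loss instead of the measure itself.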

AAAI Conference 2016 Conference Paper

A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations

  • Shengxian Wan
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Liang Pang
  • Xueqi Cheng

Matching natural language sentences is central to many applications such as information retrieval and question answering. Existing deep models rely on a single sentence representation or multiple-granularity representations for matching. However, such methods cannot well capture the contextualized local information in the matching process. To tackle this problem, we present a new deep architecture that matches two sentences with multiple positional sentence representations. Specifically, each positional sentence representation is a sentence representation at a given position, generated by a bidirectional long short-term memory (Bi-LSTM). The matching score is produced by aggregating interactions between these different positional sentence representations, through k-Max pooling and a multi-layer perceptron. Our model has several advantages: (1) by using a Bi-LSTM, the rich context of the whole sentence is leveraged to capture the contextualized local information in each positional sentence representation; (2) by matching with multiple positional sentence representations, the model can flexibly aggregate the important contextualized local information in a sentence to support the matching; (3) experiments on different tasks such as question answering and sentence completion demonstrate the superiority of our model.
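The aggregation step above can be illustrated with a toy k-Max pooling: the strongest interaction values between positional representations are kept and passed on. Scalar interaction values stand in here for the real interaction tensor over Bi-LSTM states.

```python
# k-Max pooling over interaction values: keep the k largest signals
# (the paper pools over a flattened interaction tensor).

def k_max_pooling(values, k):
    return sorted(values, reverse=True)[:k]

# Made-up interactions between positional representations of two sentences.
interactions = [0.1, 0.9, 0.3, 0.7, 0.2, 0.8]
top_k = k_max_pooling(interactions, k=3)
# top_k would then be fed to a multi-layer perceptron to produce the
# final matching score.
```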

JBHI Journal 2016 Journal Article

Fusing Heterogeneous Features From Stacked Sparse Autoencoder for Histopathological Image Analysis

  • Xiaofan Zhang
  • Hang Dou
  • Tao Ju
  • Jun Xu
  • Shaoting Zhang

In the analysis of histopathological images, both holistic (e.g., architecture) features and local appearance features demonstrate excellent performance, yet their accuracy may vary dramatically for different inputs. This motivates us to investigate how to fuse results from these features to enhance accuracy. In particular, we employ content-based image retrieval approaches to discover morphologically relevant images for image-guided diagnosis, using holistic and local features, both generated from cell detection results by a stacked sparse autoencoder. Because of the dramatically different characteristics and representations of these heterogeneous features (i.e., holistic and local), their results may disagree, causing difficulties for traditional fusion methods. In this paper, we employ a graph-based query-specific fusion approach in which multiple retrieval results (i.e., rank lists) are integrated and reordered based on a fused graph. The proposed method can combine the strengths of local or holistic features adaptively for different inputs. We evaluate our method on a challenging clinical problem, histopathological image-guided diagnosis of intraductal breast lesions, and it achieves 91.67% classification accuracy on 120 breast tissue images from 40 patients.

AAAI Conference 2016 Conference Paper

Inside Out: Two Jointly Predictive Models for Word Representations and Phrase Representations

  • Fei Sun
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

The distributional hypothesis lies at the root of most existing word representation models, which infer word meaning from external contexts. However, distributional models cannot handle rare and morphologically complex words very well and fail to identify some fine-grained linguistic regularities because they ignore word forms. Morphology, by contrast, points out that words are built from basic units, i.e., morphemes. Therefore, the meaning and function of rare words can be inferred from words sharing the same morphemes, and many syntactic relations can be identified directly from word forms. The limitation of morphology, however, is that it cannot infer the relationship between two words that share no morphemes. Considering the advantages and limitations of both approaches, we propose two novel models, called BEING and SEING, that build better word representations by modeling both external contexts and internal morphemes in a jointly predictive way. These two models can also be extended to learn phrase representations according to distributed morphology theory. We evaluate the proposed models on similarity and analogy tasks. The results demonstrate that the proposed models significantly outperform state-of-the-art models on both word and phrase representation learning.

IJCAI Conference 2016 Conference Paper

Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN

  • Shengxian Wan
  • Yanyan Lan
  • Jun Xu
  • Jiafeng Guo
  • Liang Pang
  • Xueqi Cheng

Semantic matching, which aims to determine the matching degree between two texts, is a fundamental problem for many NLP applications. Recently, deep learning approaches have been applied to this problem and significant improvements have been achieved. In this paper, we propose to view the generation of the global interaction between two texts as a recursive process: the interaction of two texts at each position is a composition of the interactions between their prefixes and the word-level interaction at the current position. Based on this idea, we propose a novel deep architecture, namely Match-SRNN, to model the recursive matching structure. First, a tensor is constructed to capture the word-level interactions. Then a spatial RNN is applied to integrate the local interactions recursively, with importance determined by four types of gates. Finally, the matching score is calculated based on the global interaction. We show that, when degenerated to the exact matching scenario, Match-SRNN can approximate the dynamic programming process of the longest common subsequence, so there exists a clear interpretation for Match-SRNN. Our experiments on two semantic matching tasks show the effectiveness of Match-SRNN and its ability to visualize the learned matching structure.
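The longest common subsequence connection mentioned above is concrete: the classical LCS recursion composes, at each cell, the left, top, and diagonal predecessors with the word-level match at that position, which is the recursive structure a spatial RNN generalizes with learned gates. A minimal LCS sketch:

```python
# Classical LCS dynamic programming over word sequences. Each cell
# h[i][j] depends on h[i-1][j], h[i][j-1], and h[i-1][j-1] plus the
# exact-match signal at (i, j); Match-SRNN replaces this hard max/plus
# composition with a gated, learned function over the same neighbors.

def lcs_length(s1, s2):
    n, m = len(s1), len(s2)
    h = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 1 if s1[i - 1] == s2[j - 1] else 0
            h[i][j] = max(h[i - 1][j], h[i][j - 1], h[i - 1][j - 1] + match)
    return h[n][m]

length = lcs_length("the cat sat".split(), "the fat cat".split())
```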

AAAI Conference 2016 Conference Paper

SPAN: Understanding a Question with Its Support Answers

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Xueqi Cheng

Matching a question to its best answer is a common task in community question answering. In this paper, we focus on non-factoid questions and aim to pick out the best answer from the candidate answers. Most existing deep models directly measure the similarity between question and answer by their individual sentence embeddings. To tackle the lack of information in question descriptions and the lexical gap between questions and answers, we propose a novel deep architecture named SPAN. Specifically, we introduce support answers to help understand the question, defined as the best answers of questions similar to the original one. We then obtain two kinds of similarities: one between the question and the candidate answer, and the other between the support answers and the candidate answer. The matching score is generated by combining them. Experiments on Yahoo! Answers demonstrate that SPAN can outperform the baseline models.

AAAI Conference 2016 Conference Paper

Text Matching as Image Recognition

  • Liang Pang
  • Yanyan Lan
  • Jiafeng Guo
  • Jun Xu
  • Shengxian Wan
  • Xueqi Cheng

Matching two texts is a fundamental problem in many natural language processing tasks. An effective way is to extract meaningful matching patterns from words, phrases, and sentences to produce the matching score. Inspired by the success of convolutional neural network in image recognition, where neurons can capture many complicated patterns based on the extracted elementary visual patterns such as oriented edges and corners, we propose to model text matching as the problem of image recognition. Firstly, a matching matrix whose entries represent the similarities between words is constructed and viewed as an image. Then a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way. We show that by resembling the compositional hierarchies of patterns in image recognition, our model can successfully identify salient signals such as n-gram and n-term matchings. Experimental results demonstrate its superiority against the baselines.
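The first step described in the abstract, constructing the matching matrix that is then treated as an image, can be sketched directly. The tiny word vectors below are invented for illustration; a real system would use learned embeddings.

```python
# Matching matrix construction: entry (i, j) is a similarity between
# word i of one text and word j of the other; the grid is then treated
# as a single-channel image for a convolutional neural network.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def matching_matrix(words1, words2, embeddings):
    return [[cosine(embeddings[w1], embeddings[w2]) for w2 in words2]
            for w1 in words1]

# Made-up 2-d embeddings; similar word pairs yield bright "pixels".
embeddings = {
    "down": (1.0, 0.0), "the": (0.0, 1.0),
    "ages": (0.9, 0.1), "noise": (0.2, 0.8),
}
m = matching_matrix(["down", "the"], ["ages", "noise"], embeddings)
# A CNN over m can then detect n-gram and n-term matching patterns the
# way it detects edges and corners in images.
```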

AAAI Conference 2015 Conference Paper

A Probabilistic Model for Bursty Topic Discovery in Microblogs

  • Xiaohui Yan
  • Jiafeng Guo
  • Yanyan Lan
  • Jun Xu
  • Xueqi Cheng

Bursty topic discovery in microblogs is important for people to grasp essential and valuable information. However, the task is challenging since microblog posts are particularly short and noisy. This work develops a novel probabilistic model, namely the Bursty Biterm Topic Model (BBTM), to deal with the task. BBTM extends the Biterm Topic Model (BTM) by incorporating the burstiness of biterms as prior knowledge for bursty topic modeling, which enjoys the following merits: 1) it solves the data sparsity problem in topic modeling over short texts as well as BTM does; 2) it can automatically discover high-quality bursty topics in microblogs in a principled and efficient way. Extensive experiments on a standard Twitter dataset show that our approach significantly outperforms state-of-the-art baselines.

JMLR Journal 2011 Journal Article

Learning a Robust Relevance Model for Search Using Kernel Methods

  • Wei Wu
  • Jun Xu
  • Hang Li
  • Satoshi Oyama

This paper points out that many search relevance models in information retrieval, such as the Vector Space Model, BM25, and Language Models for Information Retrieval, can be viewed as a similarity function between pairs of objects of different types, referred to as an S-function. An S-function is defined as the dot product between the images of two objects in a Hilbert space, mapped from two different input spaces. One advantage of this view is that one can take a unified and principled approach to issues of search relevance. The paper then proposes employing a kernel method to learn a robust relevance model as an S-function, which can effectively deal with the term mismatch problem, one of the biggest challenges in search. The kernel method exploits a positive semi-definite kernel referred to as an S-kernel. The paper shows that when using an S-kernel, the model learned by the kernel method is guaranteed to be an S-function, and it gives more general principles for constructing S-kernels. A specific implementation of the kernel method is proposed using Ranking SVM techniques and click-through data. The proposed approach is employed to learn a relevance model as an extension of BM25, referred to as Robust BM25. Experimental results on web search and enterprise search data show that Robust BM25 significantly outperforms baseline methods and can successfully tackle the term mismatch problem.
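The baseline the paper extends, BM25 viewed as a query-document similarity function, can be sketched with the standard formula. The corpus statistics and parameter values below are toy assumptions for illustration, not the paper's experimental setup.

```python
# Okapi BM25 (with the common +1 smoothing inside the idf log) as a
# similarity function between a query and a document.
import math

def bm25(query_terms, doc_terms, doc_freq, n_docs, avgdl, k1=1.2, b=0.75):
    score = 0.0
    dl = len(doc_terms)
    for t in query_terms:
        tf = doc_terms.count(t)
        if tf == 0:
            continue  # term mismatch: the term contributes nothing
        idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score

doc_freq = {"kernel": 3, "method": 40}
doc = "kernel method for search relevance".split()
score = bm25(["kernel", "method"], doc, doc_freq, n_docs=100, avgdl=6.0)
# Robust BM25 addresses exactly the tf == 0 branch above: query terms
# absent from the document score zero under plain BM25 even when the
# document is relevant, which is the term mismatch problem.
```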