Arrow Research search

Author name cluster

Ning Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

33 papers
2 author rows

Possible papers (33)

AAAI Conference 2026 Conference Paper

LoGoSeg: Integrating Local and Global Features for Open-Vocabulary Semantic Segmentation

  • Junyang Chen
  • Xiangbo Lv
  • Zhiqiang Kou
  • Xingdong Sheng
  • Ning Xu
  • Yiguo Qiao

Open-vocabulary semantic segmentation (OVSS) extends traditional closed-set segmentation by enabling pixel-wise annotation for both seen and unseen categories using arbitrary textual descriptions. While existing methods leverage vision-language models (VLMs) like CLIP, their reliance on image-level pretraining often results in imprecise spatial alignment, leading to mismatched segmentations in ambiguous or cluttered scenes. Moreover, most existing approaches lack strong object priors and region-level constraints, which can lead to object hallucination or missed detections, further degrading performance. To address these challenges, we propose LoGoSeg, an efficient single-stage framework that integrates three key innovations: (i) an object existence prior that dynamically weights relevant categories through global image-text similarity, effectively reducing hallucinations; (ii) a region-aware alignment module that establishes precise region-level visual-textual correspondences; and (iii) a dual-stream fusion mechanism that optimally combines local structural information with global semantic context. Unlike prior works, LoGoSeg eliminates the need for external mask proposals, additional backbones, or extra datasets, ensuring efficiency. Extensive experiments on six benchmarks (A-847, PC-459, A-150, PC-59, PAS-20, and PAS-20b) demonstrate its competitive performance and strong generalization in open-vocabulary settings.
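
As an illustration of the object-existence prior in (i), the sketch below re-weights per-category segmentation logits by global image-text similarity. The tensor names, the sigmoid squashing, and the temperature are illustrative assumptions, not LoGoSeg's actual module.

```python
import torch
import torch.nn.functional as F

def reweight_by_existence_prior(pixel_logits, image_feat, text_feats, temperature=0.07):
    """Down-weight categories whose text embedding is dissimilar to the global
    image embedding (illustrative sketch, not LoGoSeg's exact module).

    pixel_logits: (C, H, W) per-category segmentation logits
    image_feat:   (D,)      global image embedding (e.g., from a CLIP-like encoder)
    text_feats:   (C, D)    one text embedding per category name
    """
    image_feat = F.normalize(image_feat, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sim = text_feats @ image_feat                    # (C,) global image-text similarity
    existence = torch.sigmoid(sim / temperature)     # (C,) soft "is this category present?"
    # Categories unlikely to be present are suppressed before the pixel-wise argmax.
    return pixel_logits * existence[:, None, None]

# Toy usage with random tensors.
logits = torch.randn(5, 32, 32)      # 5 candidate categories
img = torch.randn(512)
txt = torch.randn(5, 512)
print(reweight_by_existence_prior(logits, img, txt).shape)  # torch.Size([5, 32, 32])
```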

JBHI Journal 2026 Journal Article

SpineVLM: A Markdown-Guided Structured Fine-Tuning Framework for Spine X-ray Report Generation

  • Dong Liu
  • Wenhui Li
  • Ning Xu
  • Guoge Han
  • Rui Hao
  • Xianzhu Liu
  • An-An Liu

Automated medical report generation in specialized fields like spine radiography is constrained by data scarcity and high annotation costs. Consequently, existing multimodal large language models (MLLMs) struggle in these settings, often missing minute, scattered spinal abnormalities. We introduce SpineVLM, a data-efficient framework for structured spine X-ray report generation. The framework is built upon the newly constructed SXRG dataset, comprising 10,468 image-report pairs developed via a hierarchical AI-assisted annotation pipeline. To optimize learning under limited data, we propose Markdown-Guided Structured Learning (MGSL), which reformulates unconstrained free-text synthesis into a structured completion task, acting as a strong regularizer. Furthermore, an unsupervised Region-Focused Inference (RFI) module powered by foundation models (DINOv2) isolates the vertebral column to enhance the perception of subtle lesions without requiring manual spatial annotations. Evaluated on a 7B-parameter vision-language backbone, SpineVLM achieves strong performance against ten baseline multimodal models across standard linguistic metrics. In a double-blind reader study, the system achieved a diagnostic F1-score of 0.866, comparable to specialist performance, while reducing clinical reporting time by over 41%. By open-sourcing the dataset and codebase, we provide, to our knowledge, the first quantitative benchmark for automated spine radiography report generation, together with a structured framework for this data-limited setting. All data and code will be publicly released at https://github.com/LiuDongDaniel/SpineVLM.

JBHI Journal 2025 Journal Article

Accurate Multi-Landmark Localization in 3D Ultra-High Resolution CT Images of the Ears Via Deep Reinforcement Learning and Transformer

  • Zhiwei Qu
  • Li Zhuo
  • Ning Xu
  • Hongxia Yin
  • Zhenchang Wang
  • Xiaoguang Li

Automated landmark localization can help radiologists quickly determine the locations of key structures or lesion areas in medical images. However, when facing large-volume 3D medical images, existing methods have very high computational complexity because they must encode the global image, so it is difficult for them to achieve accurate landmark localization at high speed. In this paper, an accurate multi-landmark localization method for ear 3D Ultra-High Resolution CT (U-HRCT) images is proposed. The method adopts a novel localization pipeline that combines Deep Reinforcement Learning (DRL) and a Transformer. First, the DRL algorithm quickly collects landmark-related local features. Second, the Transformer extracts the spatial position relationships between anatomical structures from these discrete local features to infer the coordinate position of each landmark. Because the costly process of encoding the global image is avoided, the proposed method can quickly localize multiple ear landmarks in 3D U-HRCT images. Finally, we propose a refinement module based on a dual-branch hybrid Multi-Layer Perceptron, which uses the fast multi-landmark localization results to learn the spatial position relationships between landmarks, further improving the accuracy and stability of landmark localization. Experimental results on the self-built ear 3D U-HRCT dataset and the publicly available 2D cephalometric dataset demonstrate that the proposed method achieves Successful Detection Rates of 96.71% and 89.97%, respectively, within a precision range of 2.0 mm, surpassing state-of-the-art multi-landmark localization methods while offering faster localization.

NeurIPS Conference 2025 Conference Paper

Bi-Level Knowledge Transfer for Multi-Task Multi-Agent Reinforcement Learning

  • Junkai Zhang
  • Jinmin He
  • Yifan Zhang
  • Yifan Zang
  • Ning Xu
  • Jian Cheng

Multi-Agent Reinforcement Learning (MARL) has achieved remarkable success in various real-world scenarios, but its high cost of online training makes it impractical to learn each task from scratch. To enable effective policy reuse, we consider the problem of zero-shot generalization from offline data across multiple tasks. While prior work focuses on transferring the individual skills of agents, we argue that effective policy transfer across tasks should also capture team-level coordination knowledge. In this paper, we propose Bi-Level Knowledge Transfer (BiKT) for Multi-Task MARL, which performs knowledge transfer at both the individual and team levels. At the individual level, we extract transferable individual skill embeddings from offline MARL trajectories. At the team level, we define tactics as coordinated patterns of skill combinations and capture them by leveraging the learned skill embeddings. We map skill combinations into compact tactic embeddings and then construct a tactic codebook. To incorporate both skills and tactics into decision-making, we design a bi-level decision transformer that infers them in sequence. BiKT leverages both the generalizability of individual skills and the diversity of tactics, enabling the learned policy to perform effectively across multiple tasks. Extensive experiments on the SMAC and MPE benchmarks demonstrate that BiKT achieves strong generalization to previously unseen tasks.

NeurIPS Conference 2025 Conference Paper

Can Class-Priors Help Single-Positive Multi-Label Learning?

  • Biao Liu
  • Ning Xu
  • Jie Wang
  • Xin Geng

Single-positive multi-label learning (SPMLL) is a weakly supervised multi-label learning problem, where each training example is annotated with only one positive label. Existing SPMLL methods typically assign pseudo-labels to unannotated labels under the assumption that the prior probabilities of all classes are identical. However, the class-prior of each category may differ significantly in real-world scenarios, so the predictive model does not perform as well as expected because this assumption does not hold in real-world applications. To alleviate this issue, a novel framework named Crisp, i.e., Class-pRiors Induced Single-Positive multi-label learning, is proposed. Specifically, a class-priors estimator is introduced whose estimated class-priors are theoretically guaranteed to converge to the ground-truth class-priors. In addition, based on the estimated class-priors, an unbiased risk estimator for classification is derived, and the corresponding risk minimizer is guaranteed to approximately converge to the optimal risk minimizer on fully supervised data. Experimental results on ten MLL benchmark datasets demonstrate the effectiveness and superiority of our method over existing SPMLL approaches.

NeurIPS Conference 2025 Conference Paper

Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning

  • Congyu Qiao
  • Ning Xu
  • Yihao Hu
  • Xin Geng

Instance-dependent Partial Label Learning (ID-PLL) aims to learn a multi-class predictive model given training instances annotated with candidate labels related to features, among which the correct label is fixed but unknown. Previous works leverage the identification capability of the training model itself to iteratively refine the supervision information. However, these methods overlook a critical aspect of ID-PLL: within the original label space, the model may fail to distinguish some incorrect candidate labels that are strongly correlated with features from the correct labels. This leads to poor-quality supervision signals and creates a bottleneck in the training process. In this paper, we propose to leverage reduction-based pseudo-labels to alleviate the influence of incorrect candidate labels and to train our predictive model past this bottleneck. Specifically, reduction-based pseudo-labels are generated by performing weighted aggregation on the outputs of a multi-branch auxiliary model, with each branch trained in a label subspace that excludes certain labels. This ensures that each branch explicitly avoids the disturbance of the excluded labels, so the pseudo-labels provided for instances troubled by those labels can benefit from the unaffected branches. Theoretically, we demonstrate that reduction-based pseudo-labels exhibit greater consistency with the Bayes optimal classifier than pseudo-labels generated directly from the training predictive model.
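
The weighted aggregation of branch outputs can be pictured as in the sketch below: each branch predicts over a label subspace that excludes some labels, its prediction is lifted back to the full label space, the branches are averaged with weights, and the result is restricted to the candidate set. The function and variable names are hypothetical and the weighting scheme is simplified relative to the paper.

```python
import numpy as np

def reduction_pseudo_label(branch_probs, excluded, weights, candidate_mask):
    """Aggregate multi-branch predictions into a pseudo-label (illustrative
    sketch of the reduction idea, not the paper's exact procedure).

    branch_probs:   list of (|subspace|,) probability vectors, one per branch
    excluded:       list of label-index lists excluded by each branch
    weights:        (num_branches,) aggregation weights, summing to 1
    candidate_mask: (num_labels,) 1 for candidate labels, 0 otherwise
    """
    num_labels = candidate_mask.shape[0]
    agg = np.zeros(num_labels)
    for p, excl, w in zip(branch_probs, excluded, weights):
        kept = [j for j in range(num_labels) if j not in excl]
        full = np.zeros(num_labels)
        full[kept] = p                      # lift the subspace prediction back to the full space
        agg += w * full
    agg *= candidate_mask                   # pseudo-labels live inside the candidate set
    return agg / agg.sum()

# Two branches over 4 labels; branch 0 excludes label 3, branch 1 excludes label 0.
probs = [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])]
print(reduction_pseudo_label(probs, [[3], [0]], np.array([0.5, 0.5]),
                             np.array([1, 1, 0, 1])))
```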

NeurIPS Conference 2025 Conference Paper

Uncertain Knowledge Graph Completion via Semi-Supervised Confidence Distribution Learning

  • Tianxing Wu
  • Shutong Zhu
  • Jingting Wang
  • Ning Xu
  • Guilin Qi
  • Haofen Wang

Uncertain knowledge graphs (UKGs) associate each triple with a confidence score to provide more precise knowledge representations. Since real-world UKGs suffer from incompleteness, UKG completion has recently attracted increasing attention, aiming to complete missing triples and their confidences. Current studies attempt to learn UKG embeddings to solve this problem, but they neglect the extremely imbalanced distribution of triple confidences, so the learned embeddings are insufficient for high-quality UKG completion. To address this issue, we propose a new semi-supervised Confidence Distribution Learning (ssCDL) method for UKG completion, in which each triple confidence is transformed into a confidence distribution to introduce more supervision information across different confidences and reinforce the embedding learning process. ssCDL iteratively learns UKG embeddings by relational learning on labeled data (i.e., existing triples with confidences) and on unlabeled data with pseudo labels (i.e., unseen triples with generated confidences), which are predicted by meta-learning to augment the training data and rebalance the distribution of triple confidences. Experiments on two UKG datasets demonstrate that ssCDL consistently outperforms state-of-the-art baselines on different evaluation metrics.
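
One generic way to turn a scalar triple confidence into a "confidence distribution" is to spread it over discretized confidence levels with a kernel, as sketched below. The bin layout and the Gaussian kernel are assumptions made for illustration and may differ from ssCDL's actual construction.

```python
import numpy as np

def confidence_to_distribution(conf, bins=np.linspace(0.0, 1.0, 11), sigma=0.1):
    """Turn a scalar triple confidence into a discrete distribution over
    confidence bins using a Gaussian kernel (generic illustration only;
    the paper's construction may differ)."""
    d = np.exp(-0.5 * ((bins - conf) / sigma) ** 2)
    return d / d.sum()

# A confidence of 0.85 becomes a distribution peaked around the 0.8-0.9 bins.
print(np.round(confidence_to_distribution(0.85), 3))
```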

NeurIPS Conference 2024 Conference Paper

What Makes Partial-Label Learning Algorithms Effective?

  • Jiaqi Lv
  • Yangfan Liu
  • Shiyu Xia
  • Ning Xu
  • Miao Xu
  • Gang Niu
  • Min-Ling Zhang
  • Masashi Sugiyama

A partial label (PL) specifies a set of candidate labels for an instance and partial-label learning (PLL) trains multi-class classifiers with PLs. Recently, many methods that incorporate techniques from other domains have shown strong potential. The expectation that stronger techniques would enhance performance has resulted in prominent PLL methods becoming not only highly complicated but also quite different from one another, making it challenging to choose the best direction for future algorithm design. While it is exciting to see higher performance, this leaves open a fundamental question: what makes a PLL method effective? We present a comprehensive empirical analysis of this question and summarize the success of PLL so far into some minimal algorithm design principles. Our findings reveal that high accuracy on benchmark-simulated datasets with PLs can misleadingly amplify the perceived effectiveness of some general techniques, which may improve representation learning but have limited impact on addressing the inherent challenges of PLs. We further identify the common behavior among successful PLL methods as a progressive transition from uniform to one-hot pseudo-labels, highlighting the critical role of mini-batch PL purification in achieving top performance. Based on our findings, we introduce a minimal working algorithm that is surprisingly simple yet effective, and propose an improved strategy to implement the design principles, suggesting a promising direction for improvements in PLL.
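
The "progressive transition from uniform to one-hot pseudo-labels" can be seen in the common mini-batch purification step sketched below, where the model's class probabilities are renormalized over each instance's candidate set. This is a generic illustration of that behavior, not the paper's minimal working algorithm.

```python
import torch

def purify_pseudo_labels(logits, candidate_mask):
    """One mini-batch purification step used, in some form, by many PLL methods:
    renormalize the model's class probabilities over each instance's candidate set.
    Early in training the output is near-uniform over candidates; as the model
    sharpens, the pseudo-labels drift toward one-hot. Illustrative sketch only.

    logits:         (B, C) raw model outputs
    candidate_mask: (B, C) 1 for candidate labels, 0 otherwise
    """
    probs = torch.softmax(logits, dim=1) * candidate_mask
    return probs / probs.sum(dim=1, keepdim=True)

logits = torch.zeros(1, 4)                      # untrained model: uniform over candidates
mask = torch.tensor([[1., 1., 0., 1.]])
print(purify_pseudo_labels(logits, mask))       # ~[0.333, 0.333, 0.000, 0.333]
```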

AAAI Conference 2023 Conference Paper

FanoutNet: A Neuralized PCB Fanout Automation Method Using Deep Reinforcement Learning

  • Haiyun Li
  • Jixin Zhang
  • Ning Xu
  • Mingyu Liu

In modern electronic manufacturing, multi-layer Printed Circuit Board (PCB) routing requires connecting hundreds of nets with perplexing topology under complex routing constraints and highly limited resources, and therefore demands intense effort and time from human engineers. PCB fanout, as a pre-design step for PCB routing, has proved to be an effective technique for reducing the complexity of PCB routing by pre-allocating resources and pre-routing. However, current PCB fanout design relies heavily on the experience of human engineers, and there is no existing industrial solution for PCB fanout automation, which limits the quality of PCB routing automation. To address the problem, we propose a neuralized PCB fanout method based on deep reinforcement learning. To the best of our knowledge, we are the first in the literature to propose an automation method for PCB fanout. We combine a Convolutional Neural Network (CNN) with an attention-based network to train our fanout policy model and value model. The models learn representations of the PCB layout and netlist to make decisions and evaluations in place of human engineers. We employ Proximal Policy Optimization (PPO) to update the parameters of the models. In addition, we apply our PCB fanout method to a PCB router to improve the quality of PCB routing. Extensive experimental results on real-world industrial PCB benchmarks demonstrate that our approach achieves 100% routability in all industrial cases and improves wire length by an average of 6.8%, a significant improvement over state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Imbalanced Label Distribution Learning

  • Xingyu Zhao
  • Yuexuan An
  • Ning Xu
  • Jing Wang
  • Xin Geng

Label distribution covers a certain number of labels, representing the degree to which each label describes an instance. The learning process on the instances labeled by label distributions is called Label Distribution Learning (LDL). Although LDL has been applied successfully to many practical applications, one problem with existing LDL methods is that they are limited to data with balanced label information. However, annotation information in real-world data often exhibits imbalanced distributions, which significantly degrades the performance of existing methods. In this paper, we investigate the Imbalanced Label Distribution Learning (ILDL) problem. To handle this challenging problem, we delve into the characteristics of ILDL and empirically find that the representation distribution shift is the underlying reason for the performance degradation of existing methods. Inspired by this finding, we present a novel method named Representation Distribution Alignment (RDA). RDA aligns the distributions of feature representations and label representations to alleviate the impact of the distribution gap between the training set and the test set caused by the imbalance issue. Extensive experiments verify the superior performance of RDA. Our work fills the gap in benchmarks and techniques for practical ILDL problems.

NeurIPS Conference 2023 Conference Paper

Learning From Biased Soft Labels

  • Hua Yuan
  • Yu Shi
  • Ning Xu
  • Xu Yang
  • Xin Geng
  • Yong Rui

Since the advent of knowledge distillation, many researchers have been intrigued by the dark knowledge hidden in the soft labels generated by the teacher model. This prompts us to scrutinize the circumstances under which these soft labels are effective. Predominant existing theories implicitly require that the soft labels be close to the ground-truth labels. In this paper, however, we investigate whether biased soft labels are still effective. Here, bias refers to the discrepancy between the soft labels and the ground-truth labels. We present two indicators to measure the effectiveness of the soft labels. Based on the two indicators, we propose moderate conditions to ensure that the biased soft label learning problem is both classifier-consistent and Empirical Risk Minimization (ERM) learnable, which can hold even for heavily biased soft labels. We further design a heuristic method to train Skillful but Bad Teachers (SBTs); these teachers, with accuracy below 30%, can teach students to achieve accuracy over 90% on CIFAR-10, comparable to models trained on the original data. The proposed indicators adequately measure the effectiveness of the soft labels generated in this process. Moreover, our theoretical framework can be adapted to elucidate the effectiveness of soft labels in three weakly-supervised learning paradigms, namely incomplete supervision, partial label learning, and learning with additive noise. Experimental results demonstrate that our indicators can measure the effectiveness of biased soft labels generated by teachers or arising in these weakly-supervised learning paradigms.
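
For reference, the training signal under discussion is the standard soft-label objective sketched below (cross-entropy against the teacher's soft labels). The paper's two indicators and the SBT training heuristic are not reproduced here; this only shows the learning problem whose effectiveness the paper analyzes.

```python
import torch
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_probs, temperature=1.0):
    """Standard soft-label (distillation-style) objective: cross-entropy between
    the teacher's soft labels and the student's tempered predictions. The paper
    asks when this works even if teacher_probs are biased; this sketch only
    shows the training signal itself."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    return -(teacher_probs * log_p_student).sum(dim=1).mean()

# A deliberately "biased" teacher whose soft label puts little mass on the true class.
teacher = torch.tensor([[0.1, 0.6, 0.3]])
student_logits = torch.randn(1, 3, requires_grad=True)
loss = soft_label_loss(student_logits, teacher)
loss.backward()
print(float(loss))
```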

IJCAI Conference 2023 Conference Paper

Unreliable Partial Label Learning with Recursive Separation

  • Yu Shi
  • Ning Xu
  • Hua Yuan
  • Xin Geng

Partial label learning (PLL) is a typical weakly supervised learning problem in which each instance is associated with a candidate label set, among which only one label is true. However, the assumption that the ground-truth label is always among the candidate label set would be unrealistic, as the reliability of the candidate label sets in real-world applications cannot be guaranteed by annotators. Therefore, a generalized PLL named Unreliable Partial Label Learning (UPLL) is proposed, in which the true label may not be in the candidate label set. Due to the challenges posed by unreliable labeling, previous PLL methods will experience a marked decline in performance when applied to UPLL. To address the issue, we propose a two-stage framework named Unreliable Partial Label Learning with Recursive Separation (UPLLRS). In the first stage, a self-adaptive recursive separation strategy is proposed to separate the training set into a reliable subset and an unreliable subset. In the second stage, a disambiguation strategy is employed to progressively identify the ground-truth labels in the reliable subset. Simultaneously, semi-supervised learning methods are adopted to extract valuable information from the unreliable subset. Our method demonstrates state-of-the-art performance as evidenced by experimental results, particularly in situations of high unreliability. Code and supplementary materials are available at https://github.com/dhiyu/UPLLRS.

IJCAI Conference 2022 Conference Paper

Ambiguity-Induced Contrastive Learning for Instance-Dependent Partial Label Learning

  • Shi-Yu Xia
  • Jiaqi Lv
  • Ning Xu
  • Xin Geng

Partial label learning (PLL) learns from a typical form of weak supervision, where each training instance is labeled with a set of ambiguous candidate labels (CLs) instead of its exact ground-truth label. Most existing PLL works directly eliminate, rather than exploit, the label ambiguity, since they explicitly or implicitly assume that incorrect CLs are noise independent of the instance. A more practical setting in the wild, however, is instance-dependent: the CLs depend on both the true label and the instance itself, so each CL may describe the instance from some sensory channel and thereby provide noisy but genuinely valid information about the instance. In this paper, we leverage the additional information carried by this ambiguity and propose AmBiguity-induced contrastive LEarning (ABLE) under the framework of contrastive learning. Specifically, for each CL of an anchor, we select a group of samples currently predicted as that class as ambiguity-induced positives, based on which we synchronously learn a representor (RP) that minimizes the weighted sum of contrastive losses over all groups and a classifier (CS) that minimizes a classification loss. Although they are circularly dependent (RP requires the ambiguity-induced positives induced on the fly by CS, and CS needs the first half of RP as the representation extractor), ABLE enables RP and CS to be trained simultaneously within a coherent framework. Experiments on benchmark datasets demonstrate substantial improvements over state-of-the-art methods for learning from instance-dependent partially labeled data.

IJCAI Conference 2022 Conference Paper

Fusion Label Enhancement for Multi-Label Learning

  • Xingyu Zhao
  • Yuexuan An
  • Ning Xu
  • Xin Geng

Multi-label learning (MLL) refers to the problem of tagging a given instance with a set of relevant labels. In MLL, the implicit relative importance of the different labels describing a single instance generally differs, which has recently gained considerable attention and should be fully leveraged. Label enhancement (LE) has therefore been widely applied in various MLL tasks owing to its ability to effectively mine the implicit relative importance of different labels. However, because the label enhancement process in previous LE-based MLL methods is decoupled from the training of the predictive model, the objective of LE does not match the training process, which ultimately harms the whole learning system. In this paper, we propose a novel approach named Fusion Label Enhancement for Multi-label learning (FLEM) to effectively integrate the LE process and the training process. Specifically, we design a matching and interaction mechanism that leverages a novel interaction label enhancement loss to ensure that the recovered label distribution matches the needs of the predictive model. In the meantime, we present a unified label distribution loss for establishing the correspondence between the recovered label distribution and the training of the predictive model. With these losses, the label distributions recovered by the LE process can be efficiently utilized for training the predictive model. Experimental results on multiple benchmark datasets validate the effectiveness of the proposed approach.

AAAI Conference 2022 Conference Paper

Hierarchical Image Generation via Transformer-Based Sequential Patch Selection

  • Xiaogang Xu
  • Ning Xu

To synthesize images with preferred objects and interactions, a controllable approach is to generate the image from a scene graph and a large pool of object crops, where the spatial arrangement of the objects in the image is defined by the scene graph while their appearance is determined by the crops retrieved from the pool. In this paper, we propose a novel framework with such a semi-parametric generation strategy. First, to encourage the retrieval of mutually compatible crops, we design a sequential selection strategy in which the crop selection for each object is determined by the contents and locations of all object crops chosen previously. This process is implemented via a transformer trained with contrastive losses. Second, to generate the final image, our hierarchical generation strategy leverages hierarchical gated convolutions, which synthesize areas not covered by any image crops, and a patch-guided spatially adaptive normalization module, which ensures that the final generated images comply with the crop appearance and the scene graph. Evaluated on the challenging Visual Genome and COCO-Stuff datasets, our experimental results demonstrate the superiority of the proposed method over existing state-of-the-art methods.

AAAI Conference 2022 Conference Paper

Learngene: From Open-World to Your Learning Task

  • Qiu-Feng Wang
  • Xin Geng
  • Shu-Xia Lin
  • Shi-Yu Xia
  • Lei Qi
  • Ning Xu

Although deep learning has made significant progress on fixed large-scale datasets, it typically struggles to detect unknown/unseen classes in open-world scenarios, is over-parameterized, and overfits small samples. Biological systems overcome these difficulties well: individuals inherit an innate gene from collective creatures that have evolved over hundreds of millions of years, and then learn new skills from only a few examples. Inspired by this, we propose a practical collective-individual paradigm in which an evolution (expandable) network is trained on sequential tasks and then recognizes unknown classes in the real world. Moreover, the learngene, i.e., the gene for learning the initialization rules of the target model, is proposed to inherit the meta-knowledge from the collective model and reconstruct a lightweight individual model for the target task. In particular, a novel criterion based on gradient information is proposed to discover the learngene in the collective model. Finally, the individual model is trained with only a few samples on the target learning tasks. We demonstrate the effectiveness of our approach in an extensive empirical study and theoretical analysis.

NeurIPS Conference 2022 Conference Paper

One Positive Label is Sufficient: Single-Positive Multi-Label Learning with Label Enhancement

  • Ning Xu
  • Congyu Qiao
  • Jiaqi Lv
  • Xin Geng
  • Min-Ling Zhang

Multi-label learning (MLL) learns from examples each associated with multiple labels simultaneously, where the high cost of annotating all relevant labels for each training example is challenging for real-world applications. To cope with this challenge, we investigate single-positive multi-label learning (SPMLL), where each example is annotated with only one relevant label, and show that one can successfully learn a theoretically grounded multi-label classifier for the problem. In this paper, a novel SPMLL method named SMILE, i.e., Single-positive MultI-label learning with Label Enhancement, is proposed. Specifically, an unbiased risk estimator is derived, which is guaranteed to approximately converge to the optimal risk minimizer of fully supervised learning and shows that one positive label per instance is sufficient to train the predictive model. Then, the corresponding empirical risk estimator is established by recovering the latent soft labels as a label enhancement process, where the posterior density of the latent soft labels is approximated by a variational Beta density parameterized by an inference model. Experiments on benchmark datasets validate the effectiveness of the proposed method.

AAAI Conference 2021 Conference Paper

High-Resolution Deep Image Matting

  • Haichao Yu
  • Ning Xu
  • Zilong Huang
  • Yuqian Zhou
  • Humphrey Shi

Image matting is a key technique for image and video editing and composition. Conventionally, deep learning approaches take the whole input image and an associated trimap to infer the alpha matte using convolutional neural networks. Such approaches set the state of the art in image matting; however, they may fail in real-world matting applications due to hardware limitations, since real-world input images for matting are mostly of very high resolution. In this paper, we propose HDMatt, the first deep-learning-based image matting approach designed for high-resolution inputs. More concretely, HDMatt runs matting in a patch-based crop-and-stitch manner on high-resolution inputs, with a novel module design that addresses the contextual dependency and consistency issues between different patches. Compared with vanilla patch-based inference, which computes each patch independently, we explicitly model the cross-patch contextual dependency with a newly proposed Cross-Patch Contextual module (CPC) guided by the given trimap. Extensive experiments demonstrate the effectiveness of the proposed method and its necessity for high-resolution inputs. Our HDMatt approach also sets new state-of-the-art performance on the Adobe Image Matting and AlphaMatting benchmarks and produces impressive visual results on more real-world high-resolution images.
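
The "vanilla patch-based inference" baseline that HDMatt improves upon amounts to the independent crop-and-stitch loop sketched below; the patch size and the stand-in model are arbitrary, and the cross-patch context module itself is not shown.

```python
import torch

def patchwise_infer(model, image, patch=512):
    """Vanilla patch-based crop-and-stitch inference: each patch is processed
    independently and the outputs are pasted back (the baseline the abstract
    contrasts against; HDMatt additionally models cross-patch context).

    image: (1, C, H, W); model maps a patch to a single-channel alpha patch.
    """
    _, _, H, W = image.shape
    out = torch.zeros(1, 1, H, W)
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            crop = image[:, :, y:y + patch, x:x + patch]
            out[:, :, y:y + crop.shape[2], x:x + crop.shape[3]] = model(crop)
    return out

# Toy stand-in "model": average over input channels to get one output channel.
model = lambda p: p.mean(dim=1, keepdim=True)
print(patchwise_infer(model, torch.rand(1, 4, 1000, 1200)).shape)  # torch.Size([1, 1, 1000, 1200])
```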

NeurIPS Conference 2021 Conference Paper

Instance-Dependent Partial Label Learning

  • Ning Xu
  • Congyu Qiao
  • Xin Geng
  • Min-Ling Zhang

Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are picked at random as the candidate labels. However, this assumption is not realistic, since the candidate labels are always instance-dependent. In this paper, we consider instance-dependent PLL and assume that each example is associated with a latent label distribution constituted by a real number for each label, representing the degree to which that label describes the feature. An incorrect label with a high degree is more likely to be annotated as a candidate label. Therefore, the latent label distribution is the essential labeling information in partially labeled examples and is worth leveraging for predictive model training. Motivated by this consideration, we propose a novel PLL method that recovers the label distribution as a label enhancement (LE) process and trains the predictive model iteratively in every epoch. Specifically, we assume the true posterior density of the latent label distribution takes the form of a variational approximate Dirichlet density parameterized by an inference model. The evidence lower bound is then derived for optimizing the inference model, and the label distributions generated from the variational posterior are utilized for training the predictive model. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method. Source code is available at https://github.com/palm-ml/valen.

IJCAI Conference 2021 Conference Paper

Video Summarization via Label Distributions Dual-Reward

  • Yongbiao Gao
  • Ning Xu
  • Xin Geng

Reinforcement learning maps a perceived state representation to actions, and it has been adopted to solve the video summarization problem. The reward is crucial for tackling video summarization with reinforcement learning, since the reward signal defines the goal of video summarization. However, existing reward mechanisms in reinforcement learning cannot handle the ambiguity that appears frequently in video summarization, i.e., the diverse perceptions different people have of the same video. To solve this problem, in this paper label distributions are mapped from the CNN- and LSTM-based state representation to capture the subjectiveness of video summaries. The dual-reward is designed by measuring the similarity between the user score distributions and the generated label distributions. Not only the average score but also the variance of the subjective opinions is considered in summary generation. Experimental results on several benchmark datasets show that our proposed method outperforms other approaches under various settings.

AAAI Conference 2020 Conference Paper

Channel Attention Is All You Need for Video Frame Interpolation

  • Myungsub Choi
  • Heewon Kim
  • Bohyung Han
  • Ning Xu
  • Kyoung Mu Lee

Prevailing video frame interpolation techniques rely heavily on optical flow estimation, which adds model complexity and computational cost and is susceptible to error propagation in challenging scenarios with large motion and heavy occlusion. To alleviate these limitations, we propose a simple but effective deep neural network for video frame interpolation that is end-to-end trainable and free of any motion estimation component. Our algorithm employs a special feature reshaping operation, referred to as PixelShuffle, together with channel attention, which replaces the optical flow computation module. The main idea behind the design is to distribute the information in a feature map across multiple channels and extract motion information by attending to the channels for pixel-level frame synthesis. The model derived from this principle turns out to be effective in the presence of challenging motion and occlusion. We construct a comprehensive evaluation benchmark and demonstrate that the proposed approach achieves outstanding performance compared to existing models that include an optical flow computation component.
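
A minimal sketch of the flow-free idea (reshape spatial detail into channels with pixel unshuffle/shuffle, then attend over channels) is given below. The squeeze-and-excitation-style attention, all layer sizes, and the absence of any learned synthesis layers are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> per-channel weights
        return x * w[:, :, None, None]

def interpolate_sketch(frame0, frame1, attn, down=4):
    """Move spatial detail into channels, attend over channels, move it back.
    A toy stand-in for the flow-free design described in the abstract."""
    x = torch.cat([frame0, frame1], dim=1)       # (B, 6, H, W)
    x = F.pixel_unshuffle(x, down)               # (B, 6*down*down, H/down, W/down)
    x = attn(x)                                  # channel attention instead of motion estimation
    return F.pixel_shuffle(x, down)[:, :3]       # back to full resolution, 3 output channels

f0, f1 = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
attn = ChannelAttention(6 * 4 * 4)
print(interpolate_sketch(f0, f1, attn).shape)    # torch.Size([1, 3, 64, 64])
```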

NeurIPS Conference 2020 Conference Paper

Delving into the Cyclic Mechanism in Semi-supervised Video Object Segmentation

  • Yuxi Li
  • Ning Xu
  • Jinlong Peng
  • John See
  • Weiyao Lin

In this paper, we attempt to incorporate a cyclic mechanism into the task of semi-supervised video object segmentation. By resorting to the accurate reference mask of the first frame, we aim to mitigate the error propagation problem present in most current video object segmentation pipelines. First, we propose a cyclic scheme for offline training of segmentation networks. Then, we extend the offline pipeline to an online method by introducing a simple gradient correction module while keeping efficiency comparable to other offline methods. Finally, we develop the cycle effective receptive field (cycle-ERF) from gradient correction to provide a new perspective for analyzing object-specific regions of interest. We conduct comprehensive experiments on the DAVIS17 and YouTube-VOS benchmarks, demonstrating that the introduced cyclic mechanism helps boost segmentation quality.

AAAI Conference 2020 Conference Paper

Finding Action Tubes with a Sparse-to-Dense Framework

  • Yuxi Li
  • Weiyao Lin
  • Tao Wang
  • John See
  • Rui Qian
  • Ning Xu
  • Limin Wang
  • Shugong Xu

The task of spatial-temporal action detection has attracted increasing attention among researchers. Existing dominant methods solve this problem by relying on short-term information and dense, serial detection on individual frames or clips. Despite their effectiveness, these methods make inadequate use of long-term information and are prone to inefficiency. In this paper, we propose, for the first time, an efficient framework that generates action tube proposals from video streams with a single forward pass in a sparse-to-dense manner. The framework has two key characteristics: (1) both long-term and short-term sampled information are explicitly utilized in our spatiotemporal network, and (2) a new dynamic feature sampling module (DTS) is designed to effectively approximate the tube output while keeping the system tractable. We evaluate the efficacy of our model on the UCF101-24, JHMDB-21 and UCFSports benchmark datasets, achieving promising results that are competitive with state-of-the-art methods. The proposed sparse-to-dense strategy makes our framework about 7.6 times more efficient than the nearest competitor.

IJCAI Conference 2020 Conference Paper

Label Distribution for Learning with Noisy Labels

  • Yun-Peng Liu
  • Ning Xu
  • Yu Zhang
  • Xin Geng

The performance of deep neural networks (DNNs) crucially relies on the quality of labeling. In some situations labels are easily corrupted and therefore become noisy, so designing algorithms that deal with noisy labels is of great importance for learning robust DNNs. However, it is difficult to distinguish between clean labels and noisy labels, which becomes the bottleneck of many methods. To address the problem, this paper proposes a novel method named Label Distribution based Confidence Estimation (LDCE). LDCE estimates the confidence of the observed labels based on label distributions; the boundary between clean labels and noisy labels then becomes clear according to the confidence scores. To verify the effectiveness of the method, LDCE is combined with an existing learning algorithm to train robust DNNs. Experiments on both synthetic and real-world datasets substantiate the superiority of the proposed algorithm against state-of-the-art methods.

ICLR Conference 2020 Conference Paper

Minimizing FLOPs to Learn Efficient Sparse Representations

  • Biswajit Paria
  • Chih-Kuan Yeh
  • Ian En-Hsu Yen
  • Ning Xu
  • Pradeep Ravikumar
  • Barnabás Póczos

Deep representation learning has become one of the most widely adopted approaches for visual search, recommendation, and identification. Retrieval of such representations from a large database is, however, computationally challenging. Approximate methods based on learning compact representations have been widely explored for this problem, such as locality-sensitive hashing, product quantization, and PCA. In this work, in contrast to learning compact representations, we propose to learn high-dimensional and sparse representations that have similar representational capacity as dense embeddings while being more efficient, because sparse matrix multiplication can be much faster than dense multiplication. Following the key insight that the number of operations decreases quadratically with the sparsity of the embeddings provided the non-zero entries are distributed uniformly across dimensions, we propose a novel approach to learn such distributed sparse embeddings via a carefully constructed regularization function that directly minimizes a continuous relaxation of the number of floating-point operations (FLOPs) incurred during retrieval. Our experiments show that our approach is competitive with the other baselines and yields a similar or better speed-vs-accuracy tradeoff on practical datasets.
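
Under the stated insight, expected retrieval cost scales with the sum over dimensions of the squared activation probability, which suggests the differentiable surrogate sketched below (mean absolute activation standing in for the activation probability). This is one reading of the abstract, not necessarily the paper's exact regularizer.

```python
import torch

def flops_regularizer(activations):
    """Continuous surrogate for retrieval FLOPs: the expected number of
    multiplications is, up to a constant, sum_j p_j^2, where p_j is the
    probability that dimension j is non-zero. Replacing p_j with the mean
    absolute activation over the batch gives a differentiable penalty that
    pushes non-zeros to spread uniformly across dimensions. Sketch of the
    idea described in the abstract, under that interpretation.

    activations: (B, D) non-negative sparse embeddings (e.g., after a ReLU)
    """
    mean_abs = activations.abs().mean(dim=0)      # (D,) per-dimension usage
    return (mean_abs ** 2).sum()

# Two embeddings with the same total activation mass: the concentrated one
# (all mass on one dimension) is penalized more than the spread-out one.
concentrated = torch.tensor([[4.0, 0.0, 0.0, 0.0]])
uniform = torch.tensor([[1.0, 1.0, 1.0, 1.0]])
print(flops_regularizer(concentrated), flops_regularizer(uniform))  # 16.0 vs 4.0
```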

AAAI Conference 2020 Conference Paper

Partial Multi-Label Learning with Label Distribution

  • Ning Xu
  • Yun-Peng Liu
  • Xin Geng

Partial multi-label learning (PML) aims to learn from training examples each associated with a set of candidate labels, among which only a subset are valid for the training example. The common strategy to induce a predictive model is to disambiguate the candidate label set, for example by identifying the ground-truth label via the confidence of each candidate label or by estimating the noisy labels in the candidate label sets. Nonetheless, these strategies do not consider the essential label distribution corresponding to each instance, since the label distribution is not explicitly available in the training set. In this paper, a new partial multi-label learning strategy named PML-LD is proposed to learn from partial multi-label examples via label enhancement. Specifically, label distributions are recovered by leveraging the topological information of the feature space and the correlations among the labels. After that, a multi-class predictive model is learned by fitting a regularized multi-output regressor with the recovered label distributions. Experimental results on synthetic as well as real-world datasets clearly validate the effectiveness of PML-LD for solving PML problems.

IJCAI Conference 2020 Conference Paper

Video Question Answering on Screencast Tutorials

  • Wentian Zhao
  • Seokhwan Kim
  • Ning Xu
  • Hailin Jin

This paper presents a new video question answering task on screencast tutorials. We introduce a dataset consisting of question, answer and context triples from tutorial videos for a software application. Unlike other video question answering work, all the answers in our dataset are grounded to the domain knowledge base. A one-shot recognition algorithm is designed to extract the visual cues, which helps enhance the performance of video question answering. We also propose several baseline neural network architectures based on various aspects of video context from the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.

AAAI Conference 2019 Conference Paper

Partial Label Learning via Label Enhancement

  • Ning Xu
  • Jiaqi Lv
  • Xin Geng

Partial label learning aims to learn from training examples each associated with a set of candidate labels, among which only one label is valid for the training example. The common strategy to induce a predictive model is to disambiguate the candidate label set, for example by identifying the ground-truth label iteratively or by treating each candidate label equally. Nonetheless, these strategies do not consider the generalized label distribution corresponding to each instance, since the generalized label distribution is not explicitly available in the training set. In this paper, a new partial label learning strategy named PL-LE is proposed to learn from partial label examples via label enhancement. Specifically, the generalized label distributions are recovered by leveraging the topological information of the feature space. After that, a multi-class predictive model is learned by fitting a regularized multi-output regressor with the generalized label distributions. Extensive experiments show that PL-LE performs favorably against state-of-the-art partial label learning approaches.

IJCAI Conference 2019 Conference Paper

Weakly Supervised Multi-Label Learning via Label Enhancement

  • Jiaqi Lv
  • Ning Xu
  • RenYi Zheng
  • Xin Geng

Weakly supervised multi-label learning (WSML) concentrates on a more challenging multi-label classification problem in which some labels in the training set are missing. Existing approaches make multi-label predictions by exploiting the incomplete logical labels directly, without considering the relative importance of each label to an instance. In this paper, a novel two-stage strategy named Weakly Supervised Multi-label Learning via Label Enhancement (WSMLLE) is proposed to learn from weakly supervised data via label enhancement. First, the relative importance of each label, i.e., the description degrees, is recovered by leveraging the structural information of the feature space and local correlations learned from the label space. Then, a tailored multi-label predictive model is induced by learning from the training instances with the recovered description degrees. To the best of our knowledge, this is the first attempt to unify the completion of the missing labels and the recovery of the description degrees within the same framework. Extensive experiments across a wide range of real-world datasets clearly validate the superiority of the proposed approach.

IJCAI Conference 2018 Conference Paper

Label Enhancement for Label Distribution Learning

  • Ning Xu
  • An Tao
  • Xin Geng

Label distribution is more general than both single-label annotation and multi-label annotation. It covers a certain number of labels, representing the degree to which each label describes the instance. The learning process on instances labeled by label distributions is called label distribution learning (LDL). Unfortunately, many training sets contain only simple logical labels rather than label distributions, because label distributions are difficult to obtain directly. One way to solve this problem is to recover the label distributions from the logical labels in the training set by leveraging the topological information of the feature space and the correlation among the labels. This process of recovering label distributions from logical labels is defined as label enhancement (LE), which reinforces the supervision information in the training sets. This paper proposes a novel LE algorithm called Graph Laplacian Label Enhancement (GLLE). Experimental results on one artificial dataset and fourteen real-world datasets show clear advantages of GLLE over several existing LE algorithms.
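
A minimal sketch of graph-based label enhancement in this spirit is shown below: logical labels are smoothed over a kNN graph of the feature space and renormalized into label distributions. The propagation scheme and all parameters are a generic stand-in, not GLLE's graph-Laplacian optimization.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def label_enhance(X, L, k=5, alpha=0.5, iters=20):
    """Recover label distributions from logical labels by propagating them
    over a kNN graph of the feature space (illustrative graph-smoothing
    sketch, not GLLE's exact formulation).

    X: (n, d) features; L: (n, c) 0/1 logical labels.
    """
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False).toarray()
    W = np.maximum(W, W.T)                                  # symmetrize the adjacency
    P = W / W.sum(axis=1, keepdims=True)                    # row-normalized transition matrix
    D = L.astype(float)
    for _ in range(iters):
        D = alpha * P @ D + (1 - alpha) * L                 # smooth while anchoring on logical labels
    return D / D.sum(axis=1, keepdims=True)                 # rows become label distributions

X = np.random.randn(50, 8)
L = (np.random.rand(50, 3) > 0.5).astype(int)
L[L.sum(axis=1) == 0, 0] = 1                                # ensure every instance has a label
print(label_enhance(X, L)[0])
```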

IJCAI Conference 2018 Conference Paper

Multi-Level Policy and Reward Reinforcement Learning for Image Captioning

  • Anan Liu
  • Ning Xu
  • Hanwang Zhang
  • Weizhi Nie
  • Yuting Su
  • Yongdong Zhang

Image captioning is one of the most challenging hallmarks of AI, due to its complexity in visual and natural language understanding. As it is essentially a sequential prediction task, recent advances in image captioning use Reinforcement Learning (RL) to better explore the dynamics of word-by-word generation. However, existing RL-based image captioning methods mainly rely on a single policy network and reward function, which do not fit well with the multi-level (word and sentence) and multi-modal (vision and language) nature of the task. To this end, we propose a novel multi-level policy and reward RL framework for image captioning. It contains two modules: 1) a Multi-Level Policy Network that adaptively fuses the word-level policy and the sentence-level policy for word generation; and 2) a Multi-Level Reward Function that collaboratively leverages both a vision-language reward and a language-language reward to guide the policy. Further, we propose a guidance term to bridge the policy and the reward for RL optimization. Extensive experiments and analysis on MSCOCO and Flickr30k show that the proposed framework achieves competitive performance with respect to different evaluation metrics.

UAI Conference 2006 Conference Paper

Propagation of Delays in the National Airspace System

  • Kathryn B. Laskey
  • Ning Xu
  • Chun-Hung Chen

The National Airspace System (NAS) is a large and complex system with thousands of interrelated components: administration, control centers, airports, airlines, aircraft, passengers, etc. The complexity of the NAS creates many difficulties in management and control. One of the most pressing problems is flight delay. Delay creates high costs for airlines, complaints from passengers, and difficulties for airport operations. As demand on the system increases, the delay problem becomes more and more prominent. For this reason, it is essential for the Federal Aviation Administration to understand the causes of delay and to find ways to reduce it. Major contributing factors to delay are congestion at the origin airport, weather, increasing demand, and air traffic management (ATM) decisions such as Ground Delay Programs (GDPs). Delay is an inherently stochastic phenomenon: even if all known causal factors could be accounted for, macro-level NAS delays could not be predicted with certainty from micro-level aircraft information. This paper presents a stochastic model that uses Bayesian Networks (BNs) to model the relationships among the components of aircraft delay and the causal factors that affect delays. A case study on delays of departure flights from Chicago O'Hare International Airport (ORD) to Hartsfield-Jackson Atlanta International Airport (ATL) reveals how local and system-level environmental and human-caused factors combine to affect the components of delay, and how these components contribute to the final arrival delay at the destination airport.
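
A toy discrete Bayesian network in the same spirit is sketched below, with departure delay depending on weather and congestion and arrival delay depending on departure delay. All probabilities are made up for illustration, and the query is answered by brute-force enumeration rather than any particular BN library; the paper's ORD-to-ATL model is far richer.

```python
from itertools import product

# Toy discrete Bayesian network for delay propagation (illustrative numbers only).
# Structure: Weather -> DepartureDelay <- Congestion,  DepartureDelay -> ArrivalDelay
p_weather    = {"bad": 0.3, "good": 0.7}
p_congestion = {"high": 0.4, "low": 0.6}
p_dep = {  # P(DepartureDelay = "late" | Weather, Congestion)
    ("bad", "high"): 0.8, ("bad", "low"): 0.6,
    ("good", "high"): 0.4, ("good", "low"): 0.1,
}
p_arr = {"late": 0.7, "on_time": 0.05}  # P(ArrivalDelay = "late" | DepartureDelay)

def prob_arrival_late(weather=None):
    """P(ArrivalDelay = late), optionally conditioned on observed weather,
    computed by enumerating the remaining variables."""
    total = 0.0
    for w, c, d in product(p_weather, p_congestion, ("late", "on_time")):
        if weather is not None and w != weather:
            continue
        pw = 1.0 if weather is not None else p_weather[w]
        pd = p_dep[(w, c)] if d == "late" else 1 - p_dep[(w, c)]
        total += pw * p_congestion[c] * pd * p_arr[d]
    return total

print(prob_arrival_late())            # marginal probability of a late arrival
print(prob_arrival_late("bad"))       # same quantity given bad weather at the origin
```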