Arrow Research

Author name cluster

Jianan Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers (16)

AAAI Conference 2026 Conference Paper

HyperCOD: The First Challenging Benchmark and Baseline for Hyperspectral Camouflaged Object Detection

  • Shuyan Bai
  • Tingfa Xu
  • Peifu Liu
  • Yuhao Qiu
  • Huiyan Bai
  • Huan Chen
  • Yanyan Peng
  • Jianan Li

RGB-based camouflaged object detection struggles in real-world scenarios where color and texture cues are ambiguous. While hyperspectral images offer a powerful alternative by capturing fine-grained spectral signatures, progress in hyperspectral camouflaged object detection (HCOD) has been critically hampered by the absence of a dedicated, large-scale benchmark. To spur innovation, we introduce HyperCOD, the first challenging benchmark for HCOD. Comprising 350 high-resolution hyperspectral images, it features complex real-world scenarios with minimal objects, intricate shapes, severe occlusions, and dynamic lighting to challenge current models. The advent of foundation models like the Segment Anything Model (SAM) presents a compelling opportunity. To adapt SAM for HCOD, we propose HyperSpectral Camouflage-aware SAM (HSC-SAM), which reformulates the hyperspectral image by decoupling it into a spatial map fed to SAM's image encoder and a spectral saliency map that serves as an adaptive prompt. This translation effectively bridges the modality gap. Extensive experiments show that HSC-SAM sets a new state-of-the-art on HyperCOD and generalizes robustly to other public HSI datasets. The HyperCOD dataset and our HSC-SAM baseline provide a robust foundation to foster future research in this emerging area.
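
The decoupling idea is concrete enough to sketch. Below is a minimal illustration, not the authors' code: the cube is reduced to a 3-band spatial map (here by simply picking evenly spaced bands), and a spectral saliency map is built from each pixel's spectral-angle distance to the image's mean spectrum. The shapes and the saliency definition are assumptions.

```python
import torch
import torch.nn.functional as F

def decouple_hsi(hsi: torch.Tensor):
    """hsi: (B, C, H, W) hyperspectral cube with C spectral bands."""
    # Spatial map: pick 3 evenly spaced bands as a stand-in for whatever
    # band reduction HSC-SAM actually performs.
    idx = torch.linspace(0, hsi.shape[1] - 1, 3).long()
    spatial_map = hsi[:, idx]                                   # (B, 3, H, W)
    # Spectral saliency: spectral-angle distance of each pixel's spectrum
    # from the mean spectrum; spectrally unusual pixels light up.
    mean_spec = hsi.mean(dim=(2, 3), keepdim=True).expand_as(hsi)
    saliency = 1 - F.cosine_similarity(hsi, mean_spec, dim=1)  # (B, H, W)
    return spatial_map, saliency.unsqueeze(1)                  # prompt map
```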

AAAI Conference 2026 Conference Paper

MODA: The First Challenging Benchmark for Multispectral Object Detection in Aerial Images

  • Shuaihao Han
  • Tingfa Xu
  • Peifu Liu
  • Jianan Li

Aerial object detection faces significant challenges in real-world scenarios, such as small objects and extensive background interference, which leave RGB-based detectors with insufficient discriminative information. Multispectral images (MSIs) capture additional spectral cues across multiple bands, offering a promising alternative. However, the lack of training data has been the primary bottleneck to exploiting the potential of MSIs. To address this gap, we introduce the first large-scale dataset for Multispectral Object Detection in Aerial images (MODA), which comprises 14,041 MSIs and 330,191 annotations across diverse, challenging scenarios, providing a comprehensive data foundation for this field. Furthermore, to overcome challenges inherent to aerial object detection using MSIs, we propose OSSDet, a framework that integrates spectral and spatial information with object-aware cues. OSSDet employs a cascaded spectral-spatial modulation structure to optimize target perception, aggregates spectrally related features by exploiting spectral similarities to reinforce intra-object correlations, and suppresses irrelevant background via object-aware masking. Moreover, cross-spectral attention further refines object-related representations under explicit object-aware guidance. Extensive experiments demonstrate that OSSDet outperforms existing methods with comparable parameters and efficiency.
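
Of the mechanisms listed, the spectral-similarity aggregation is the easiest to illustrate. A hedged sketch with hypothetical shapes and a top-k neighborhood the paper may not use:

```python
import torch
import torch.nn.functional as F

def spectral_similarity_aggregate(feat, spec, k=8):
    """feat: (N, D) per-pixel features; spec: (N, C) per-pixel spectra."""
    spec = F.normalize(spec, dim=1)
    sim = spec @ spec.t()                          # (N, N) cosine similarity
    topv, topi = sim.topk(k, dim=1)                # k most spectrally similar pixels
    w = torch.softmax(topv, dim=1)                 # similarities -> weights
    return (w.unsqueeze(-1) * feat[topi]).sum(1)   # reinforce intra-object features
```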

IROS Conference 2025 Conference Paper

Cooperative Bearing-Only Target Pursuit via Multiagent Reinforcement Learning: Design and Experiment

  • Jianan Li
  • Zhikun Wang
  • Susheng Ding
  • Shiliang Guo
  • Shiyu Zhao 0002

This paper addresses the multi-robot pursuit problem for an unknown target, encompassing both target state estimation and pursuit control. First, in state estimation, we focus on using only bearing information, as it is readily available from vision sensors and effective for small, distant targets. Challenges such as instability due to the nonlinearity of bearing measurements and singularities in the two-angle representation are addressed through a proposed uniform bearing-only information filter. This filter integrates multiple 3D bearing measurements, provides a concise formulation, and enhances stability and resilience to target loss caused by a limited field of view (FoV). Second, in target pursuit control within complex environments, where challenges such as heterogeneity and limited FoV arise, conventional methods like differential games or Voronoi partitioning often prove inadequate. To address these limitations, we propose a novel multiagent reinforcement learning (MARL) framework, enabling multiple heterogeneous vehicles to search, localize, and follow a target while effectively handling those challenges. Third, to bridge the sim-to-real gap, we propose two key techniques: incorporating adjustable low-level control gains in training to replicate the dynamics of real-world autonomous ground vehicles (AGVs), and applying spectral-normalized RL algorithms to enhance policy smoothness and robustness. Finally, we demonstrate the successful zero-shot transfer of the MARL controllers to AGVs, validating the effectiveness and practical feasibility of our approach. The accompanying video is available at https://youtu.be/HO7FJyZiJ3E.
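
One ingredient above that is easy to show in isolation is spectral normalization of the policy network, which bounds each layer's Lipschitz constant and tends to yield smoother actions. A minimal sketch using PyTorch's built-in utility; the layer sizes are made up and not the paper's configuration.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

def make_policy(obs_dim=12, act_dim=2, hidden=64):
    # Spectral normalization constrains each linear map's largest singular
    # value, smoothing the learned policy w.r.t. its observations.
    return nn.Sequential(
        spectral_norm(nn.Linear(obs_dim, hidden)), nn.Tanh(),
        spectral_norm(nn.Linear(hidden, hidden)), nn.Tanh(),
        nn.Linear(hidden, act_dim),  # output head left unnormalized
    )
```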

AAAI Conference 2025 Conference Paper

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

  • Yao Xiao
  • Tingfa Xu
  • Yu Xin
  • Jianan Li

Embedded flight devices with visual capabilities have become essential for a wide range of applications. In aerial image detection, while many existing methods have partially addressed small-target detection, challenges remain in further improving it and in balancing detection accuracy with efficiency. These issues are key obstacles to the advancement of real-time aerial image detection. In this paper, we propose a new family of real-time detectors for aerial image detection, named FBRT-YOLO, to address the imbalance between detection accuracy and efficiency. Our method comprises two lightweight modules: the Feature Complementary Mapping Module (FCM) and the Multi-Kernel Perception Unit (MKP), designed to enhance object perception for small targets in aerial images. FCM alleviates the information imbalance caused by the loss of small-target information in deep networks; it integrates the spatial positional information of targets more deeply into the network, better aligning it with the semantic information in deeper layers to improve small-target localization. MKP leverages convolutions with kernels of different sizes to strengthen relationships between targets of various scales and improve multi-scale perception. Extensive experimental results on three major aerial image datasets, VisDrone, UAVDT, and AI-TOD, demonstrate that FBRT-YOLO outperforms various real-time detectors in terms of both performance and speed.
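
The multi-kernel idea lends itself to a short sketch. Below is an illustrative unit in the spirit of MKP, not the released implementation; channel counts and kernel sizes are assumptions.

```python
import torch.nn as nn

class MultiKernelUnit(nn.Module):
    """Parallel convolutions with different kernel sizes, summed so that
    targets of several scales are perceived within one block."""
    def __init__(self, ch=64, kernels=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, k, padding=k // 2) for k in kernels)

    def forward(self, x):
        return sum(b(x) for b in self.branches)  # fuse multi-scale responses
```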

AAAI Conference 2025 Conference Paper

HSOD-BIT-V2: A Challenging Benchmark for Hyperspectral Salient Object Detection

  • Yuhao Qiu
  • Shuyan Bai
  • Tingfa Xu
  • Peifu Liu
  • Haolin Qin
  • Jianan Li

Salient Object Detection (SOD) is crucial in computer vision, yet RGB-based methods face limitations in challenging scenes, such as small objects and similar color features. Hyperspectral images provide a promising route to more accurate Hyperspectral Salient Object Detection (HSOD) through their abundant spectral information, yet HSOD methods are hindered by the lack of extensive and available datasets. In this context, we introduce HSOD-BIT-V2, the largest and most challenging HSOD benchmark dataset to date. Five distinct challenges focusing on small objects and foreground-background similarity are designed to emphasize spectral advantages and real-world complexity. To tackle these challenges, we propose Hyper-HRNet, a high-resolution HSOD network. Hyper-HRNet effectively extracts, integrates, and preserves useful spectral information while reducing dimensionality by capturing self-similar spectral features. Additionally, it conveys fine details and precisely locates object contours by incorporating comprehensive global information and detailed object saliency representations. Experimental analysis demonstrates that Hyper-HRNet outperforms existing models, especially in challenging scenarios.
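
As a rough illustration of dimensionality reduction driven by spectral self-similarity (the mechanism named above), the sketch below mixes bands by their mutual affinities before discarding most of them. It is an assumption-laden toy, not Hyper-HRNet.

```python
import torch

def spectral_reduce(hsi, out_bands=16):
    """hsi: (B, C, H, W) -> (B, out_bands, H, W)."""
    b, c, h, w = hsi.shape
    bands = hsi.flatten(2)                                 # (B, C, H*W)
    # Band-to-band affinity: how similar each band is to every other band.
    affinity = torch.softmax(bands @ bands.transpose(1, 2) / (h * w) ** 0.5, -1)
    mixed = affinity @ bands                               # self-similar band mixing
    keep = torch.linspace(0, c - 1, out_bands).long()      # evenly spaced survivors
    return mixed[:, keep].view(b, out_bands, h, w)
```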

NeurIPS Conference 2025 Conference Paper

Knowledge Starts with Practice: Knowledge-Aware Exercise Generative Recommendation with Adaptive Multi-Agent Cooperation

  • Yangtao Zhou
  • Hua Chu
  • Chen Chen
  • Ziwen Wang
  • Jiacheng Liu
  • Jianan Li
  • Yueying Feng
  • Xiangming Li

Adaptive learning, which requires the in-depth understanding of students' learning processes and rational planning of learning resources, plays a crucial role in intelligent education. However, how to effectively model these two processes and seamlessly integrate them poses significant implementation challenges for adaptive learning. As core learning resources, exercises have the potential to diagnose students' knowledge states during the learning processes and provide personalized learning recommendations to strengthen students' knowledge, thereby serving as a bridge to boost student-oriented adaptive learning. Therefore, we introduce a novel task called Knowledge-aware Exercise Generative Recommendation (KEGR). It aims to dynamically infer students' knowledge states from their past exercise responses and customizably generate new exercises. To achieve KEGR, we propose an adaptive multi-agent cooperation framework, called ExeGen, inspired by the excellent reasoning and generative capabilities of LLM-based AI agents. Specifically, ExeGen coordinates four specialized agents for supervision, knowledge state perception, exercise generation, and quality refinement through an adaptive loop workflow pipeline. More importantly, we devise two enhancement mechanisms in ExeGen: 1) A human-simulated knowledge perception mechanism mimics students' cognitive processes and generates interpretable knowledge state descriptions via demonstration-based In-Context Learning (ICL). In this mechanism, a dual-matching strategy is further designed to retrieve highly relevant demonstrations for reliable ICL reasoning. 2) An exercise generation-adversarial mechanism collaboratively refines exercise generation by leveraging a group of quality evaluation expert agents via iterative adversarial feedback. Finally, a comprehensive evaluation protocol is carefully designed to assess ExeGen. Extensive experiments on real-world educational datasets and a practical deployment in college education demonstrate the effectiveness and superiority of ExeGen. The code is available at https://github.com/dsz532/exeGen.
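
The adaptive loop workflow can be sketched abstractly. The skeleton below is purely illustrative: the `llm` callable, the prompts, and the acceptance test are all stand-ins, not ExeGen's actual agents or stopping rule.

```python
def exegen_loop(responses, llm, max_rounds=3):
    """Perceive knowledge state, generate an exercise, refine until accepted."""
    # Knowledge-state perception agent (stand-in prompt).
    state = llm(f"Describe the student's knowledge state: {responses}")
    # Exercise generation agent.
    exercise = llm(f"Generate an exercise targeting: {state}")
    for _ in range(max_rounds):
        # Quality-evaluation agents critique the draft.
        verdict = llm(f"Critique this exercise for quality: {exercise}")
        if "accept" in verdict.lower():   # assumed acceptance signal
            break
        # Refinement agent revises using adversarial feedback.
        exercise = llm(f"Revise using feedback {verdict}: {exercise}")
    return exercise
```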

NeurIPS Conference 2025 Conference Paper

MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking

  • Tianhao Li
  • Tingfa Xu
  • Ying Wang
  • Haolin Qin
  • Xu Lin
  • Jianan Li

Drone-based multi-object tracking is essential yet highly challenging due to small targets, severe occlusions, and cluttered backgrounds. Existing RGB-based multi-object tracking algorithms heavily depend on spatial appearance cues such as color and texture, which often degrade in aerial views, compromising tracking reliability. Multispectral imagery, capturing pixel-level spectral reflectance, provides crucial spectral cues that significantly enhance object discriminability under degraded spatial conditions. However, the lack of dedicated multispectral UAV datasets has hindered progress in this domain. To bridge this gap, we introduce MMOT, the first challenging benchmark dataset for drone-based multispectral multi-object tracking. It features three key characteristics: (i) Large Scale — 125 video sequences with over 488.8K annotations across eight object categories; (ii) Comprehensive Challenges — covering diverse real-world difficulties such as extremely small targets, high-density scenarios, severe occlusions, and complex platform motion; and (iii) Precise Oriented Annotations — enabling accurate localization and reduced object ambiguity under aerial perspectives. To better extract spectral features and leverage oriented annotations, we further present a multispectral and orientation-aware MOT scheme that adapts existing MOT methods, featuring: (i) a lightweight Spectral 3D-Stem integrating spectral features while preserving compatibility with RGB pretraining; (ii) an orientation-aware Kalman filter for precise state estimation; and (iii) an end-to-end orientation-adaptive transformer architecture. Extensive experiments across representative trackers consistently show that multispectral input markedly improves tracking performance over RGB baselines, particularly for small and densely packed objects. We believe our work will benefit the community in advancing drone-based multispectral multi-object tracking research. Our MMOT, code, and benchmarks are publicly available at https://github.com/Annzstbl/MMOT.
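
The Spectral 3D-Stem idea, as described, can be sketched: a 3D convolution mixes neighboring bands, then the result is collapsed to 3 channels so an RGB-pretrained backbone remains usable. Kernel sizes and widths below are assumptions, not the paper's.

```python
import torch.nn as nn

class Spectral3DStem(nn.Module):
    def __init__(self, bands=8):
        super().__init__()
        # Treat the band axis as a depth dimension and mix neighboring bands.
        self.mix = nn.Conv3d(1, 4, kernel_size=(3, 3, 3), padding=1)
        # Collapse to 3 channels so an RGB-pretrained backbone can follow.
        self.to_rgb = nn.Conv2d(4 * bands, 3, kernel_size=1)

    def forward(self, x):                 # x: (B, bands, H, W)
        y = self.mix(x.unsqueeze(1))      # (B, 4, bands, H, W)
        y = y.flatten(1, 2)               # (B, 4*bands, H, W)
        return self.to_rgb(y)             # (B, 3, H, W)
```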

NeurIPS Conference 2024 Conference Paper

Target-Guided Adversarial Point Cloud Transformer Towards Recognition Against Real-world Corruptions

  • Jie Wang
  • Tingfa Xu
  • Lihe Ding
  • Jianan Li

Achieving robust 3D perception in the face of corrupted data presents a challenging hurdle within 3D vision research. Contemporary transformer-based point cloud recognition models, albeit advanced, tend to overfit to specific patterns, consequently undermining their robustness against corruption. In this work, we introduce the Target-Guided Adversarial Point Cloud Transformer, termed APCT, a novel architecture designed to augment global structure capture through an adversarial feature-erasing mechanism predicated on patterns discerned at each step during training. Specifically, APCT integrates an Adversarial Significance Identifier and a Target-guided Promptor. The Adversarial Significance Identifier discerns token significance through global contextual analysis, utilizing a structural salience index algorithm alongside an auxiliary supervisory mechanism. The Target-guided Promptor accentuates the propensity for token discard within the self-attention mechanism, utilizing the significance values derived above, consequently directing the model's attention towards alternative segments in subsequent stages. By iteratively applying this strategy over multiple steps during training, the network progressively identifies and integrates an expanded array of object-associated patterns. Extensive experiments demonstrate that our method achieves state-of-the-art results on multiple corruption benchmarks.
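
The erasing mechanism can be illustrated compactly. In the sketch below, the significance score (attention mass a token receives) is a simplification of the paper's structural salience index, and the drop ratio is invented.

```python
import torch

def erase_significant_tokens(tokens, attn, drop_ratio=0.1):
    """tokens: (B, N, D); attn: (B, N, N) attention weights."""
    significance = attn.mean(dim=1)              # mass each token receives
    k = max(1, int(tokens.shape[1] * drop_ratio))
    idx = significance.topk(k, dim=1).indices    # most relied-upon tokens
    mask = torch.ones_like(tokens[..., :1])
    mask.scatter_(1, idx.unsqueeze(-1), 0.0)     # suppress them so the network
    return tokens * mask                         # must use alternative patterns
```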

IJCAI Conference 2024 Conference Paper

Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

  • Jianqiang Xia
  • Dianxi Shi
  • Ke Song
  • Linna Song
  • Xiaolei Wang
  • Songchang Jin
  • Chenran Zhao
  • Yu Cheng

Most existing RGB-T tracking networks extract modality features separately, which lacks interaction and mutual guidance between modalities. This limits the network's ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restricts tracking speed. To overcome these problems, we propose a unified single-stage Transformer RGB-T tracking network, namely USTrack, which unifies the above three stages into a single ViT (Vision Transformer) backbone through joint feature extraction, fusion, and relation modeling. With this structure, the network can not only extract fused features of templates and search regions under the interaction of modalities, but also significantly improve tracking speed via the single-stage fusion tracking paradigm. Furthermore, we introduce a novel feature selection mechanism based on modality reliability to mitigate the influence of invalid modalities on the final prediction. Extensive experiments on three mainstream RGB-T tracking benchmarks show that our method achieves new state-of-the-art performance while reaching the fastest tracking speed of 84.2 FPS. Code is available at https://github.com/xiajianqiang/USTrack.
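
The single-stage paradigm is straightforward to sketch: template and search tokens from both modalities enter one transformer, so extraction, fusion, and relation modeling happen in the same attention layers. The encoder below is a generic stand-in, not USTrack's ViT backbone.

```python
import torch
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True),
    num_layers=4)

def joint_track(z_rgb, z_t, x_rgb, x_t):
    """Each input: (B, N_tokens, 256) patch embeddings (template z, search x)."""
    tokens = torch.cat([z_rgb, z_t, x_rgb, x_t], dim=1)  # one token stream
    return encoder(tokens)  # modalities interact in every attention layer
```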

JBHI Journal 2023 Journal Article

Expert-Guided Knowledge Distillation for Semi-Supervised Vessel Segmentation

  • Ning Shen
  • Tingfa Xu
  • Shiqi Huang
  • Feng Mu
  • Jianan Li

In medical image analysis, blood vessel segmentation is of considerable clinical value for diagnosis and surgery. The complexity of vascular structures, however, has obstructed the development of the field. Although many algorithms have emerged to work around this difficulty, they rely excessively on careful annotations for tubular vessel extraction. A practical solution is to mine the feature distribution of unlabeled data. This work proposes a novel semi-supervised vessel segmentation framework, named EXP-Net, to cope with limited annotations. Based on the training mechanism of the Mean Teacher model, we innovatively engage an expert network in EXP-Net to enhance knowledge distillation. The expert network comprises knowledge and connectivity enhancement modules, which are respectively in charge of modeling feature relationships from global and detailed perspectives. In particular, the knowledge enhancement module leverages the vision transformer to highlight long-range dependencies among multi-level token components, while the connectivity enhancement module maximizes topological and geometric properties by skeletonizing the vessel in a non-parametric manner. These key components are dedicated to the conditions of weak vessel connectivity and poor pixel contrast. Extensive evaluations show that our EXP-Net achieves state-of-the-art performance on subcutaneous vessel, retinal vessel, and coronary artery segmentation.
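
One load-bearing component named above, the Mean Teacher mechanism, has a standard form worth showing; the expert network itself is omitted. A minimal sketch of the teacher's exponential-moving-average update:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, alpha=0.99):
    """Teacher weights track an exponential moving average of the student's."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(alpha).add_(s, alpha=1 - alpha)  # t = alpha*t + (1-alpha)*s
```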

NeurIPS Conference 2022 Conference Paper

CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds

  • Haiyang Wang
  • Lihe Ding
  • Shaocong Dong
  • Shaoshuai Shi
  • Aoxue Li
  • Jianan Li
  • Zhenguo Li
  • Liwei Wang

We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D. Our method first generates high-quality 3D proposals by leveraging a class-aware local grouping strategy on object surface voxels with the same semantic predictions, which considers the semantic consistency and diverse locality abandoned in previous bottom-up approaches. Then, to recover the features of voxels missed due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module that directly aggregates fine-grained spatial information from the backbone for further proposal refinement. It is memory- and computation-efficient and can better encode the geometry-specific features of each 3D proposal. Our model achieves state-of-the-art 3D detection performance with remarkable gains of +3.6% on ScanNet V2 and +2.6% on SUN RGB-D in terms of mAP@0.25. Code will be available at https://github.com/Haiyang-W/CAGroup3D.
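
Class-aware local grouping can be illustrated with a dense toy (the real implementation is sparse and CUDA-based): a voxel groups only with neighbors that share its predicted semantic class.

```python
import torch

def class_aware_group(xyz, cls_pred, radius=0.2):
    """xyz: (N, 3) voxel centers; cls_pred: (N,) predicted classes."""
    dist = torch.cdist(xyz, xyz)                        # (N, N) pairwise distances
    same_cls = cls_pred[:, None] == cls_pred[None, :]   # semantic agreement
    return (dist < radius) & same_cls                   # (N, N) grouping mask
```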

AAAI Conference 2022 Conference Paper

Delving into Sample Loss Curve to Embrace Noisy and Imbalanced Data

  • Shenwang Jiang
  • Jianan Li
  • Ying Wang
  • Bo Huang
  • Zhang Zhang
  • Tingfa Xu

Corrupted labels and class imbalance are commonly encountered in practically collected training data, and both easily lead to over-fitting of deep neural networks (DNNs). Existing approaches alleviate these issues by adopting a sample re-weighting strategy, which re-weights samples via a designed weighting function. However, such strategies are only applicable to training data containing a single type of bias. In practice, biased samples with corrupted labels and from tailed classes commonly co-exist in training data, and how to handle them simultaneously is a key but under-explored problem. In this paper, we find that these two types of biased samples, though similar in transient loss, exhibit distinguishable trends and characteristics in their loss curves, which can provide valuable priors for sample weight assignment. Motivated by this, we delve into the loss curves and propose a novel probe-and-allocate training strategy: in the probing stage, we train the network on the whole biased training set without intervention and record the loss curve of each sample as an additional attribute; in the allocating stage, we feed the resulting attribute to a newly designed curve-perception network, named CurveNet, which learns to identify the bias type of each sample and assign proper weights adaptively through meta-learning. The slow training speed of meta-learning also hinders its application; to address this, we propose skip-layer meta optimization (SLMO), which accelerates training by skipping the bottom layers. Extensive synthetic and real experiments validate the proposed method, which achieves state-of-the-art performance on multiple challenging benchmarks.
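
The probing stage is simple to sketch: train normally while logging every sample's loss each epoch, so each sample accumulates the loss curve later fed to CurveNet. In this hedged sketch the loader is assumed to yield sample indices and the criterion to use reduction='none'.

```python
import torch

def probe_loss_curves(model, loader, opt, criterion, epochs, n_samples):
    """Plain training with per-sample loss logging; no reweighting yet."""
    curves = torch.zeros(n_samples, epochs)
    for e in range(epochs):
        for x, y, idx in loader:            # loader must also yield indices
            loss = criterion(model(x), y)   # per-sample losses (reduction='none')
            curves[idx, e] = loss.detach()  # record this epoch's point on each curve
            opt.zero_grad()
            loss.mean().backward()
            opt.step()
    return curves                           # attribute passed to CurveNet
```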

NeurIPS Conference 2022 Conference Paper

MsSVT: Mixed-scale Sparse Voxel Transformer for 3D Object Detection on Point Clouds

  • Shaocong Dong
  • Lihe Ding
  • Haiyang Wang
  • Tingfa Xu
  • Xinli Xu
  • Jie Wang
  • Ziyang Bian
  • Ying Wang

3D object detection from the LiDAR point cloud is fundamental to autonomous driving. Large-scale outdoor scenes usually feature significant variance in instance scales, thus requiring features rich in both long-range and fine-grained information to support accurate detection. Recent detectors leverage the power of window-based transformers to model long-range dependencies but tend to blur out fine-grained details. To bridge this gap, we present a novel Mixed-scale Sparse Voxel Transformer, named MsSVT, which can capture both types of information simultaneously following a divide-and-conquer philosophy. Specifically, MsSVT explicitly divides attention heads into multiple groups, each in charge of attending to information within a particular range. All groups' outputs are merged to obtain the final mixed-scale features. Moreover, we provide a novel chessboard sampling strategy to reduce the computational complexity of applying a window-based transformer in 3D voxel space. To improve efficiency, we also implement the voxel sampling and gathering operations sparsely with a hash map. Endowed with the powerful capability and high efficiency of modeling mixed-scale information, our single-stage detector built on top of MsSVT surprisingly outperforms state-of-the-art two-stage detectors on Waymo. Our project page: https://github.com/dscdyc/MsSVT.
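
The divide-and-conquer idea can be shown on a 1D toy: one attention pass per window size, with outputs concatenated. This collapses the paper's head groups into full-width passes for brevity and ignores sparsity entirely.

```python
import torch

def mixed_scale_attention(x, windows=(4, 16)):
    """x: (B, N, D) tokens on a 1D sequence; one scale group per window size."""
    b, n, d = x.shape
    pos = torch.arange(n)
    outs = []
    for w in windows:
        mask = (pos[:, None] - pos[None, :]).abs() > w   # outside this window
        attn = (x @ x.transpose(1, 2)) / d ** 0.5
        attn = attn.masked_fill(mask, float("-inf"))     # restrict the range
        outs.append(torch.softmax(attn, -1) @ x)         # one scale group's output
    return torch.cat(outs, dim=-1)                       # merge mixed-scale features
```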

JBHI Journal 2018 Journal Article

Automatic Side Branch Ostium Detection and Main Vascular Segmentation in Intravascular Optical Coherence Tomography Images

  • Yihui Cao
  • Qinhua Jin
  • Yundai Chen
  • Qinye Yin
  • Xianjing Qin
  • Jianan Li
  • Rui Zhu
  • Wei Zhao

Intravascular optical coherence tomography is the state-of-the-art imaging modality in percutaneous coronary intervention planning and evaluation, in which side branch ostium and main vascular measurements play critical roles. However, manual measurement is time-consuming and labor-intensive. In this paper, we propose a fully automatic method for side branch ostium detection and main vascular segmentation to compensate for the deficiencies of manual measurement. In our method, side branch ostium points are first detected and subsequently used to divide the lumen contour into side branch and main vascular regions. Based on this division, the main vascular contour is then smoothly fitted for segmentation. In side branch ostium detection, our algorithm converts the definition of curvature into the calculation of signed included angles in a global view, and applies a differential filter to highlight the feature of side branch ostium points. A total of 4618 images from 22 pullback runs were used to evaluate the performance of the presented method. The validation results of side branch detection were TPR = 82.8%, TNR = 98.7%, PPV = 86.8%, NPV = 98.7%. The average ostial distance error (ODE) was 0.22 mm, and the DSC of main vascular segmentation was 0.96. In conclusion, the qualitative and quantitative evaluation indicated that the presented method is effective and accurate.
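
The signed-included-angle formulation is easy to reproduce in outline. A hedged numpy sketch; the segment step and the first-difference stand-in for the paper's differential filter are assumptions.

```python
import numpy as np

def signed_angle_response(contour, step=5):
    """contour: (N, 2) ordered points on a closed lumen contour."""
    prev = np.roll(contour, step, axis=0) - contour    # incoming segment
    nxt = np.roll(contour, -step, axis=0) - contour    # outgoing segment
    # Signed included angle between the two segments at each point.
    cross = prev[:, 0] * nxt[:, 1] - prev[:, 1] * nxt[:, 0]
    dot = (prev * nxt).sum(axis=1)
    ang = np.arctan2(cross, dot)
    # First difference as a simple differential filter: abrupt angle
    # transitions (ostium-like corners) produce large responses.
    return np.diff(ang, append=ang[:1])
```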

NeurIPS Conference 2017 Conference Paper

Dual Path Networks

  • Yunpeng Chen
  • Jianan Li
  • Huaxin Xiao
  • Xiaojie Jin
  • Shuicheng Yan
  • Jiashi Feng

In this work, we present a simple, highly efficient, and modularized Dual Path Network (DPN) for image classification, which presents a new topology of internal connection paths. By revealing the equivalence of the state-of-the-art Residual Network (ResNet) and Densely Convolutional Network (DenseNet) within the HORNN framework, we find that ResNet enables feature re-use while DenseNet enables new feature exploration, both of which are important for learning good representations. To enjoy the benefits of both path topologies, our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through its dual-path architecture. Extensive experiments on three benchmark datasets, ImageNet-1k, Places365, and PASCAL VOC, clearly demonstrate superior performance of the proposed DPN over the state of the art. In particular, on the ImageNet-1k dataset, a shallow DPN surpasses the best ResNeXt-101 (64x4d) with 26% smaller model size, 25% less computational cost, and 8% lower memory consumption, and a deeper DPN (DPN-131) further pushes state-of-the-art single-model performance with about 2 times faster training speed. Experiments on the Places365 large-scale scene dataset, PASCAL VOC detection dataset, and PASCAL VOC segmentation dataset also demonstrate consistently better performance than DenseNet, ResNet, and the latest ResNeXt model across various applications.
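
The dual-path topology reduces to a small block: part of the convolution output is added residually (feature re-use) and part is concatenated (new feature exploration). Channel widths below are illustrative, not DPN's published configuration.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    def __init__(self, in_ch=80, res_ch=64, dense_ch=16):
        super().__init__()
        # One convolution feeds both paths: first res_ch channels are the
        # residual path, the remaining dense_ch channels are the dense path.
        self.conv = nn.Conv2d(in_ch, res_ch + dense_ch, 3, padding=1)
        self.res_ch = res_ch

    def forward(self, x):
        y = self.conv(x)
        res = x[:, : self.res_ch] + y[:, : self.res_ch]   # ResNet-style addition
        carried = x[:, self.res_ch :]                     # previously grown dense features
        new = y[:, self.res_ch :]                         # DenseNet-style new features
        return torch.cat([res, carried, new], dim=1)      # dense path keeps growing
```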