Author name cluster

Wenxi Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

2 author rows

AAAI Conference 2026 Conference Paper

2D-CrossScan Mamba: Enhancing State Space Models with Spatially Consistent Multi-Path 2D Information Propagation

Longlong Yu
Wenxi Li
Yaoqi Sun
Hang Xu
Chenggang Yan
Yuchen Guo

Despite recent progress in adapting State Space Models such as Mamba to vision tasks, their intrinsic 1D scanning mechanism imposes limitations when applied to inherently 2D-structured data like images. Existing adaptations, including VMamba and 2DMamba, either suffer from inconsistency between scanning order and spatial locality or restrict inter-patch communication to singular paths, hindering effective information propagation. In this paper, we propose 2D-CrossScan, a novel 2D-compatible scan framework that enables spatially consistent, multi-path hidden state propagation by integrating modified state equations over two-dimensional neighborhoods. Furthermore, we mitigate redundant information accumulation due to overlapping paths via cross-directional subtraction. To fully align with the 2D spatial structure, we introduce a multi-directional scanning strategy that starts simultaneously from all four corners of the image, enabling diverse propagation paths and better feature integration. Our approach maintains efficiency, requiring only minimal architectural changes to existing Mamba variants. Experimental results demonstrate substantial improvements in multiple visual tasks, including object detection and semantic segmentation on PANDA and COCO datasets. Compared to baseline SSM-based methods, 2D-CrossScan consistently yields better spatial representations, as confirmed by extensive effective receptive field visualizations and attention analyses. These results highlight the importance of geometry-aware state propagation and validate 2D-CrossScan as a simple yet powerful extension to SSMs for vision.
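
The abstract's idea of combining two-dimensional propagation with cross-directional subtraction can be illustrated with a small sketch. This is not the paper's state equation: it assumes a single scalar `decay`, a scan starting from the top-left corner only, and hypothetical function names. Each cell receives the hidden state from its upper and left neighbors, and the diagonal term is subtracted so that information arriving along both paths is not counted twice.

```python
def scan_2d(x, decay):
    """Top-left-to-bottom-right 2D scan with overlap subtraction (illustrative only)."""
    rows, cols = len(x), len(x[0])
    h = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up = h[i - 1][j] if i > 0 else 0.0
            left = h[i][j - 1] if j > 0 else 0.0
            diag = h[i - 1][j - 1] if i > 0 and j > 0 else 0.0
            # The up and left states both already contain the diagonal state,
            # so subtract it once to avoid double counting.
            h[i][j] = decay * up + decay * left - decay * decay * diag + x[i][j]
    return h


def closed_form(x, decay):
    """Reference: each input is weighted by decay ** (Manhattan distance to the target)."""
    rows, cols = len(x), len(x[0])
    return [
        [
            sum(
                decay ** ((i - p) + (j - q)) * x[p][q]
                for p in range(i + 1)
                for q in range(j + 1)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]
```

Under these assumptions the recurrence matches the closed form, in which every earlier position contributes exactly once with a weight that depends on its distance. Without the subtraction, the diagonal neighbor's state would be counted along both the upper and the left path.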

AAAI Conference 2026 Conference Paper

GigaMoE: Sparsity-Guided Mixture of Experts for Efficient Gigapixel Object Detection

Xiang Li
Wenxi Li
Yuetong Wang
Chenyang Lyu
Haozhe Lin
Guiguang Ding
Yuchen Guo

Object detection in High-Resolution Wide (HRW) shots, or gigapixel images, presents unique challenges due to extreme object sparsity and vast scale variations. State-of-the-art methods like SparseFormer have pioneered sparse processing by selectively focusing on important regions, yet they apply a uniform computational model to all selected regions, overlooking their intrinsic complexity differences. This leads to a suboptimal trade-off between performance and efficiency. In this paper, we introduce GigaMoE, a novel backbone architecture that pioneers adaptive computation for this domain by replacing the standard Feed-Forward Networks (FFNs) with a Mixture-of-Experts (MoE) module. Our architecture first employs a shared expert to provide a robust feature baseline for all selected regions. Upon this foundation, our core innovation, a novel Sparsity-Guided Routing mechanism, insightfully repurposes importance scores from the sparse backbone to provide a "computational bonus," dynamically engaging a variable number of specialized experts based on content complexity. The entire system is trained efficiently via a loss-free load-balancing technique, eliminating the need for cumbersome auxiliary losses. Extensive experiments show that GigaMoE sets a new state-of-the-art on the PANDA benchmark, improving detection accuracy by 1.1% over SparseFormer while simultaneously reducing the computational cost (FLOPs) by a remarkable 32.3%.
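
The routing idea in this abstract, where an importance score grants a "computational bonus" of extra experts, could be sketched as follows. The paper's actual routing rule is not given here, so everything in this sketch is an assumption: the function names, the parameters `base_k` and `max_bonus`, and the linear mapping from importance to expert count are all hypothetical. The shared expert, which the abstract says is applied to every selected region, is left out and only the routed experts are shown.

```python
import math


def num_active_experts(importance, base_k=1, max_bonus=2, num_experts=4):
    """Map an importance score in [0, 1] to a number of routed experts (hypothetical rule)."""
    clipped = min(max(importance, 0.0), 1.0)
    bonus = math.floor(clipped * max_bonus)  # more important regions earn more experts
    return min(num_experts, base_k + bonus)


def route(gate_scores, importance, base_k=1, max_bonus=2):
    """Return indices of the routed experts for one region, best gate score first."""
    k = num_active_experts(importance, base_k, max_bonus, len(gate_scores))
    ranked = sorted(range(len(gate_scores)), key=lambda e: gate_scores[e], reverse=True)
    return ranked[:k]
```

For example, with gate scores `[0.1, 0.7, 0.2, 0.5]`, a region of importance 0.0 would use only expert 1, while a region of importance 1.0 would use experts 1, 3, and 2.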

AAAI Conference 2024 Conference Paper

GigaHumanDet: Exploring Full-Body Detection on Gigapixel-Level Images

Chenglong Liu
Haoran Wei
Jinze Yang
Jintao Liu
Wenxi Li
Yuchen Guo
Lu Fang

Performing person detection in super-high-resolution images has been a challenging task. For such a task, modern detectors, which usually encode a box using center and width/height, struggle with accuracy due to two factors: 1) Human characteristic: people come in various postures, and a center point with a high degree of freedom makes it difficult to capture a robust visual pattern; 2) Image characteristic: due to the vast scale diversity of the input (gigapixel-level), distance regression (for width and height) is hard to pinpoint, especially for a large-scale person near the camera. To address these challenges, we propose GigaHumanDet, an innovative solution aimed at further enhancing detection accuracy for gigapixel-level images. GigaHumanDet employs the corner modeling method to avoid the potential issues of a high degree of freedom in center pinpointing. To better distinguish similar-looking persons and enforce instance consistency of corner pairs, an instance-guided learning approach is designed to capture discriminative individual semantics. Further, we devise reliable shape-aware bodyness equipped with a multi-precision strategy as the human corner matching guidance to be appropriately adapted to the single-view large scene. Experimental results on PANDA and STCrowd datasets show the superiority and strong applicability of our design. Notably, our model achieves 82.4% in terms of AP, outperforming current state-of-the-art methods by more than 10%.

ECAI Conference 2024 Conference Paper

SaccadeMOT: Enhancing Object Detection and Tracking in Gigapixel Images via Scale-Aware Density Estimation

Wenxi Li
Ruxin Zhang
Haozhe Lin
Yuchen Guo
Chao Ma
Xiaokang Yang

The proliferation of gigapixel imaging has ushered in unprecedented challenges in object detection and tracking due to the intense computational demands. Previous deep learning approaches, often tailored for megapixel images, fall short in addressing the unique complexities presented by the gigapixel level. To bridge this gap, we introduce SaccadeMOT, a novel architecture designed for efficient gigapixel-level multi-object tracking. Based on our observations of density map regression in crowd counting and small object detection in object detection tasks, we propose a novel gigapixel detection paradigm that combines the strengths of both approaches. First, the “saccade” stage swiftly identifies regions likely containing objects, followed by the “gaze” stage that refines the detection within these areas. This strategic region selection is complemented by a robust tracking mechanism that combines head and body tracking, enhancing accuracy in environments with potential occlusions. Validated on the PANDA dataset, SaccadeMOT not only demonstrates a 13× speed improvement over the existing state-of-the-art tracker BotSORT but also exhibits promising applications in gigapixel-level pathology analysis, particularly in Whole Slide Imaging (WSI). This approach sets a new benchmark for handling super high-resolution images, offering significant advancements in both the speed and precision of object tracking technologies.
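
The two-stage "saccade" then "gaze" flow described in this abstract can be sketched roughly as below. This is only an illustration under stated assumptions: the coarse density map is taken as a given grid of per-tile scores, the `threshold` parameter and both function names are hypothetical, and `detect_tile` stands in for whatever detector the gaze stage would actually run.

```python
def saccade(density, threshold):
    """Coarse stage: return (row, col) tiles whose estimated density exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(density)
        for c, value in enumerate(row)
        if value > threshold
    ]


def gaze(tiles, detect_tile):
    """Fine stage: run the (caller-supplied) detector only on the selected tiles."""
    detections = []
    for tile in tiles:
        detections.extend(detect_tile(tile))
    return detections
```

In this sketch the saving comes from never calling `detect_tile` on tiles the coarse stage rejected. The tracking component mentioned in the abstract is not modeled here.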
