Arrow Research search

Author name cluster

Bin Fan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers
1 author row

Possible papers (10)

AAAI 2026 Conference Paper

EC-MVSNet: Enhanced Cascaded Multi-View Stereo with Cross-Scale Relevance Integration

  • Shaoqian Wang
  • Jiadai Sun
  • Bin Fan
  • Qiang Wang
  • Bin Lu
  • Yuchao Dai

Cascade-based multi-scale architectures are currently the mainstream in Multi-view Stereo (MVS), achieving a balance between computational efficiency and reconstruction accuracy. However, existing cascade MVS methods suffer from significant limitations in cross-scale information utilization, where depth estimation processes operate independently across scales without fully exploiting the rich relevance between adjacent scales. To address this fundamental limitation, we propose an Enhanced Cascaded Multi-View Stereo framework (EC-MVSNet), which introduces a novel cross-scale relevance integration strategy. Specifically, we introduce a Cross-Scale Feature-based Joint Construction (CFC) module to synergistically combine features from adjacent scales to build more reliable cost volumes. Additionally, a Cross-Scale Probability-guided Enhancement (CPE) module is proposed to propagate depth probability distributions across scales to guide cost volume enhancement. Furthermore, we propose a Monocular Feature-based Refinement (MFR) module to further enhance depth prediction accuracy by leveraging monocular priors. Extensive experiments demonstrate that EC-MVSNet achieves state-of-the-art performance on multiple benchmarks, validating the effectiveness of the cross-scale integration in improving MVS reconstruction quality.
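
The abstract describes EC-MVSNet at the module level only. As a rough illustration of the cross-scale idea, the PyTorch sketch below (all class names, shapes, and design choices are assumptions, not the paper's code) upsamples coarse-scale features and fuses them with the adjacent finer scale before the cost volume is built:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossScaleFusion(nn.Module):
    # Hypothetical stand-in for the CFC idea: let the finer stage see
    # coarse-scale context when constructing its cost volume.
    def __init__(self, coarse_ch: int, fine_ch: int):
        super().__init__()
        self.fuse = nn.Conv2d(coarse_ch + fine_ch, fine_ch, kernel_size=3, padding=1)

    def forward(self, coarse_feat: torch.Tensor, fine_feat: torch.Tensor) -> torch.Tensor:
        # Bring the coarse map to the fine resolution, then fuse by convolution.
        up = F.interpolate(coarse_feat, size=fine_feat.shape[-2:],
                           mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([up, fine_feat], dim=1))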

AAAI 2026 Conference Paper

Group Orthogonal Low-Rank Adaptation for RGB-T Tracking

  • Zekai Shao
  • Yufan Hu
  • Jingyuan Liu
  • Bin Fan
  • Hongmin Liu

Parameter-efficient fine-tuning has emerged as a promising paradigm in RGB-T tracking, enabling downstream task adaptation by freezing pretrained parameters and fine-tuning only a small set of parameters. This set forms a rank space made up of multiple individual ranks, whose expressiveness directly shapes the model's adaptability. However, quantitative analysis reveals that low-rank adaptation exhibits significant redundancy in the rank space, with many ranks contributing almost no practical information. This hinders the model's ability to learn more diverse knowledge to address the various challenges in RGB-T tracking. To address this issue, we propose the Group Orthogonal Low-Rank Adaptation (GOLA) framework for RGB-T tracking, which effectively leverages the rank space through structured parameter learning. Specifically, we adopt a rank decomposition partitioning strategy utilizing singular value decomposition to quantify rank importance, freeze crucial ranks to preserve the pretrained priors, and cluster the redundant ranks into groups to prepare for subsequent orthogonal constraints. We further design an inter-group orthogonal constraint strategy. This constraint enforces orthogonality between rank groups, compelling them to learn complementary features that target diverse challenges, thereby alleviating information redundancy. Experimental results demonstrate that GOLA effectively reduces parameter redundancy and enhances feature representation capabilities, significantly outperforming state-of-the-art methods across four benchmark datasets and validating its effectiveness in RGB-T tracking tasks.
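
The abstract names the constraint but not its form. A minimal sketch of one plausible inter-group orthogonality penalty (the shapes, names, and cosine formulation are all assumptions) penalizes similarity between LoRA ranks that belong to different groups:

import torch
import torch.nn.functional as F

def inter_group_orthogonality_loss(lora_A: torch.Tensor, groups: list) -> torch.Tensor:
    # lora_A: (r, d) LoRA down-projection; groups: list of row-index lists/tensors.
    # Squared cosine similarity between ranks of different groups pushes each
    # group toward a complementary subspace (illustrative, not the paper's code).
    loss = lora_A.new_zeros(())
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            gi = F.normalize(lora_A[groups[i]], dim=1)
            gj = F.normalize(lora_A[groups[j]], dim=1)
            loss = loss + (gi @ gj.T).pow(2).mean()
    return loss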

AAAI 2026 Conference Paper

Learning Spatial Decay for Vision Transformers

  • Yuxin Mao
  • Zhen Qin
  • Jinxing Zhou
  • Bin Fan
  • Jing Zhang
  • Yiran Zhong
  • Yuchao Dai

Vision Transformers (ViTs) have revolutionized computer vision, yet their self-attention mechanism lacks explicit spatial inductive biases, leading to suboptimal performance on spatially-structured tasks. Existing approaches introduce data-independent spatial decay based on fixed distance metrics, applying uniform attention weighting regardless of image content and limiting adaptability to diverse visual scenarios. Inspired by recent advances in large language models where content-aware gating mechanisms (e.g., GLA, HGRN2, FOX) significantly outperform static alternatives, we present the first successful adaptation of data-dependent spatial decay to 2D vision transformers. We introduce the Spatial Decay Transformer (SDT), featuring a novel Context-Aware Gating (CAG) mechanism that generates dynamic, data-dependent decay for patch interactions. Our approach learns to modulate spatial attention based on both content relevance and spatial proximity. We address the fundamental challenge of 1D-to-2D adaptation through a unified spatial-content fusion framework that integrates Manhattan distance-based spatial priors with learned content representations. Extensive experiments on ImageNet-1K classification and generation tasks demonstrate consistent improvements over strong baselines. Our work establishes data-dependent spatial decay as a new paradigm for enhancing spatial attention in vision transformers.
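
A minimal sketch of the data-dependent decay idea (module and variable names are assumptions; the paper's CAG internals are not given in the abstract): each token predicts a positive decay rate from its content, and that rate scales a Manhattan-distance prior to form an additive attention bias.

import torch
import torch.nn as nn
import torch.nn.functional as F

def manhattan_distances(h: int, w: int) -> torch.Tensor:
    # (N, N) pairwise Manhattan distances between patch grid positions, N = h * w.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()
    return (coords[:, None, :] - coords[None, :, :]).abs().sum(-1)

class ContentAwareDecay(nn.Module):
    # Hypothetical gating: tokens with high decay rates attend locally,
    # tokens with low rates keep a wide receptive field.
    def __init__(self, dim: int):
        super().__init__()
        self.to_rate = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor, dist: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C) patch tokens; dist: (N, N) spatial prior.
        rate = F.softplus(self.to_rate(x))   # (B, N, 1), positive per-token rate
        return -rate * dist                  # (B, N, N) bias added to attention logits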

AAAI 2026 Conference Paper

Towards Accurate 3D Object Detection in Adverse Weather by Leveraging 4D Radar for LiDAR Geometry Enhancement

  • Tianxu Tong
  • Xinrun Liu
  • Hongmin Liu
  • Bin Fan

3D object detection is a critical component of autonomous driving, yet its performance degrades severely in adverse weather due to the degradation of LiDAR point clouds. While existing LiDAR-4D radar fusion methods enhance robustness by incorporating weather-robust 4D radar data, they often depend on well-preserved geometric structures from LiDAR and thus struggle to exploit radar data effectively when the LiDAR data are degraded. To tackle this challenge, we propose REL, a novel 4D radar-guided LiDAR geometric enhancement framework. It utilizes 4D radar features to dynamically generate virtual LiDAR points, effectively increasing the density of degraded LiDAR data. Moreover, a Position-Guided Cross Attention (PGCA) module is proposed to enhance the feature representation of virtual points, while an Adaptive Feature Fusion (AFF) module is designed to integrate virtual and real LiDAR features. Extensive experiments on the K-Radar and VoD-Fog datasets demonstrate that REL achieves state-of-the-art 3D object detection performance under diverse adverse weather conditions. Notably, REL improves the overall AP3D by 9.3% on K-Radar and boosts the cyclist class by up to 52.9% 3D mAP under the most severe foggy condition on VoD-Fog.
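
The modules are named but not specified in the abstract; a bare-bones reading of position-guided cross attention (every detail below is assumed) would inject positional encodings into virtual-point queries before they attend to real LiDAR features:

import torch
import torch.nn as nn

class PositionGuidedCrossAttention(nn.Module):
    # Illustrative only: virtual-point features query real LiDAR features,
    # with positions added to the queries (not the paper's implementation).
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, virtual_feat, real_feat, pos_enc):
        # virtual_feat: (B, Nv, C); real_feat: (B, Nr, C); pos_enc: (B, Nv, C)
        q = virtual_feat + pos_enc                     # position-guided queries
        out, _ = self.attn(q, real_feat, real_feat)
        return out + virtual_feat                      # residual enhancement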

AAAI 2024 Conference Paper

Defying Imbalanced Forgetting in Class Incremental Learning

  • Shixiong Xu
  • Gaofeng Meng
  • Xing Nie
  • Bolin Ni
  • Bin Fan
  • Shiming Xiang

For the first time, we observe a high level of imbalance in the accuracy of different learned classes within the same old task. This intriguing phenomenon, discovered in replay-based Class Incremental Learning (CIL), highlights the imbalanced forgetting of learned classes, as their accuracy is similar before catastrophic forgetting occurs. This phenomenon previously went unidentified due to the reliance on average incremental accuracy as the measurement for CIL, which assumes that the accuracy of classes within the same task is similar. However, this assumption is invalid in the face of catastrophic forgetting. Further empirical studies indicate that this imbalanced forgetting is caused by conflicts in representation between semantically similar old and new classes. These conflicts are rooted in the data imbalance present in replay-based CIL methods. Building on these insights, we propose CLass-Aware Disentanglement (CLAD) as a means to predict the old classes that are more likely to be forgotten and enhance their accuracy. Importantly, CLAD can be seamlessly integrated into existing CIL methods. Extensive experiments demonstrate that CLAD consistently improves current replay-based methods, resulting in performance gains of up to 2.56%.
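
The measurement behind the observation is per-class rather than task-averaged accuracy; a tiny helper like the following (an illustration, not from the paper) is enough to expose the imbalance:

import numpy as np

def per_class_accuracy(y_true: np.ndarray, y_pred: np.ndarray, classes) -> dict:
    # Accuracy of each class within one old task; a wide spread across classes
    # of the same task is the imbalanced forgetting described above.
    return {c: float((y_pred[y_true == c] == c).mean()) for c in classes}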

NeurIPS 2024 Conference Paper

Spatio-Temporal Interactive Learning for Efficient Image Reconstruction of Spiking Cameras

  • Bin Fan
  • Jiaoyang Yin
  • Yuchao Dai
  • Chao Xu
  • Tiejun Huang
  • Boxin Shi

The spiking camera is an emerging neuromorphic vision sensor that records high-speed motion scenes by asynchronously firing continuous binary spike streams. Prevailing image reconstruction methods, generating intermediate frames from these spike streams, often rely on complex step-by-step network architectures that overlook the intrinsic collaboration of spatio-temporal complementary information. In this paper, we propose an efficient spatio-temporal interactive reconstruction network to jointly perform inter-frame feature alignment and intra-frame feature filtering in a coarse-to-fine manner. Specifically, it starts by extracting hierarchical features from a concise hybrid spike representation, then refines the motion fields and target frames scale-by-scale, ultimately obtaining a full-resolution output. Meanwhile, we introduce a symmetric interactive attention block and a multi-motion field estimation block to further enhance the interaction capability of the overall network. Experiments on synthetic and real-captured data show that our approach exhibits excellent performance while maintaining low model complexity.
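
The abstract mentions a "concise hybrid spike representation" without defining it. One plausible reading, purely an assumption here, pairs a firing-rate channel with a last-spike-time channel:

import torch

def hybrid_spike_representation(spikes: torch.Tensor) -> torch.Tensor:
    # spikes: (T, H, W) binary stream from the spiking camera.
    # Returns (2, H, W): mean firing rate plus normalized time of last spike.
    t = spikes.shape[0]
    rate = spikes.float().mean(dim=0, keepdim=True)
    timestamps = torch.arange(1, t + 1, dtype=torch.float32).view(t, 1, 1)
    last_spike = (spikes.float() * timestamps).amax(dim=0) / t
    return torch.cat([rate, last_spike.unsqueeze(0)], dim=0)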

NeurIPS 2024 Conference Paper

Zero-Shot Event-Intensity Asymmetric Stereo via Visual Prompting from Image Domain

  • Hanyue Lou
  • Jinxiu Liang
  • Minggui Teng
  • Bin Fan
  • Yong Xu
  • Boxin Shi

Event-intensity asymmetric stereo systems have emerged as a promising approach for robust 3D perception in dynamic and challenging environments by integrating event cameras with frame-based sensors in different views. However, existing methods often suffer from overfitting and poor generalization due to limited dataset sizes and lack of scene diversity in the event domain. To address these issues, we propose a zero-shot framework that utilizes monocular depth estimation and stereo matching models pretrained on diverse image datasets. Our approach introduces a visual prompting technique to align the representations of frames and events, allowing the use of off-the-shelf stereo models without additional training. Furthermore, we introduce a monocular cue-guided disparity refinement module to improve robustness across static and dynamic regions by incorporating monocular depth information from foundation models. Extensive experiments on real-world datasets demonstrate the superior zero-shot evaluation performance and enhanced generalization ability of our method compared to existing approaches.
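
One way to picture the monocular cue-guided refinement (a sketch under assumed inputs and a least-squares alignment of our own choosing, not the authors' module): fit a scale and shift that maps monocular inverse depth onto confident stereo disparities, then blend by confidence.

import torch

def refine_disparity(stereo_disp, mono_depth, confidence):
    # stereo_disp, mono_depth, confidence: (H, W) tensors; confidence in [0, 1].
    mono_inv = 1.0 / mono_depth.clamp(min=1e-6)       # monocular inverse depth
    mask = confidence > 0.5                           # trust region for the fit
    x, y = mono_inv[mask], stereo_disp[mask]
    A = torch.stack([x, torch.ones_like(x)], dim=-1)  # solve y ~= a*x + b
    ab = torch.linalg.lstsq(A, y.unsqueeze(-1)).solution.squeeze(-1)
    mono_disp = ab[0] * mono_inv + ab[1]              # mono cue as a disparity map
    return confidence * stereo_disp + (1.0 - confidence) * mono_disp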

AAAI 2022 Conference Paper

MTLDesc: Looking Wider to Describe Better

  • Changwei Wang
  • Rongtao Xu
  • Yuyang Zhang
  • Shibiao Xu
  • Weiliang Meng
  • Bin Fan
  • Xiaopeng Zhang

Limited by the locality of convolutional neural networks, most existing local feature description methods only learn local descriptors from local information and lack awareness of the global and surrounding spatial context. In this work, we focus on making local descriptors “look wider to describe better” by learning local Descriptors with More Than just Local information (MTLDesc). Specifically, we resort to context augmentation and spatial attention mechanisms to give MTLDesc non-local awareness. First, an Adaptive Global Context Augmented Module and a Diverse Local Context Augmented Module are proposed to construct robust local descriptors with context information from global to local. Second, a Consistent Attention Weighted Triplet Loss is designed to integrate spatial attention awareness into both the optimization and matching stages of local descriptor learning. Third, Local Features Detection with Feature Pyramid is introduced to obtain more stable and accurate keypoint localization. With the above innovations, our MTLDesc significantly surpasses the prior state-of-the-art local descriptors on the HPatches, Aachen Day-Night localization, and InLoc indoor localization benchmarks. Our code is available at https://github.com/vignywang/MTLDesc.
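
The exact loss is given in the paper, not the abstract; a minimal sketch of an attention-weighted triplet objective (shapes and weighting scheme assumed) looks like:

import torch

def attention_weighted_triplet(anchor, positive, negative, attn_w, margin: float = 1.0):
    # anchor/positive/negative: (B, D) descriptors; attn_w: (B,) spatial-attention
    # weights that rescale each triplet's contribution to the loss.
    d_pos = (anchor - positive).pow(2).sum(dim=-1)
    d_neg = (anchor - negative).pow(2).sum(dim=-1)
    per_triplet = torch.relu(d_pos - d_neg + margin)
    return (attn_w * per_triplet).mean()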

AAAI 2015 Conference Paper

10,000+ Times Accelerated Robust Subset Selection

  • Feiyun Zhu
  • Bin Fan
  • Xinliang Zhu
  • Ying Wang
  • Shiming Xiang
  • Chunhong Pan

Subset selection from massive data with noisy information is increasingly popular for various applications. This problem is still highly challenging, as current methods are generally slow and sensitive to outliers. To address these two issues, we propose an accelerated robust subset selection (ARSS) method. Specifically, in the subset selection area, this is the first attempt to employ an ℓp (0 < p ≤ 1)-norm based measure for the representation loss, preventing large errors from dominating our objective. As a result, the robustness against outlier elements is greatly enhanced. In practice, the data size is generally much larger than the feature length, i.e., N ≫ L. Based on this observation, we propose a speedup solver (via ALM and equivalent derivations) to greatly reduce the computational cost, theoretically from O(N^4) to O(N^2 L). Extensive experiments on ten benchmark datasets verify that our method not only outperforms state-of-the-art methods, but also runs 10,000+ times faster than the most related method.
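
To see why an ℓp loss with 0 < p ≤ 1 resists outliers, compare it with the squared loss on per-sample residuals: raising a residual to a power p ≤ 1 compresses large values, so one corrupted sample cannot dominate the sum. The toy function below (NumPy; the variable roles are assumptions and this is not the ARSS solver) sketches such a representation loss:

import numpy as np

def lp_representation_loss(X, S, W, p=0.5):
    # X: (L, N) data, S: (L, k) selected subset, W: (k, N) coefficients.
    # Each sample contributes ||x_i - S @ w_i||_2 ** p; with p <= 1 a single
    # outlier's huge residual no longer dominates the objective.
    residuals = np.linalg.norm(X - S @ W, axis=0)
    return float(np.sum(residuals ** p))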