Arrow Research search

Author name cluster

Fei Yu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

24 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

M3SR: Multi-Scale Multi-Perceptual Mamba for Efficient Spectral Reconstruction

  • Yuze Zhang
  • Lingjie Li
  • Qiuzhen Lin
  • Zhong Ming
  • Fei Yu
  • Victor C. M. Leung

The Mamba architecture has been widely applied to various low-level vision tasks due to its exceptional adaptability and strong performance. Although the Mamba architecture has been adopted for spectral reconstruction, it still faces the following two challenges: (1) Single spatial perception limits the ability to fully understand and analyze hyperspectral images; (2) Single-scale feature extraction struggles to capture the complex structures and fine details present in hyperspectral images. To address these issues, we propose a multi-scale, multi-perceptual Mamba architecture for the spectral reconstruction task, called M3SR. Specifically, we design a multi-perceptual fusion block to enhance the ability of the model to comprehensively understand and analyze the input features. By integrating the multi-perceptual fusion block into a U-Net structure, M3SR can effectively extract and fuse global, intermediate, and local features, thereby enabling accurate reconstruction of hyperspectral images at multiple scales. Extensive quantitative and qualitative experiments demonstrate that the proposed M3SR outperforms existing state-of-the-art methods while incurring a lower computational cost.

IROS Conference 2025 Conference Paper

A Two-Stage Lightweight Framework for Efficient Land-Air Bimodal Robot Autonomous Navigation

  • Yongjie Li
  • Zhou Liu
  • Wenshuai Yu
  • Zhangji Lu
  • Chenyang Wang
  • Fei Yu
  • Qingquan Li

Land-air bimodal robots (LABR) are gaining attention for autonomous navigation, combining the high mobility of aerial vehicles with the long endurance of ground vehicles. However, existing LABR navigation methods are limited by suboptimal trajectories from mapping-based approaches and the excessive computational demands of learning-based methods. To address this, we propose a two-stage lightweight framework that integrates global keypoint prediction with local trajectory refinement to generate efficient and reachable trajectories. In the first stage, a Global Keypoint Prediction Network (GKPN) generates a hybrid land-air keypoint path. The GKPN includes a Sobel Perception Network (SPN) for improved obstacle detection and a Lightweight Attention Planning Network (LAPN) that improves predictive ability by capturing contextual information. In the second stage, the global path is segmented based on the predicted keypoints and refined using a mapping-based planner to create smooth, collision-free trajectories. Experiments conducted on our LABR platform show that our framework reduces network parameters by 14% and energy consumption during land-air transitions by 35% compared to existing approaches. The framework achieves real-time navigation without GPU acceleration and enables zero-shot transfer from simulation to reality during deployment.

AAAI Conference 2025 Conference Paper

CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

  • Yuxuan Wang
  • Yijun Liu
  • Fei Yu
  • Chen Huang
  • Kexin Li
  • Zhiguo Wan
  • Wanxiang Che
  • Hongyang Chen

Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in the context of Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE) benchmark dataset, where the selection of object categories and images is entirely driven by Chinese native speakers, ensuring that the source images are representative of Chinese culture. The benchmark covers four distinct VL tasks: image-text retrieval, visual question answering, visual grounding, and visual dialogue. We present a detailed statistical analysis of CVLUE and provide a baseline performance analysis with several open-source multilingual VLMs on CVLUE and its English counterparts to reveal the performance gap between English and Chinese. Our in-depth category-level analysis reveals a lack of Chinese cultural knowledge in existing VLMs. We also find that fine-tuning on Chinese culture-related VL datasets effectively enhances VLMs' understanding of Chinese culture.

IROS Conference 2025 Conference Paper

JAM: Keypoint-Guided Joint Prediction after Classification-Aware Marginal Proposal for Multi-Agent Interaction

  • Fangze Lin
  • Ying He
  • Fei Yu
  • Hong Zhang

Predicting the future motion of road participants is a critical task in autonomous driving. In this work, we address the challenge of low-quality generation of low-probability modes in multi-agent joint prediction. To tackle this issue, we propose a two-stage multi-agent interactive prediction framework named keypoint-guided joint prediction after classification-aware marginal proposal (JAM). The first stage is modeled as a marginal prediction process, which classifies queries by trajectory type to encourage the model to learn all categories of trajectories, providing comprehensive mode information for the joint prediction module. The second stage is modeled as a joint prediction process, which takes the scene context and the marginal proposals from the first stage as inputs to learn the final joint distribution. We explicitly introduce key waypoints to guide the joint prediction module in better capturing and leveraging the critical information from the initial predicted trajectories. We conduct extensive experiments on the real-world Waymo Open Motion Dataset interactive prediction benchmark. The results show that our approach achieves competitive performance. In particular, in the framework comparison experiments, the proposed JAM outperforms other prediction frameworks and achieves state-of-the-art performance in interactive trajectory prediction. The code is available at https://github.com/LinFunster/JAM to facilitate future research.

IROS Conference 2025 Conference Paper

MobiExo: GPS-SLAM Fusion for Seamless Indoor-Outdoor Mobile Manipulation with Hand-Foot Coordination

  • Jianpeng Wang
  • Zhen Tian
  • Wenlong Chen
  • Dian Yuan
  • Zhou Zhou
  • Ming Cen
  • Xia Hua
  • Fei Yu

Teleoperation systems for mobile robots face significant challenges in achieving seamless coordination across dynamic environments. We present MobiExo, a teleoperation system that unlocks seamless indoor-outdoor mobile manipulation. Our approach tackles two fundamental challenges: robust cross-environment localization and intuitive full-body control. A novel self-adaptive federated filter unifies GPS and SLAM, delivering continuous centimeter-level positioning (4.5±0.8 cm indoor, 6.8±1.2 cm outdoor) and eliminating transition errors. Simultaneously, an integrated hand-foot coordination framework translates the operator’s natural gait and gestures into fluid robot actions, maintaining remarkable millimeter-level end-effector precision (3.5±0.4 mm) during navigation. Extensive field trials validate our design, demonstrating high task success (96.7% indoor, 94.3% outdoor) and a 5.9× efficiency improvement in multi-location tasks over stationary setups. Code is available at: https://github.com/wangjianpeng200/MobiExo.git
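
The abstract does not detail the filter's internals, but the continuous indoor-outdoor hand-off can be illustrated with a minimal inverse-variance fusion sketch in Python; all names and variance figures below are hypothetical, and MobiExo's self-adaptive federated filter is considerably more elaborate.

```python
# Minimal inverse-variance fusion of GPS and SLAM position estimates.
# This only illustrates the continuous hand-off idea; it is not MobiExo's
# actual algorithm, and all values below are hypothetical.
import numpy as np

def fuse_position(gps_xy, gps_var, slam_xy, slam_var):
    """Weight each 2D estimate by its inverse variance, so the more
    confident source dominates and transitions stay smooth."""
    w_gps, w_slam = 1.0 / gps_var, 1.0 / slam_var
    return (w_gps * np.asarray(gps_xy) + w_slam * np.asarray(slam_xy)) / (w_gps + w_slam)

# Indoors GPS degrades (large variance), so the fused estimate tracks SLAM:
indoor = fuse_position([10.2, 5.1], 25.0, [10.05, 5.02], 0.002)
# Outdoors GPS is reliable again, and the weights shift back automatically:
outdoor = fuse_position([103.4, 48.9], 0.04, [103.9, 49.3], 0.5)
print(indoor, outdoor)
```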

NeurIPS Conference 2025 Conference Paper

QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

  • Wanlong Liu
  • Junxiao Xu
  • Fei Yu
  • Yukang Lin
  • Ke Ji
  • Wenyu Chen
  • Lifeng Shang
  • Yasheng Wang

Recent advancements in Long Chain-of-Thought (CoT) reasoning models have improved performance on complex tasks, but they suffer from overthinking, generating redundant reasoning steps, especially for simple questions. This paper revisits the reasoning patterns of Long and Short CoT models, observing that Short CoT patterns offer concise and efficient reasoning, while Long CoT patterns excel in challenging scenarios where Short CoT patterns struggle. To enable models to leverage both patterns, we propose Question-Free Fine-Tuning (QFFT), a fine-tuning approach that removes the input question during training and learns exclusively from Long CoT responses. This approach enables the model to adaptively employ both reasoning patterns: it prioritizes Short CoT patterns and activates Long CoT patterns only when necessary. Experiments on various mathematical datasets demonstrate that QFFT reduces average response length by more than 50%, while achieving performance comparable to Supervised Fine-Tuning (SFT). Additionally, QFFT exhibits superior performance compared to SFT in noisy, out-of-domain, and low-resource scenarios.
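
A minimal sketch of the data-construction idea, assuming a simple prompt/completion format (the paper's exact training template is not given in this abstract):

```python
# Hypothetical sketch of QFFT-style data construction: standard SFT would
# train on (question -> response); here the question is removed and the
# loss is computed only on the Long CoT response. Field names are assumed.
def build_qfft_example(long_cot_response: str) -> dict:
    return {
        "prompt": "",                     # question dropped during training
        "completion": long_cot_response,  # supervise the reasoning trace alone
    }

response = "<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think>\nThe answer is 408."
print(build_qfft_example(response))
```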

NeurIPS Conference 2025 Conference Paper

ReDit: Reward Dithering for Improved LLM Policy Optimization

  • Chenxing Wei
  • Jiarui Yu
  • Ying He
  • Hande Dong
  • Yao Shu
  • Fei Yu

DeepSeek-R1 has successfully enhanced Large Language Model (LLM) reasoning capabilities through its rule-based reward system. While such a "perfect" reward system effectively mitigates reward hacking, the resulting reward functions are often discrete. Our experimental observations suggest that discrete rewards can lead to gradient anomalies, unstable optimization, and slow convergence. To address this issue, we propose ReDit (Reward Dithering), a method that dithers the discrete reward signal by adding simple random noise. With this perturbed reward, exploratory gradients are continuously provided throughout the learning process, enabling smoother gradient updates and accelerating convergence. The injected noise also introduces stochasticity into flat reward regions, encouraging the model to explore novel policies and escape local optima. Experiments across diverse tasks demonstrate the effectiveness and efficiency of ReDit. On average, ReDit achieves performance comparable to vanilla GRPO with only approximately 10% of the training steps, and still exhibits a 4% performance improvement over vanilla GRPO when trained for a similar duration. Visualizations confirm significant mitigation of gradient issues with ReDit. Moreover, theoretical analyses are provided to further validate these advantages.
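
A minimal sketch of the dithering step, assuming zero-mean Gaussian noise with a hand-picked scale (the abstract does not specify the noise family or magnitude):

```python
# Hypothetical sketch of reward dithering: perturb a discrete rule-based
# reward with zero-mean noise so policy-gradient updates see a smooth,
# unbiased signal. The noise scale sigma is an assumed hyperparameter.
import random

def redit_reward(rule_based_reward: float, sigma: float = 0.05) -> float:
    return rule_based_reward + random.gauss(0.0, sigma)

# A verifier that returns 1.0 for a correct answer and 0.0 otherwise now
# yields rewards like 1.03, 0.97, ... -- noisy but centered on the original.
print([round(redit_reward(1.0), 2) for _ in range(4)])
```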

NeurIPS Conference 2025 Conference Paper

RoMa: A Robust Model Watermarking Scheme for Protecting IP in Diffusion Models

  • Yingsha Xie
  • Rui Min
  • Zeyu Qin
  • Fei Ma
  • Li Shen
  • Fei Yu
  • Xiaochun Cao

Preserving intellectual property (IP) within a pre-trained diffusion model is critical for protecting the model's copyright and preventing unauthorized model deployment. In this regard, model watermarking is a common practice for IP protection that embeds traceable information within models and allows for later verification. Nevertheless, existing watermarking schemes often face challenges due to their vulnerability to fine-tuning, limiting their practical application in general pre-training and fine-tuning paradigms. Inspired by the use of mode connectivity to analyze model performance between pairs of connected models, we investigate watermark vulnerability by leveraging Linear Mode Connectivity (LMC) as a proxy to analyze the fine-tuning dynamics of watermark performance. Our results show that existing watermarked models tend to converge to sharp minima in the loss landscape, making them vulnerable to fine-tuning. To tackle this challenge, we propose RoMa, a Robust Model watermarking scheme that improves the robustness of watermarks against fine-tuning. Specifically, RoMa decomposes watermarking into two components: Embedding Functionality, which preserves reliable watermark detection capability, and Path-specific Smoothness, which enhances the smoothness along the watermark-connected path to improve robustness. Extensive experiments on the benchmark datasets MS-COCO-2017 and CUB-200-2011 demonstrate that RoMa significantly improves watermark robustness against fine-tuning while maintaining generation quality, outperforming baselines. The code is available at https://github.com/xiekks/RoMa.

NeurIPS Conference 2025 Conference Paper

Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

  • Zihan Su
  • Xuerui Qiu
  • Hongbin Xu
  • Tangyu Jiang
  • Jun-hao Zhuang
  • Chun Yuan
  • Ming Li
  • Shengfeng He

The explosive growth of generative video models has amplified the demand for reliable copyright protection of AI-generated content. Despite its popularity in image synthesis, invisible generative watermarking remains largely underexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. Motivated by the observation that watermarking performance is closely tied to the visual similarity between the watermark and the cover content, we introduce a hierarchical coarse-to-fine adaptive matching mechanism. Specifically, the watermark image is divided into patches, each assigned to the most visually similar video frame and then localized to the optimal spatial region for seamless embedding. To enable spatiotemporal fusion of watermark patches across video frames, we develop a 3D wavelet transform-enhanced Mamba architecture with a novel scanning strategy, effectively modeling long-range dependencies during watermark embedding and retrieval. To the best of our knowledge, this is the first attempt to apply state space models to watermarking, opening new avenues for efficient and robust watermark protection. Extensive experiments demonstrate that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness, largely attributable to our proposed designs. Code and additional supporting materials are provided in the supplementary.
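
The coarse matching step can be illustrated with a short sketch that assigns watermark patches to their most visually similar frames under cosine similarity; the feature extractor and the fine-grained spatial localization are omitted, and the function below is an assumption rather than Safe-Sora's implementation:

```python
# Hypothetical sketch of coarse patch-to-frame assignment: each watermark
# patch goes to the video frame whose features it most resembles.
import numpy as np

def assign_patches_to_frames(patch_feats: np.ndarray,
                             frame_feats: np.ndarray) -> np.ndarray:
    """patch_feats: (P, d) watermark patch features; frame_feats: (F, d)
    per-frame features. Returns, for each patch, the index of the most
    similar frame under cosine similarity."""
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    return (p @ f.T).argmax(axis=1)

rng = np.random.default_rng(0)
assign = assign_patches_to_frames(rng.normal(size=(16, 64)),
                                  rng.normal(size=(8, 64)))
print(assign)  # e.g. patch 0 -> frame 3, patch 1 -> frame 5, ...
```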

NeurIPS Conference 2025 Conference Paper

Universal Visuo-Tactile Video Understanding for Embodied Interaction

  • Yifan Xie
  • Mingyang Li
  • Shoujie Li
  • Xingting Li
  • Guangyu Chen
  • Fei Ma
  • Fei Yu
  • Wenbo Ding

Tactile perception is essential for embodied agents to understand the physical attributes of objects that cannot be determined through visual inspection alone. While existing methods have made progress in visual and language modalities for physical understanding, they fail to effectively incorporate tactile information that provides crucial haptic feedback for real-world interaction. In this paper, we present VTV-LLM, the first multi-modal large language model that enables universal Visuo-Tactile Video (VTV) understanding, bridging the gap between tactile perception and natural language. To address the challenges of cross-sensor and cross-modal integration, we contribute VTV150K, a comprehensive dataset comprising 150,000 video frames from 100 diverse objects captured across three different tactile sensors (GelSight Mini, DIGIT, and Tac3D), annotated with four fundamental tactile attributes (hardness, protrusion, elasticity, and friction). We develop a novel three-stage training paradigm that includes VTV enhancement for robust visuo-tactile representation, VTV-text alignment for cross-modal correspondence, and text prompt finetuning for natural language generation. Our framework enables sophisticated tactile reasoning capabilities including feature assessment, comparative analysis, and scenario-based decision-making. Extensive experimental evaluations demonstrate that VTV-LLM achieves superior performance in tactile reasoning tasks, establishing a foundation for more intuitive human-machine interaction in tactile domains.

IJCAI Conference 2025 Conference Paper

VideoHumanMIB: Unlocking Appearance Decoupling for Video Human Motion In-betweening

  • Haiwei Xue
  • Zhensong Zhang
  • Minglei Li
  • Zonghong Dai
  • Fei Yu
  • Fei Ma
  • Zhiyong Wu

We propose VideoHumanMIB, a novel framework for Video Human Motion In-betweening that enables seamless transitions between different motion video clips, facilitating the generation of longer and more natural digital human videos. While existing video frame interpolation methods work well for similar motions in adjacent frames, they often struggle with complex human movements, resulting in artifacts and unrealistic transitions. To address these challenges, we introduce a two-stage approach: First, we design an Appearance Reconstruction AutoEncoder to decouple appearance and motion information, extracting robust appearance-invariant features. Second, we develop an enhanced diffusion pretrained network that leverages both motion optical flow and human pose as guidance conditions, enabling the model to learn comprehensive latent distributions of possible motions. Rather than operating directly in pixel space, our model works in a learned latent space, allowing it to better capture the underlying motion dynamics. The framework is optimized with a dual-frame constraint loss and a motion flow loss to ensure temporal consistency and natural movement transitions. Extensive experiments demonstrate that our approach generates highly realistic transition sequences that significantly outperform existing methods, particularly in challenging scenarios with large motion variations. The proposed VideoHumanMIB establishes a new baseline for human motion synthesis and enables more natural and controllable digital human animation.

AAAI Conference 2024 Conference Paper

A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

  • Puning Zhao
  • Fei Yu
  • Zhiguo Wan

Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization and provide a comprehensive theoretical analysis. Under the independent and identically distributed (i.i.d.) assumption, our approach has several advantages over existing methods. First, it has optimal dependence on epsilon, the fraction of attacked clients. Second, it does not require precise knowledge of epsilon. Third, it allows different clients to have unequal data sizes. We then broaden our analysis to non-i.i.d. data, where clients may have slightly different distributions.
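
As a simplified illustration, a Huber-loss-minimizing aggregate can be computed by iteratively reweighted averaging: updates within a threshold receive full (mean-like) weight, while distant, potentially Byzantine updates are down-weighted (median-like). The threshold and stopping rule below are assumptions, not the paper's exact procedure.

```python
# Minimal sketch of a Huber-loss-minimizing aggregator via iteratively
# reweighted averaging. Minimizes sum_i huber_delta(||theta - update_i||).
import numpy as np

def huber_aggregate(updates: np.ndarray, delta: float = 1.0,
                    iters: int = 50, tol: float = 1e-6) -> np.ndarray:
    """updates: (n_clients, dim) array of client model updates.
    Inliers (distance <= delta) get weight 1; outliers get delta/distance,
    which bounds the influence any single Byzantine client can exert."""
    theta = np.median(updates, axis=0)  # robust initialization
    for _ in range(iters):
        dists = np.linalg.norm(updates - theta, axis=1)
        w = np.where(dists <= delta, 1.0, delta / np.maximum(dists, 1e-12))
        new_theta = (w[:, None] * updates).sum(axis=0) / w.sum()
        if np.linalg.norm(new_theta - theta) < tol:
            break
        theta = new_theta
    return theta

# Toy usage: 9 honest clients near the origin, 1 attacker far away.
honest = np.random.default_rng(0).normal(0, 0.1, size=(9, 3))
print(huber_aggregate(np.vstack([honest, [[100., 100., 100.]]])))
```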

IJCAI Conference 2024 Conference Paper

ABM: Attention before Manipulation

  • Fan Zhuo
  • Ying He
  • Fei Yu
  • Pengteng Li
  • Zheyi Zhao
  • Xilong Sun

Vision-language models (VLMs) show promising generalization and zero-shot capabilities, offering a potential solution to the impracticality and cost of enabling robots to comprehend diverse human instructions and scene semantics in the real world. Most existing approaches directly integrate the semantic representations from pre-trained VLMs with policy learning. However, these methods are limited to the labeled data they are trained on, resulting in poor generalization to unseen instructions and objects. To address this limitation, we propose a simple method called "Attention before Manipulation" (ABM), which fully leverages the object knowledge encoded in CLIP to extract information about the target object in the image. It constructs an Object Mask Field that serves as a better representation of the target object, allowing the model to separate visual grounding from action prediction and acquire specific manipulation skills effectively. We train ABM on 8 RLBench tasks and 2 real-world tasks via behavior cloning. Extensive experiments show that our method significantly outperforms the baselines in both zero-shot and compositional generalization settings.

AAAI Conference 2024 Conference Paper

MDGNN: Multi-Relational Dynamic Graph Neural Network for Comprehensive and Dynamic Stock Investment Prediction

  • Hao Qian
  • Hongting Zhou
  • Qian Zhao
  • Hao Chen
  • Hongxiang Yao
  • Jingwei Wang
  • Ziqi Liu
  • Fei Yu

The stock market is a crucial component of the financial system, but predicting the movement of stock prices is challenging due to the dynamic and intricate relations arising from various aspects such as economic indicators, financial reports, global news, and investor sentiment. Traditional sequential methods and graph-based models have been applied to stock movement prediction, but they have limitations in capturing the multifaceted and temporal influences on stock price movements. To address these challenges, we propose the Multi-relational Dynamic Graph Neural Network (MDGNN) framework, which utilizes a discrete dynamic graph to comprehensively capture multifaceted relations among stocks and their evolution over time. The representation generated from the graph offers a complete perspective on the interrelationships among stocks and associated entities. Additionally, the power of the Transformer structure is leveraged to encode the temporal evolution of multiplex relations, providing a dynamic and effective approach to predicting stock investment. Our proposed MDGNN framework achieves the best performance on public datasets compared with state-of-the-art stock investment methods.

IJCAI Conference 2022 Conference Paper

Region-Aware Metric Learning for Open World Semantic Segmentation via Meta-Channel Aggregation

  • Hexin Dong
  • Zifan Chen
  • Mingze Yuan
  • Yutong Xie
  • Jie Zhao
  • Fei Yu
  • Bin Dong
  • Li Zhang

As one of the most challenging and practical segmentation tasks, open-world semantic segmentation requires the model to segment anomaly regions in images and incrementally learn to segment out-of-distribution (OOD) objects, especially under a few-shot condition. The current state-of-the-art (SOTA) method, Deep Metric Learning Network (DMLNet), relies on pixel-level metric learning, with which it is difficult to identify similar regions that have different semantics. Therefore, we propose a method called region-aware metric learning (RAML), which first separates the regions of the images and generates region-aware features for further metric learning. RAML improves the integrity of the segmented anomaly regions. Moreover, we propose a novel meta-channel aggregation (MCA) module to further separate anomaly regions, forming high-quality sub-region candidates and thereby improving the model performance for OOD objects. To evaluate the proposed RAML, we have conducted extensive experiments and ablation studies on the Lost and Found and Road Anomaly datasets for anomaly segmentation and on the Cityscapes dataset for incremental few-shot learning. The results show that the proposed RAML achieves SOTA performance in both stages of open-world segmentation. Our code and appendix are available at https://github.com/czifan/RAML.

AAAI Conference 2021 Conference Paper

DAST: Unsupervised Domain Adaptation in Semantic Segmentation Based on Discriminator Attention and Self-Training

  • Fei Yu
  • Mo Zhang
  • Hexin Dong
  • Sheng Hu
  • Bin Dong
  • Li Zhang

Unsupervised domain adaptation has recently been used to reduce the domain shift, which ultimately improves the performance of semantic segmentation on unlabeled real-world data. In this paper, we follow this trend and propose a novel method to reduce the domain shift using strategies of discriminator attention and self-training. The discriminator attention strategy contains a two-stage adversarial learning process, which explicitly distinguishes the well-aligned (domain-invariant) and poorly-aligned (domain-specific) features and then guides the model to focus on the latter. The self-training strategy adaptively improves the decision boundary of the model for the target domain, which implicitly facilitates the extraction of domain-invariant features. By combining the two strategies, we find a more effective way to reduce the domain shift. Extensive experiments demonstrate the effectiveness of our proposed method on numerous benchmark datasets.

AAAI Conference 2021 Conference Paper

ERNIE-ViL: Knowledge Enhanced Vision-Language Representations through Scene Graphs

  • Fei Yu
  • Jiji Tang
  • Weichong Yin
  • Yu Sun
  • Hao Tian
  • Hua Wu
  • Haifeng Wang

We propose a knowledge-enhanced approach, ERNIE-ViL, which incorporates structured knowledge obtained from scene graphs to learn joint vision-language representations. ERNIE-ViL builds detailed semantic connections (objects, attributes of objects, and relationships between objects) across vision and language, which are essential to vision-language cross-modal tasks. Utilizing scene graphs of visual scenes, ERNIE-ViL constructs Scene Graph Prediction tasks, i.e., Object Prediction, Attribute Prediction, and Relationship Prediction, in the pre-training phase. Specifically, these prediction tasks are implemented by predicting nodes of different types in the scene graph parsed from the sentence. Thus, ERNIE-ViL can learn joint representations characterizing the alignment of detailed semantics across vision and language. After pre-training on large-scale image-text aligned datasets, we validate the effectiveness of ERNIE-ViL on 5 cross-modal downstream tasks. ERNIE-ViL achieves state-of-the-art performance on all these tasks and ranks first on the VCR leaderboard with an absolute improvement of 3.7%.

IJCAI Conference 2015 Conference Paper

Solving the Partial Label Learning Problem: An Instance-Based Approach

  • Min-Ling Zhang
  • Fei Yu

In partial label learning, each training example is associated with a set of candidate labels, among which only one is valid. An intuitive strategy for learning from partial label examples is to treat all candidate labels equally and make predictions by averaging their modeling outputs. Nonetheless, this strategy may suffer from the problem that the modeling output of the valid label is overwhelmed by those of the false positive labels. In this paper, an instance-based approach named IPAL is proposed that directly disambiguates the candidate label set. Briefly, IPAL tries to identify the valid label of each partial label example via an iterative label propagation procedure, and then classifies each unseen instance based on minimum-error reconstruction from its nearest neighbors. Extensive experiments show that IPAL compares favorably against existing instance-based as well as other state-of-the-art partial label learning approaches.
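
A minimal sketch of the propagation core, assuming a row-normalized kNN affinity matrix; the neighbor weighting and the final minimum-error-reconstruction classifier are simplified away:

```python
# Hypothetical sketch of iterative label propagation over candidate label
# sets, in the spirit of IPAL; the weighting scheme here is a simplification.
import numpy as np

def propagate_partial_labels(W: np.ndarray, candidates: list,
                             n_labels: int, alpha: float = 0.95,
                             iters: int = 100) -> np.ndarray:
    """W: row-normalized kNN affinity matrix (n x n).
    candidates[i]: candidate label set of example i (one label is valid).
    Returns per-example label confidences, masked to each candidate set."""
    n = W.shape[0]
    F = np.zeros((n, n_labels))
    for i, cs in enumerate(candidates):       # uniform init over candidates
        for c in cs:
            F[i, c] = 1.0 / len(cs)
    F0 = F.copy()
    for _ in range(iters):
        F = alpha * (W @ F) + (1 - alpha) * F0  # propagate, keep prior
        for i, cs in enumerate(candidates):     # disambiguate: re-mask and
            mask = np.zeros(n_labels)           # renormalize to candidates
            for c in cs:
                mask[c] = 1.0
            F[i] *= mask
            s = F[i].sum()
            if s > 0:
                F[i] /= s
    return F

# Toy usage: 3 examples, 2 labels; example 2 has a singleton candidate set.
W = np.array([[0., 1., 0.], [.5, 0., .5], [0., 1., 0.]])
F = propagate_partial_labels(W, [{0, 1}, {0, 1}, {1}], n_labels=2)
print(F.argmax(axis=1))  # predicted labels after disambiguation
```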

IS Journal 2012 Journal Article

Trends & Controversies

  • Anton Nijholt
  • Ronald C. Arkin
  • Sebastien Brault
  • Richard Kulpa
  • Franck Multon
  • Benoit Bideau
  • David Traum
  • Hayley Hung

Many applications require knowledge about how to deceive, including those related to safety, security, and warfare. Speech and text analysis can help detect deception, as can cameras, microphones, physiological sensors, and intelligent software. Models of deception and noncooperation can make a virtual or mixed-reality training environment more realistic, improve immersion, and thus make it more suitable for training military or security personnel. Robots might need to operate in physical and nontraining environments where they must perform military activity, including misleading the enemy. The contributions to this installment of Trends & Controversies present state-of-the-art research approaches to the analysis and generation of noncooperative and deceptive behavior in virtual humans, agents, and robots; the analysis of multiparty interaction in the context of deceptive behavior; and methods to detect misleading information in texts and computer-mediated communication. Articles include: "Computational Deception and Noncooperation," by Anton Nijholt; "Robots that Need to Mislead: Biologically-Inspired Machine Deception," by Ronald C. Arkin; "Deception in Sports Using Immersive Environments," by Sébastien Brault, Richard Kulpa, Franck Multon, and Benoit Bideau; "Non-Cooperative and Deceptive Virtual Agents," by David Traum; "Deception Detection in Multiparty Contexts," by Hayley Hung; "Deception Detection, Human Reasoning, and Deception Intent," by Eugene Santos Jr., Deqing Li, and Fei Yu; and "Automatic Deception Detection in Computer-Mediated Communication," by Lina Zhou and Dongsong Zhang.