Arrow Research search

Author name cluster

Feifei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers
2 author rows

Possible papers

EAAI Journal 2026 Journal Article

Meta-learning with variational inference for few-shot faults diagnosis of automotive transmission under variable operating conditions

  • Bin Sun
  • Hongkun Li
  • Nan Liu
  • Feifei Li
  • Zhenhui Ma

The automotive transmission is a critical component for regulating vehicle speed. However, in real industrial settings, complex and variable operating conditions, together with a limited number of fault samples, make traditional deep learning methods inadequate for practical applications. To address these challenges, this paper presents a few-shot fault diagnosis method for automotive transmissions under variable conditions, based on Variational Agnostic Meta-Learning for Robust Inference (VAMPIRE). First, the vibration data collected from sensors are sliced and converted into two-dimensional grayscale images to create the dataset. Next, by integrating Bayesian theory with a meta-learning framework, we use variational inference to approximate the posterior distribution. This allows the learned meta-parameters to coherently explain the variability of the data, thereby enhancing the model's generalization ability across different operating conditions. Finally, comparative experiments against various methods are conducted under multiple variable working conditions, using data from an industrial-grade gearbox test bench and real-road test data from an industrial truck gearbox. The experimental results show that, regardless of the sample size or the complexity of working conditions, the proposed method performs excellently in terms of accuracy, stability, and generalizability. For example, in test scenarios involving multiple unknown working conditions, the proposed method achieved an average diagnostic accuracy of 96.52% for test bench data and 97.54% for real-vehicle data in 5-shot learning tasks. Even in the most challenging 1-shot learning tasks, its average accuracy remained at 93.88% and 94.82%, respectively, significantly outperforming the comparative methods.
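The first step of the abstract above, slicing a vibration signal and converting it into two-dimensional grayscale images, might look roughly like the following sketch. The abstract gives no details, so the non-overlapping segments, the per-segment min-max scaling to 8-bit gray levels, and the function name `signal_to_grayscale` are all assumptions made for illustration, not the authors' implementation.

```python
def signal_to_grayscale(signal, side):
    """Slice a 1-D signal into side*side segments and map each segment
    to a side x side grid of 8-bit gray levels (0..255).

    Assumptions (not from the paper): segments do not overlap, trailing
    samples that do not fill a segment are dropped, and each segment is
    min-max normalised independently.
    """
    seg_len = side * side
    images = []
    for start in range(0, len(signal) - seg_len + 1, seg_len):
        segment = signal[start:start + seg_len]
        lo, hi = min(segment), max(segment)
        span = (hi - lo) or 1.0  # a flat segment would otherwise divide by zero
        pixels = [round(255 * (x - lo) / span) for x in segment]
        # Fill the image row by row
        images.append([pixels[r * side:(r + 1) * side] for r in range(side)])
    return images


if __name__ == "__main__":
    demo = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    for image in signal_to_grayscale(demo, side=2):
        print(image)
```

In practice the segment length would be much larger (for example 64 x 64 samples), and overlapping windows are a common way to produce more training images from a short recording.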

AAAI Conference 2026 Conference Paper

SmartSight: Mitigating Hallucination in Video-LLMs Without Compromising Video Understanding via Temporal Attention Collapse

  • Yiming Sun
  • Mi Zhang
  • Feifei Li
  • Geng Hong
  • Min Yang

Despite Video Large Language Models (Video-LLMs) having rapidly advanced in recent years, perceptual hallucinations pose a substantial safety risk, which severely restricts their real-world applicability. While several methods for hallucination mitigation have been proposed, they often compromise the model’s capacity for video understanding and reasoning. In this work, we propose SmartSight, a pioneering step to address this issue in a training-free manner by leveraging the model’s own introspective capabilities. Specifically, SmartSight generates multiple candidate responses to uncover low-hallucinated outputs that are often obscured by standard greedy decoding. It assesses the hallucination of each response using the Temporal Attention Collapse score, which measures whether the model over-focuses on trivial temporal regions of the input video when generating the response. To improve efficiency, SmartSight identifies the Visual Attention Vanishing point, enabling more accurate hallucination estimation and early termination of hallucinated responses, leading to a substantial reduction in decoding cost. Experiments show that SmartSight substantially lowers hallucinations for QwenVL-2.5-7B by 10.59% on VRIPT-HAL, while simultaneously enhancing video understanding and reasoning, boosting performance on VideoMMMU by 8.86%. These results highlight SmartSight’s effectiveness in improving the reliability of open-source Video-LLMs.

ICML Conference 2025 Conference Paper

InfoCons: Identifying Interpretable Critical Concepts in Point Clouds via Information Theory

  • Feifei Li
  • Mi Zhang
  • Zhaoxiang Wang
  • Min Yang

Interpretability of point cloud (PC) models becomes imperative given their deployment in safety-critical scenarios such as autonomous vehicles. We focus on attributing PC model outputs to interpretable critical concepts, defined as meaningful subsets of the input point cloud. To enable human-understandable diagnostics of model failures, an ideal critical subset should be faithful (preserving points that causally influence predictions) and conceptually coherent (forming semantically meaningful structures that align with human perception). We propose InfoCons, an explanation framework that applies information-theoretic principles to decompose the point cloud into 3D concepts, enabling the examination of their causal effect on model predictions with learnable priors. We evaluate InfoCons on synthetic datasets for classification, comparing it qualitatively and quantitatively with four baselines. We further demonstrate its scalability and flexibility on two real-world datasets and in two applications that utilize critical scores of PC.

EAAI Journal 2024 Journal Article

A novel model for the pavement distress segmentation based on multi-level attention DeepLabV3+

  • Feifei Li
  • Yongli Mou
  • Zeyu Zhang
  • Quan Liu
  • Sabina Jeschke

The identification and recognition of pavement distress are vital for automatic pavement evaluation. Computational efficiency and accuracy are the two determining factors for the evaluation model. Our study uses the state-of-the-art image segmentation method DeepLabV3+ with an attention mechanism to segment pavement distress on the Crack500 and GAPs384 datasets. Results are provided with a comparison of different backbones and architectures. An adaptive probabilistic sampling method is proposed and compared with random cropping and image resizing. On the Crack500 dataset, the adaptive prob-sampling method with the DeepLabV3-attention architecture outperforms the other models. On the GAPs384 dataset, by contrast, the adaptive prob-sampling method with DeepLabV3 without attention works better. The difference stems from the diverse characteristics of the datasets: GAPs384 covers different asphalt surface types and a wide variety of distress classes, whereas Crack500 contains only simple crack information. In addition, several performance improvement methods, such as batch normalization of the image input, a revised dice loss, and hyperparameter search, have been implemented in this work. The results are solid and reliable enough to support the critical analysis of methods and datasets. This study demonstrates the considerable potential of deep learning in intelligent pavement evaluation. To advance the practical application of artificial intelligence to distress segmentation, we will explore more domain-adaptive methods in future studies.
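For readers unfamiliar with the dice loss mentioned in the abstract above, a minimal sketch of the standard soft Dice loss for binary segmentation follows. This is the textbook formulation, not the revised version used in the paper, which the abstract does not describe. The flat-list inputs and the smoothing constant `eps` are illustrative assumptions.

```python
def dice_loss(pred, target, eps=1e-6):
    """Standard soft Dice loss: 1 - 2|P.G| / (|P| + |G|).

    pred   -- per-pixel foreground probabilities in [0, 1], flattened
    target -- per-pixel ground-truth labels (0 or 1), flattened
    eps    -- smoothing term (an assumed value) that keeps the ratio
              defined when both masks are empty
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    print(dice_loss([1, 1, 0, 0], [1, 1, 0, 0]))  # perfect overlap, loss near 0
    print(dice_loss([1, 1, 0, 0], [0, 0, 1, 1]))  # no overlap, loss near 1
```

Because the loss is computed from the overlap ratio rather than per-pixel error, it is less dominated by the background class than cross-entropy, which matters when cracks occupy only a small fraction of each image.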

ICRA Conference 2024 Conference Paper

Incremental 3D Reconstruction through a Hybrid Explicit-and-Implicit Representation

  • Feifei Li
  • Panwen Hu
  • Qi Song
  • Rui Huang 0001

3D reconstruction is an important task in computer vision and is widely used in robotics and autonomous driving. When building large-scale scenes, limitations in computing resources and the difficulty of accessing the entire dataset in a single task are inevitable. Therefore, an incremental reconstruction approach is desired. On the one hand, traditional explicit 3D reconstruction methods such as SLAM and SfM require global optimization, which means that time and space requirements increase dramatically as the training data grow. On the other hand, implicit methods like Neural Radiance Fields (NeRF) suffer from catastrophic forgetting if trained incrementally. In this paper, we incrementally reconstruct 3D models in a hybrid representation, where the density of the radiance field is formulated by a voxel grid, and the view-dependent color information of the points is inferred by a shallow MLP. The expansion of the voxel grid and the distillation of the shallow MLP are efficient in this case. Experimental results demonstrate that our incremental method achieves a level of accuracy on par with approaches employing global optimization techniques.