Author name cluster

Huimin Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers

2 author rows

EAAI Journal 2026 Journal Article

A prediction method for micro-motor rotor unbalance based on the InceptionV3-Convolutional block attention module model

Haoyan Zhang
Yang An
Jiarong Fan
Huimin Wang
Dongxia Zheng
Shuai Wang
Guoqiang Wang

Rotor unbalance is a significant cause of motor failure in machinery, making accurate estimation of unbalance values essential. Due to their lightweight and compact design, micro-motors pose greater challenges in precise unbalance measurement. To address this issue, a rotor unbalance prediction model for micro-motors is proposed, which integrates data fusion and an attention mechanism. First, two sets of vibration signals are converted into images using the Gramian Angular Field (GAF) method and fused to construct the unbalance dataset. Data augmentation is then employed to enhance the model's generalization ability. Subsequently, an unbalance prediction model is developed based on the InceptionV3 network with a Convolutional Block Attention Module (CBAM), where the attention mechanism enhances feature extraction and transfer learning improves training efficiency and prediction accuracy. Finally, the improved model is combined with a probabilistic mapping approach to estimate the unbalance mass. Experimental results show that the proposed method achieves a prediction accuracy of 94% for both unbalance magnitude and phase. This approach not only improves the accuracy of rotor unbalance estimation in micro-motors but also provides a reference for unbalance detection in other types of machinery.
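The signal-to-image step mentioned in the abstract can be illustrated with a minimal sketch of a Gramian Angular Field. The abstract does not say which GAF variant is used or how the two images are fused, so the summation variant and the channel-stacking fusion below are assumptions, and the function names `gramian_angular_field` and `fuse_channels` are hypothetical.

```python
import math


def gramian_angular_field(signal):
    """Return the Gramian Angular Summation Field of a 1-D signal.

    Assumption: the summation variant, cos(phi_i + phi_j), is used.
    """
    lo, hi = min(signal), max(signal)
    if hi == lo:
        raise ValueError("signal must not be constant")
    # Rescale to [-1, 1] so that arccos is defined for every sample.
    scaled = [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]
    # Clamp to guard against floating-point round-off outside [-1, 1].
    scaled = [max(-1.0, min(1.0, s)) for s in scaled]
    # Polar encoding: each sample becomes an angle.
    phi = [math.acos(s) for s in scaled]
    # Entry (i, j) encodes the angular sum of samples i and j.
    return [[math.cos(a + b) for b in phi] for a in phi]


def fuse_channels(image_a, image_b):
    """Stack two equally sized GAF images as channels (assumed fusion)."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same size")
    return [image_a, image_b]


if __name__ == "__main__":
    sensor_1 = [0.0, 1.0, 2.0, 1.0]
    sensor_2 = [2.0, 1.5, 0.5, 0.0]
    fused = fuse_channels(gramian_angular_field(sensor_1),
                          gramian_angular_field(sensor_2))
    print(len(fused), len(fused[0]), len(fused[0][0]))
```

The resulting two-channel array is the kind of input an image classifier could consume; the network, augmentation, and probabilistic mapping stages of the paper are not reproduced here.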


EAAI Journal 2026 Journal Article

Source-free cross-machine fault diagnosis method in a two-stage pseudo-supervised framework

Binbei He
Huimin Wang
Sha Fan

Domain adaptation plays a crucial role in cross-domain machinery fault diagnosis. Because the source-domain data may not be directly accessible owing to privacy preservation and resource consumption, and may come from a different machine than the target-domain data, this paper proposes a source-free cross-machine fault diagnosis method. First, a novel adaptive pseudo-label refinement module is designed, composed of a semi-supervised clustered K-Nearest Neighbors (SSCKNN) algorithm and an adaptive threshold, to obtain high-quality, class-balanced pseudo-labels despite the high domain discrepancy of the cross-machine setting. Furthermore, a two-stage training strategy based on pseudo-supervised contrastive learning is developed to facilitate knowledge transfer to the target model and improve training efficiency. Compared with existing methods, the proposed approach performs better in cross-machine fault diagnosis owing to a novel centroid definition and an adaptive threshold mechanism. In experiments on two bearing datasets, the proposed method is compared with several state-of-the-art methods and improves accuracy by almost 40% in the cross-machine scenario and by 1% to 10% in the cross-domain scenario, demonstrating its effectiveness.


ICRA Conference 2025 Conference Paper

Generalizing Motion Planners with Mixture of Experts for Autonomous Driving

Qiao Sun 0001
Huimin Wang
Jiahao Zhan
Fan Nie
Xin Wen
Leimeng Xu
Kun Zhan
Peng Jia 0007

Large real-world driving datasets have sparked significant research into various aspects of learning-based motion planners for autonomous driving. These include data augmentation, model architecture, reward design, training strategies, and planner pipelines. In this paper, we review and benchmark previous methods. Experiments show that many of these approaches have limited generalization abilities in planning performance due to overly complex designs or training paradigms. Experiments further reveal that as models are appropriately scaled, many designs become redundant. Therefore, we introduce StateTransformer-2 (STR2), a scalable, decoder-only motion planner. STR2 uses a Vision Transformer (ViT) encoder and a mix-of-experts (MoE) causal transformer architecture. The MoE backbone addresses modality collapse and reward balancing by expert routing during training. Extensive experiments on the NuPlan dataset show that our method generalizes better than previous approaches across different test sets and closed-loop simulations. We evaluate its scalability on billions of real-world urban driving scenarios, demonstrating consistent accuracy improvements as both data and model size grow.
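Expert routing, the mechanism the abstract attributes to the MoE backbone, can be sketched generically as top-k gating. This is not the STR2 implementation: the abstract gives no routing details, so the softmax gate, the `top_k` parameter, and the toy experts below are assumptions, and in a real model the gate logits would come from a learned router rather than being passed in.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_logits, experts, top_k=2):
    """Route input x to the top_k experts and mix their outputs.

    x           -- input vector (list of floats)
    gate_logits -- one router score per expert (assumed precomputed)
    experts     -- list of callables mapping a vector to a vector
    """
    probs = softmax(gate_logits)
    # Pick the indices of the top_k highest gate probabilities.
    ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
    chosen = ranked[:top_k]
    # Renormalize so the selected gate weights sum to one.
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        expert_out = experts[i](x)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    toy_experts = [
        lambda v: [2.0 * a for a in v],
        lambda v: [-a for a in v],
        lambda v: [a + 1.0 for a in v],
    ]
    print(moe_forward([1.0, 2.0], [0.0, -5.0, 0.0], toy_experts))
```

Only the selected experts are evaluated, which is what lets parameter count grow without a proportional increase in per-token compute.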


NeurIPS Conference 2025 Conference Paper

Spik-NeRF: Spiking Neural Networks for Neural Radiance Fields

Gang Wan
Qinlong Lan
Zihan Li
Huimin Wang
Wu Yitian
Wang Zhen
Wanhua Li
Yufei Guo

Spiking Neural Networks (SNNs), as a biologically inspired neural network architecture, have garnered significant attention due to their exceptional energy efficiency and increasing potential for various applications. In this work, we extend the use of SNNs to neural rendering tasks and introduce Spik-NeRF (Spiking Neural Radiance Fields). We observe that the binary spike activation map of traditional SNNs lacks sufficient information capacity, leading to information loss and a subsequent decline in the performance of spiking neural rendering models. To address this limitation, we propose the use of ternary spike neurons, which enhance the information-carrying capacity in the spiking neural rendering model. With ternary spike neurons, Spik-NeRF achieves performance that is on par with, or nearly identical to, traditional ANN-based rendering models. Additionally, we present a re-parameterization technique for inference that allows Spik-NeRF with ternary spike neurons to retain the event-driven, multiplication-free advantages typical of binary spike neurons. To further boost the performance of Spik-NeRF, we employ a distillation method, using an ANN-based NeRF to guide the training of our Spik-NeRF model, which is more compatible with ternary neurons than with standard binary neurons. We evaluate Spik-NeRF on both realistic and synthetic scenes, and the experimental results demonstrate that Spik-NeRF achieves rendering performance comparable to ANN-based NeRF models.
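The idea of a ternary spike neuron can be illustrated with a small sketch. The abstract does not give the neuron dynamics, so the leaky integrate-and-fire update, the symmetric threshold, the hard reset, and the names `ternary_lif` and `spike_weighted_sum` are all assumptions; the paper's re-parameterization technique is not reproduced here.

```python
def ternary_lif(inputs, threshold=1.0, decay=0.5):
    """Emit a spike in {-1, 0, +1} per time step (assumed LIF dynamics)."""
    membrane = 0.0
    spikes = []
    for current in inputs:
        # Leaky integration of the input current.
        membrane = decay * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after a positive spike
        elif membrane <= -threshold:
            spikes.append(-1)
            membrane = 0.0  # hard reset after a negative spike
        else:
            spikes.append(0)
    return spikes


def spike_weighted_sum(spikes, weights):
    """Accumulate weights using only additions and subtractions."""
    total = 0.0
    for spike, weight in zip(spikes, weights):
        if spike == 1:
            total += weight
        elif spike == -1:
            total -= weight
        # A zero spike triggers no computation (event-driven).
    return total


if __name__ == "__main__":
    train = ternary_lif([0.6, 0.8, -1.5, 0.2, -0.4])
    print(train, spike_weighted_sum(train, [0.5, 0.25, 2.0, 1.0, 1.0]))
```

Compared with a binary spike, the extra value -1 lets one time step carry a sign as well as an event, while the weighted sum still needs no multiplications.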


NeurIPS Conference 2024 Conference Paper

MedJourney: Benchmark and Evaluation of Large Language Models over Patient Clinical Journey

Xian Wu
Yutian Zhao
Yunyan Zhang
Jiageng Wu
Zhihong Zhu
Yingying Zhang
Yi Ouyang
Ziheng Zhang

Large language models (LLMs) have demonstrated remarkable capabilities in language understanding and generation, leading to their widespread adoption across various fields. Among these, the medical field is particularly well-suited for LLM applications, as many medical tasks can be enhanced by LLMs. Despite the existence of benchmarks for evaluating LLMs in medical question-answering and exams, there remains a notable gap in assessing LLMs' performance in supporting patients throughout their entire hospital visit journey in real-world clinical practice. In this paper, we address this gap by dividing a typical patient's clinical journey into four stages: planning, access, delivery and ongoing care. For each stage, we introduce multiple tasks and corresponding datasets, resulting in a comprehensive benchmark comprising 12 datasets, of which five are newly introduced, and seven are constructed from existing datasets. This proposed benchmark facilitates a thorough evaluation of LLMs' effectiveness across the entire patient journey, providing insights into their practical application in clinical settings. Additionally, we evaluate three categories of LLMs against this benchmark: 1) proprietary LLM services such as GPT-4; 2) public LLMs like QWen; and 3) specialized medical LLMs, like HuatuoGPT2. Through this extensive evaluation, we aim to provide a better understanding of LLMs' performance in the medical domain, ultimately contributing to their more effective deployment in healthcare settings.


EAAI Journal 2023 Journal Article

Diverse features discovery transformer for pedestrian attribute recognition

Aihua Zheng
Huimin Wang
Jiaxiang Wang
Huaibo Huang
Ran He
Amir Hussain

Recently, Swin Transformer has been widely explored as a general backbone for computer vision, where it improves performance on vision tasks through its ability to model long-range dependencies between spatial locations. When applying Swin Transformer to pedestrian attribute recognition, we observe that it tends to focus on a relatively small number of local regions within which attributes may be correlated with other attributes, which leads it to predict attributes in the neglected regions based on such correlations. In fact, discriminative information that is crucial for attribute identification may exist within these neglected regions. To address this problem, we propose a novel diverse features discovery transformer (DFDT), which can find more attribute relationship regions for robust pedestrian attribute recognition. First, Swin Transformer is used as a feature extraction network to acquire attribute features with long-distance associations and predict the corresponding attribute information. Second, we propose a diverse features suppression module (DFSM) that obtains semantic features directly associated with attributes by suppressing the peak locations of the most discriminative features, together with randomly selected feature regions, so as to spread the feature regions that Swin Transformer attends to. Third, we plug the DFSM into different stages of Swin Transformer to learn detailed texture features that help recognition. In addition, we divide the attribute features into multiple vertical feature regions to improve the focus on local attribute features. Experiments on three benchmark datasets validate the effectiveness of the proposed algorithm.


IJCAI Conference 2020 Conference Paper

Structured Probabilistic End-to-End Learning from Crowds

Zhijun Chen
Huimin Wang
Hailong Sun
Pengpeng Chen
Tao Han
Xudong Liu
Jie Yang

End-to-end learning from crowds has recently been introduced as an EM-free approach to training deep neural networks directly from noisy crowdsourced annotations. It models the relationship between true labels and annotations with a specific type of neural layer, termed the crowd layer, which can be trained using pure backpropagation. Parameters of the crowd layer, however, can hardly be interpreted as annotator reliability, in contrast to the more principled probabilistic approach. The lack of a probabilistic interpretation further prevents extensions of the approach to account for important factors of annotation processes, e.g., instance difficulty. This paper presents SpeeLFC, a structured probabilistic model that incorporates the constraints of probability axioms for parameters of the crowd layer, which allows annotator reliability to be modeled explicitly while benefiting from the end-to-end training of neural networks. Moreover, we propose SpeeLFC-D, which further takes into account instance difficulty. Extensive validation on real-world datasets shows that our methods improve on the state of the art.
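One way to make crowd-layer parameters satisfy the probability axioms is to turn each row of an annotator's parameter matrix into a distribution. The sketch below uses a row-wise softmax for that purpose, which is an assumption rather than the SpeeLFC formulation; the names `annotator_confusion` and `annotation_distribution` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def annotator_confusion(raw_params):
    """Map unconstrained parameters to a row-stochastic confusion matrix.

    Row k is the distribution over the labels this annotator gives
    when the true class is k (assumed parameterization).
    """
    return [softmax(row) for row in raw_params]


def annotation_distribution(class_probs, confusion):
    """Marginalize over the true class to predict the annotator's label."""
    n_labels = len(confusion[0])
    return [
        sum(class_probs[k] * confusion[k][j] for k in range(len(class_probs)))
        for j in range(n_labels)
    ]


if __name__ == "__main__":
    # Hypothetical annotator who is usually right on both classes.
    confusion = annotator_confusion([[2.0, 0.0], [0.0, 2.0]])
    # Classifier output for one instance (probabilities of the true class).
    print(annotation_distribution([0.9, 0.1], confusion))
```

Because every row sums to one, the diagonal entries can be read directly as the probability that the annotator labels each class correctly.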
