Author name cluster

Peng Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

115 papers

2 author rows

EAAI Journal 2026 Journal Article

A highly deterministic defect detection method for high-resolution weld based on deep learning and entropy quantization theory

Liangliang Li
Peng Wang
Zhigang Lü
RuoHai Di
Mengyu Sun
Xueren Wang
Bin Wang

In the field of welding quality control, precise defect detection is crucial for ensuring structural safety and extending service life. To address the challenges of welding defect detection, this paper introduces a high-certainty defect detection model that integrates a time-series model of weld feature information with entropy quantization theory. The model first employs a weld localization model based on SCLT (Stack-CNN-LSTM-Transfer), which combines a convolutional neural network (CNN) with a long short-term memory network (LSTM) and uses transfer learning to process sequential data, thereby achieving high-precision localization of the weld area. Additionally, to overcome the limitations of existing datasets, a data augmentation method that dynamically adjusts the quality of recombined images is designed, enhancing the diversity of the datasets in terms of annotation pixel coverage and segmented-area size distribution. Furthermore, this paper proposes a strategy of hybrid feature enhancement and multi-pool fusion coding. By designing multi-path feature fusion and multi-pool fusion modules, along with a cross-layer adaptive feature fusion decoding module, it achieves deep feature fusion and processing. Moreover, to address the insufficient determinism of the output, a high-certainty dynamic kernel defect detection module is designed based on entropy quantization theory to enhance the certainty of defect detection outputs. Experimental results indicate that the model's localization accuracy at the upper and lower boundaries of the weld is 5.5213 and 6.1313, respectively, demonstrating superior localization capability. On the WFR (Welding feature reorganization) dataset, the model's DICE, Precision, Recall, and Jaccard reached 0.9090, 0.8646, 0.9623, and 0.8364, respectively, the best detection accuracy among the compared existing methods.
Concurrently, significant performance improvements have been observed on the DAR (Dynamically adjustable recombination) dataset constructed in this paper. This work can effectively advance the development of welding defect detection technology and provide robust technical support for industrial automation quality control.
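For reference, the four overlap scores quoted for the WFR dataset have standard definitions on binary segmentation masks, in terms of true positives (TP), false positives (FP) and false negatives (FN). The Python sketch below computes them for one pair of flattened masks. It is not the authors' evaluation code: the function name `overlap_metrics`, the flat 0/1 list input, and the convention that a zero denominator scores 1.0 are assumptions, and the abstract does not say whether scores are averaged per image or pooled over the dataset.

```python
def overlap_metrics(pred, target):
    """Dice, Precision, Recall and Jaccard for two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")
    tp = sum(1 for p, t in zip(pred, target) if p and t)        # predicted and present
    fp = sum(1 for p, t in zip(pred, target) if p and not t)    # predicted but absent
    fn = sum(1 for p, t in zip(pred, target) if t and not p)    # present but missed

    def ratio(num, den):
        # Assumed convention: a zero denominator returns 1.0 (other tools use 0.0 or NaN).
        return num / den if den else 1.0

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }


if __name__ == "__main__":
    pred = [1, 1, 0, 0, 1]
    target = [1, 0, 0, 1, 1]
    print(overlap_metrics(pred, target))
```

On a single mask pair, Dice and Jaccard are tied by J = D / (2 - D); scores averaged over many images need not satisfy this identity exactly.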

EAAI Journal 2026 Journal Article

A novel multi-modal attentional collaborative learning framework with semantic enhancement for audio–visual question answering

Jie Yang
Miao Ma
Peng Wang
Yutong Li
Zhao Pei
Chao Yao
Longjiang Guo

The Audio–Visual Question Answering (AVQA) task aims to extract audio and visual cues from videos to answer questions. Popular two-stage methods, such as the Progressive Spatio-Temporal Perception Network (PSTP-Net), first locate key segments in the audio–visual scene based on the question and then identify the most relevant audio–visual regions. While this reduces cue redundancy, it overlooks the complementary role of rich cues, which is crucial for a comprehensive understanding of audio–visual content. In this paper, we propose a novel framework that starts from the question itself, guides the entire multi-modal collaborative learning process, and performs audio–visual question answering. The method comprises a semantic enhancement strategy that applies Multi-modal Large Language Models (MLLMs) as an engineering solution, and a multi-modal attentional collaborative learning process, which is the core algorithmic innovation. Extensive experiments on the Music Audio–Visual Question Answering dataset (MUSIC-AVQA) and its second version (MUSIC-AVQA v2) demonstrate the effectiveness of our method. Compared to PSTP-Net, our method reduces the number of training parameters by 61.23% and Floating-point Operations (FLOPs) by 60.83%, while achieving a 2.61 percentage-point improvement in accuracy. This indicates that our method effectively captures and aligns rich audio–visual cues and significantly improves reasoning efficiency. Our code will be publicly available soon.

JBHI Journal 2026 Journal Article

APSevLM: Acute Pancreatitis Severity Language Model

Leqi Zheng
Jiajun Fang
Hongyi Chen
Naiqing Li
Yunyuan Huang
Qiulin Ge
Yang Gu
Tao Yu

Approximately one-fifth of patients with acute pancreatitis (AP) develop severe forms, which are associated with high mortality rates, making early prediction of severity crucial for effective patient management. In this study, we present APSevLM (Acute Pancreatitis Severity Language Model), a large language model (LLM)-based approach that integrates admission-time clinical data, imaging reports, and expert knowledge to predict AP severity at an early stage. In a comprehensive evaluation using data from over five hundred patients, APSevLM outperforms traditional scoring systems (BISAP and MCTSI), conventional machine learning algorithms, and state-of-the-art deep learning models, achieving an AUC of 0.857. Attention visualizations reveal how the model dynamically weighs different information modalities according to case severity. Furthermore, a systematic feature importance analysis identifies key predictive factors, particularly hematological parameters and cardiac markers, offering valuable insights for clinical practice. Our study positions APSevLM as an accurate predictive model and highlights potential biomarkers for the early diagnosis of severe AP.

AAAI Conference 2026 Conference Paper

Balanced Knowledge Distillation for Large Language Models with Mix-of-Experts

Jiajun Liu
Yao He
Wenjun Ke
Peng Wang
Ziyu Shang
Guozheng Li
Zijie Xu

Mixture-of-Experts (MoE) architectures have recently become a more prevalent choice than dense architectures for large language models (LLMs) owing to their superior performance. However, their billions of parameters make MoE LLMs costly to deploy and to run at inference time. To reduce this cost, knowledge distillation (KD) has become a widely adopted technique for compressing LLMs. Existing KD methods for LLMs fall into two settings: dense-to-dense and moe-to-dense distillation. Dense-to-dense distillation transfers knowledge between single dense LLMs, while moe-to-dense distillation transfers knowledge from MoE LLMs to dense LLMs. However, this architectural mismatch prevents the student from fully absorbing the teacher's knowledge when distilling MoE LLMs. To address this limitation, we investigate a new distillation setting, moe-to-moe, which aims to fully leverage the teacher's expert knowledge and enable the student to absorb it more effectively. Compared to dense-to-dense and moe-to-dense distillation, moe-to-moe suffers from two imbalance issues. First, expert-coverage deficiency reflects imbalanced knowledge transfer from teacher experts: traditional distillation uses only the few experts activated by the teacher router. Second, routing imbalance appears when the student's routing distribution drifts from the teacher's, which makes it difficult for the student to learn how to distribute inputs among experts. To overcome these issues, we propose Balanced Distillation (B-Distill), a novel moe-to-moe distillation framework that spreads teacher expertise evenly across student experts while regularizing the student router toward teacher-consistent balance. First, to mitigate expert-coverage deficiency, we introduce Monte Carlo exploration, which stochastically perturbs router probabilities so that every teacher and student expert is sampled without enlarging the search space.
Second, to correct routing imbalance and avert load collapse, we propose an entropy-aware router distillation mechanism that aligns the student router with the teacher's while curbing over-concentration. Experiments show that B-Distill outperforms baselines by up to 6.6% in ROUGE-L.
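The two ingredients named in this abstract, stochastic perturbation of router probabilities and an entropy-aware router-matching loss, can be illustrated generically. The Python sketch below is not B-Distill. It swaps in Gumbel-top-k sampling as one standard way to perturb a router so that every expert with non-zero probability can be selected, and it uses a hypothetical loss, KL(teacher || student) - beta * H(student), in which the entropy bonus discourages the student router from collapsing onto a few experts. The function names, the weight `beta`, and the loss form are all assumptions made for illustration.

```python
import math
import random


def gumbel_topk(probs, k, rng):
    """Select k distinct experts by adding Gumbel noise to log-probabilities.

    This samples experts sequentially without replacement from the categorical
    distribution `probs`, so low-probability experts are still picked sometimes.
    """
    scored = []
    for idx, p in enumerate(probs):
        u = max(rng.random(), 1e-12)               # avoid log(0)
        gumbel = -math.log(-math.log(u))
        logp = math.log(p) if p > 0 else -math.inf  # zero-probability experts never win
        scored.append((logp + gumbel, idx))
    scored.sort(reverse=True)
    return [idx for _, idx in scored[:k]]


def entropy(probs):
    """Shannon entropy (in nats) of a routing distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def router_distill_loss(student, teacher, beta=0.1):
    """Hypothetical loss: KL(teacher || student) - beta * H(student).

    The KL term pulls the student router toward the teacher's routing; the
    entropy bonus penalises an over-concentrated student. `student` must be
    strictly positive wherever `teacher` is (true for softmax outputs).
    """
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)
    return kl - beta * entropy(student)


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = [0.7, 0.2, 0.1]
    picks = [gumbel_topk(teacher, 1, rng)[0] for _ in range(2000)]
    print(sorted(set(picks)))                       # every expert appears at least once
    print(router_distill_loss([1 / 3] * 3, teacher))
```

Gumbel-top-k is used here only because it perturbs the router without enumerating extra expert combinations; the paper's Monte Carlo exploration may work differently.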