Author name cluster

Yaping Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers (4)

EAAI Journal 2026 Journal Article

Balance divergence for knowledge distillation

  • Yafei Qi
  • Chen Wang
  • Zhaoning Zhang
  • Yaping Liu
  • Yongmin Zhang

Knowledge distillation (KD) represents a fundamental artificial intelligence (AI) technique for model compression and optimization. In computer vision AI applications, most KD methods use Kullback–Leibler (KL) divergence to align teacher–student output probabilities, but often neglect crucial negative aspects of teacher “dark knowledge” by underweighting low-probability signals. This limitation leads to suboptimal logit mimicry and unbalanced knowledge transfer to the student network. In this paper, we investigate the impact of this imbalance and propose a novel method, named Balance Divergence Distillation (BDD). By introducing a compensatory operation using reverse KL divergence, our method improves the modeling of the extremely small values in the teacher’s negative (low-probability) outputs while preserving the learning capacity for the positive ones. Furthermore, we test the impact of different temperature-coefficient adjustments, which can further balance knowledge transfer. The evaluation results demonstrate that our method achieves accuracy improvements of 1–3% for lightweight student networks over standard KD methods on both the Canadian Institute for Advanced Research 100-class (CIFAR-100) and ImageNet datasets. Additionally, when applied to semantic segmentation, our approach improves the student by 4.55% in mean Intersection over Union (mIoU) compared to the baseline on the Cityscapes dataset. These experiments confirm that our method provides a simple yet highly effective solution that can be seamlessly integrated with various KD frameworks across different vision tasks.
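
A minimal sketch of the balance idea, not the paper's actual loss: one way to combine the standard forward KL term with a reverse-KL compensatory term at a shared temperature. The weighting alpha and temperature T below are illustrative assumptions; the abstract does not specify the paper's weighting or temperature schedule.

```python
# Illustrative sketch only (not the paper's code): forward KL plus a
# reverse-KL compensatory term, both computed at temperature T.
import torch
import torch.nn.functional as F

def balanced_kd_loss(student_logits, teacher_logits, T=4.0, alpha=0.5):
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    log_p_t = F.log_softmax(teacher_logits / T, dim=1)
    p_s, p_t = log_p_s.exp(), log_p_t.exp()
    # Forward KL(teacher || student): dominated by the teacher's high-probability classes.
    forward_kl = F.kl_div(log_p_s, p_t, reduction="batchmean")
    # Reverse KL(student || teacher): puts weight back on the teacher's small (negative) values.
    reverse_kl = F.kl_div(log_p_t, p_s, reduction="batchmean")
    return (T * T) * ((1.0 - alpha) * forward_kl + alpha * reverse_kl)

loss = balanced_kd_loss(torch.randn(8, 100), torch.randn(8, 100))
```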

NeurIPS Conference 2025 Conference Paper

EAReranker: Efficient Embedding Adequacy Assessment for Retrieval Augmented Generation

  • Dongyang Zeng
  • Yaping Liu
  • Wei Zhang
  • Shuo Zhang
  • Xinwang Liu
  • Binxing Fang

With the increasing adoption of Retrieval-Augmented Generation (RAG) systems for knowledge-intensive tasks, ensuring the adequacy of retrieved documents has become critically important for generation quality. Traditional reranking approaches face three significant challenges: substantial computational overhead that scales with document length, dependency on plain text that limits application in sensitive scenarios, and insufficient assessment of document value beyond simple relevance metrics. We propose EAReranker, an efficient embedding-based adequacy assessment framework that evaluates document utility for RAG systems without requiring access to the original text content. The framework quantifies document adequacy through a comprehensive scoring methodology considering verifiability, coverage, completeness, and structural aspects, providing interpretable adequacy classifications for downstream applications. EAReranker employs a Decoder-Only Transformer architecture that introduces an embedding-dimension expansion method and a bin-aware weighted loss, designed specifically to predict adequacy directly from embedding vectors. Our comprehensive evaluation across four public benchmarks demonstrates that EAReranker achieves competitive performance with state-of-the-art plaintext rerankers while maintaining constant memory usage (~550 MB) regardless of input length and processing 2–3x faster than traditional approaches. The semantic bin adequacy prediction accuracy of 92.85% LACC@10 and 86.12% LACC@25 demonstrates its capability to effectively filter out inadequate documents that could potentially mislead or adversely impact RAG system performance, thereby ensuring that only high-utility information serves as generation context. These results establish EAReranker as an efficient and practical solution for enhancing RAG system performance through improved context selection while addressing the computational and privacy challenges of existing methods.
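
As a rough illustration of scoring adequacy from embeddings alone, the sketch below runs a (query, document) embedding pair through a small Transformer stack with an expanded hidden dimension and a per-bin classification head. The class name AdequacyScorer, the dimensions, layer count, and bin count are assumptions for illustration; the abstract does not disclose the actual architecture hyperparameters.

```python
# Hypothetical sketch: adequacy-bin prediction from embedding vectors only.
# Encoder layers stand in for the decoder-only blocks described in the abstract.
import torch
import torch.nn as nn

class AdequacyScorer(nn.Module):
    def __init__(self, emb_dim=768, expanded_dim=1536, n_bins=10):
        super().__init__()
        self.expand = nn.Linear(emb_dim, expanded_dim)   # embedding-dimension expansion
        layer = nn.TransformerEncoderLayer(expanded_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(expanded_dim, n_bins)      # one logit per adequacy bin

    def forward(self, query_emb, doc_emb):
        # Treat the two embeddings as a length-2 sequence; memory use stays
        # constant regardless of the original document length.
        seq = torch.stack([self.expand(query_emb), self.expand(doc_emb)], dim=1)
        hidden = self.backbone(seq)
        return self.head(hidden[:, -1])                  # adequacy-bin logits

scorer = AdequacyScorer()
logits = scorer(torch.randn(4, 768), torch.randn(4, 768))  # shape: (4, 10)
```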

EAAI Journal 2025 Journal Article

Modal Mimicking Knowledge Distillation for monocular three-dimensional object detection

  • Menghao Yang
  • Yafei Qi
  • Bing Xiong
  • Zhaoning Zhang
  • Yaping Liu

Monocular three-dimensional (3D) object detection has gained attention for its cost-effectiveness in autonomous driving systems. Nevertheless, the extraction of depth information from two-dimensional (2D) images is an ill-posed problem. To address this challenge, cross-modal knowledge distillation techniques are widely adopted. A prevalent approach involves projecting Light Detection and Ranging (LiDAR) data onto the image plane to train teacher networks that share homogeneous architectures with student networks. However, the alignment of features between LiDAR-based teacher networks and image-based student networks remains challenging. To address the inherent misalignment between modalities, this paper proposes a Modal Mimicking Knowledge Distillation (MMKD) framework using deep convolutional neural networks for autonomous perception tasks. The MMKD framework explicitly reinforces depth features in the image-based student network by introducing a depth-prediction branch on top of homogeneous teacher and student networks. Specifically, we propose a Road Plane Discretization (RPD) strategy that transforms projected LiDAR information to construct depth supervision signals better suited to the image plane. Concurrently, we propose Dual-Kullback–Leibler divergence distillation (DualKL), which integrates a dynamic Kullback–Leibler divergence balancing mechanism with depth uncertainty weighting, to efficaciously extract and transfer knowledge from the teacher network. The experimental results demonstrate that the proposed method achieves significant performance improvements on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) benchmarks. Specifically, our approach achieves a 4.4% improvement on the easy level and a 2.1% improvement on the difficult level compared to the baseline model. Our code will be released at https://github.com/yangmenghao9/MonoMMKD.
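
To give a concrete sense of the depth supervision signal, the sketch below converts a projected LiDAR depth map into per-pixel bin labels that a depth-prediction branch could be trained against. The uniform binning, depth range, and function name are placeholders; the abstract does not spell out the actual Road Plane Discretization scheme.

```python
# Rough sketch: discretize projected LiDAR depths into per-pixel bin labels.
# Uniform bins are a stand-in for the paper's road-plane-based binning.
import numpy as np

def depth_to_bins(depth_map, d_min=1.0, d_max=60.0, n_bins=80):
    """depth_map: HxW array of projected LiDAR depths (0 where there is no return)."""
    valid = depth_map > 0
    edges = np.linspace(d_min, d_max, n_bins + 1)
    labels = np.full(depth_map.shape, -1, dtype=np.int64)   # -1 = ignore index
    labels[valid] = np.clip(np.digitize(depth_map[valid], edges) - 1, 0, n_bins - 1)
    return labels  # per-pixel bin indices usable as cross-entropy targets

labels = depth_to_bins(np.random.uniform(0.0, 70.0, size=(375, 1242)))
```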

AAAI Conference 2025 Conference Paper

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models

  • Linqin Wang
  • Yaping Liu
  • Zhengtao Yu
  • Shengxiang Gao
  • Cunli Mao
  • Yuxin Huang
  • Wenjun Wang
  • Ling Dong

With the rapid advancement of large language models (LLMs), discrete speech representations have become crucial for integrating speech into LLMs. Existing methods for speech representation discretization rely on a predefined codebook size and Euclidean distance-based quantization. However, (1) the size of the codebook is a critical parameter that affects both codec performance and downstream-task training efficiency, and (2) Euclidean distance-based quantization may lead to audio distortion when the size of the codebook is controlled within a reasonable range. In fact, in the field of information compression, structural information and entropy guidance are crucial, but previous methods have largely overlooked these factors. Therefore, to address the above issues from an information-theoretic perspective, we present SECodec, a novel speech representation codec based on structural entropy (SE) for building speech language models. Specifically, we first model speech as a graph, clustering the speech feature nodes within the graph and extracting the corresponding codebook by hierarchically and disentangledly minimizing 2D SE. Then, to address the issue of audio distortion, we propose a new quantization method. This method still adheres to the 2D SE minimization principle, adaptively selecting, for each incoming original speech node, the most suitable token corresponding to a cluster. Furthermore, we develop a Structural Entropy-based Speech Language Model (SESLM) that leverages SECodec. Experimental results demonstrate that SECodec performs comparably to EnCodec in speech reconstruction, and SESLM surpasses VALL-E in zero-shot text-to-speech tasks.
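
For context, the two-dimensional structural entropy that SECodec minimizes is commonly defined in structural information theory as follows (the notation here follows the usual convention and may differ from the paper's):

$$
\mathcal{H}^{2}(G) = -\sum_{j=1}^{L} \frac{V_j}{2m} \sum_{i \in X_j} \frac{d_i}{V_j} \log_2 \frac{d_i}{V_j} \;-\; \sum_{j=1}^{L} \frac{g_j}{2m} \log_2 \frac{V_j}{2m}
$$

where the graph G has m edges, {X_1, ..., X_L} partitions its nodes into clusters, d_i is the degree of node i, V_j is the total degree (volume) of cluster X_j, and g_j is the number of edges with exactly one endpoint in X_j. On this reading, SECodec's codebook extraction seeks a node partition that minimizes this quantity.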