Arrow Research

Author name cluster

Di Lu

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact-name matches and is not a full identity disambiguation profile.

4 papers
2 author rows

Possible papers (4)

JBHI · 2026 · Journal Article

PathFusion-Net: A Rough Path Theory-Based Deep Learning Model for ECG Arrhythmia Classification

  • Tianlong Feng
  • Qingchen Li
  • Yuanyuan Zhang
  • Yongzhi Liao
  • Di Lu
  • Liping Wang
  • Jianqin Zhao
  • Lei Jiang

This study introduces a novel electrocardiogram (ECG) arrhythmia classification model, PathFusion-Net, which integrates Rough Path Theory with deep learning technologies. The model combines Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), Path Signatures, and Path Development to extract spatial morphological features from ECG images and multi-order temporal representations from ECG signals. By adopting an inter-patient split paradigm, our approach more closely reflects real-world clinical diagnostic settings compared to intra-patient methods. The model demonstrates state-of-the-art overall classification performance on both the MIT-BIH Arrhythmia Database and a private clinical dataset, achieving 94.7% and 95.1% accuracy, respectively, under the AAMI four-class standard with an inter-patient split paradigm. On the MIT-BIH dataset, the proposed method attains competitive precision and recall across multiple arrhythmia types, including 95.2%/87.9% for ventricular ectopic beats (V) and 75.7%/92.3% for supraventricular ectopic beats (S), indicating balanced performance across clinically diverse categories. This research highlights the potential of Rough Path Theory in time-series analysis and offers a novel deep learning framework for automated early detection and monitoring of ECG arrhythmias. The code used in this study is available at: https://github.com/Rand2AI/PathFusion-Net.
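To make the path-signature features concrete: the depth-2 signature of a piecewise-linear path can be computed incrementally via Chen's identity. The NumPy sketch below illustrates that generic construction only; it is not the authors' PathFusion-Net code (see the linked repository for that), and the function name, signature depth, and toy ECG signal are all assumptions.

```python
import numpy as np

def path_signature_depth2(path):
    """Depth-2 signature of a piecewise-linear path (illustrative only).

    path: (T, d) array of points, e.g. a time-augmented ECG lead.
    Returns the d level-1 terms (total increments) and the d*d level-2
    iterated integrals, built up segment by segment via Chen's identity.
    """
    _, d = path.shape
    s1 = np.zeros(d)          # level 1: running total displacement
    s2 = np.zeros((d, d))     # level 2: running iterated integrals
    for prev, curr in zip(path[:-1], path[1:]):
        dx = curr - prev
        # Chen's identity for appending one linear segment:
        # S2 <- S2 + S1 (x) dx + (1/2) dx (x) dx
        s2 += np.outer(s1, dx) + 0.5 * np.outer(dx, dx)
        s1 += dx
    return np.concatenate([s1, s2.ravel()])

# Example: signature of one simulated ECG beat, time-augmented to 2-D.
t = np.linspace(0.0, 1.0, 200)
ecg = np.sin(2 * np.pi * 5 * t) * np.exp(-((t - 0.5) ** 2) / 0.02)
features = path_signature_depth2(np.stack([t, ecg], axis=1))  # 2 + 4 values
```

Time-augmentation (stacking the time axis as an extra coordinate) is the standard trick that keeps the signature sensitive to parameterization; higher depths add higher-order iterated integrals in the same incremental fashion.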

AAAI · 2026 · Conference Paper

Sharp Eyes and Memory for VideoLLMs: Information-Aware Visual Token Pruning for Efficient and Reliable VideoLLM Reasoning

  • Jialong Qin
  • Xin Zou
  • Di Lu
  • Yibo Yan
  • Xuming Hu

Current Video Large Language Models (VideoLLMs) suffer from quadratic computational complexity and key-value (KV) cache growth because they process an excess of redundant visual tokens. To address this problem, we propose SharpV, a minimalist and efficient method for adaptive pruning of visual tokens and the KV cache. Unlike most uniform compression approaches, SharpV dynamically adjusts pruning ratios based on spatial-temporal information. Remarkably, this adaptive mechanism occasionally yields performance gains over dense models, offering a novel paradigm for adaptive pruning. During the KV cache pruning stage, motivated by observations of visual information degradation, SharpV prunes degraded visual features in a self-calibrated manner, guided by their similarity to the original visual features. In this way, SharpV achieves hierarchical cache pruning from an information-bottleneck perspective, offering new insight into the information flow of VideoLLMs. Experiments on multiple public benchmarks demonstrate the superiority of SharpV. Moreover, to the best of our knowledge, SharpV is the first two-stage pruning framework that operates without requiring access to exposed attention scores, ensuring full compatibility with hardware acceleration techniques such as FlashAttention.
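As a rough illustration of what "adaptive pruning ratios based on spatial-temporal information" could look like, here is a minimal PyTorch sketch. It is not SharpV: the per-frame information score (token spread around the frame mean), the ratio bounds, and the function name are all stand-in assumptions.

```python
import torch

def adaptive_prune_tokens(tokens, base_keep=0.5, min_keep=0.25, max_keep=0.9):
    """Keep a frame-dependent fraction of visual tokens (illustrative only).

    tokens: (frames, n_tokens, dim). The keep ratio per frame scales with
    how much that frame's tokens vary (a stand-in "information" score);
    tokens farthest from the frame mean are retained, in original order.
    """
    F, N, _ = tokens.shape
    # Per-frame information proxy: mean distance of tokens from frame mean.
    centered = tokens - tokens.mean(dim=1, keepdim=True)
    frame_info = centered.norm(dim=-1).mean(dim=1)            # (F,)
    rel = frame_info / (frame_info.mean() + 1e-6)
    keep_ratio = (base_keep * rel).clamp(min_keep, max_keep)  # (F,)

    pruned = []
    for f in range(F):
        k = max(1, int(keep_ratio[f].item() * N))
        scores = centered[f].norm(dim=-1)                     # (N,)
        idx = scores.topk(k).indices.sort().values            # keep order
        pruned.append(tokens[f, idx])
    return pruned  # list of (k_f, dim) tensors, one per frame

vid_tokens = torch.randn(16, 196, 1024)    # e.g. 16 frames of ViT tokens
kept = adaptive_prune_tokens(vid_tokens)   # variable-length per-frame keeps
```

Note the sketch never touches attention scores, mirroring the abstract's point that pruning from the features themselves stays compatible with fused-attention kernels.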

NeurIPS · 2025 · Conference Paper

Don't Just Chase “Highlighted Tokens” in MLLMs: Revisiting Visual Holistic Context Retention

  • Xin Zou
  • Di Lu
  • Yizhou Wang
  • Yibo Yan
  • Yuanhuiyi Lyu
  • Xu Zheng
  • Linfeng Zhang
  • Xuming Hu

Despite their powerful capabilities, multimodal large language models (MLLMs) suffer from considerable computational overhead due to their reliance on massive visual tokens. Recent studies have explored token pruning to alleviate this problem, which typically uses text-vision cross-attention or [CLS] attention to assess and discard redundant visual tokens. In this work, we identify a critical limitation of such attention-first pruning approaches, i.e., they tend to preserve semantically similar tokens, resulting in pronounced performance drops under high pruning rates. To this end, we propose HoloV, a simple yet effective, plug-and-play visual token pruning framework for efficient inference. Distinct from previous attention-first schemes, HoloV rethinks token retention from a holistic perspective. By adaptively distributing the pruning budget across different spatial crops, HoloV ensures that the retained tokens capture the global visual context rather than isolated salient features. This strategy minimizes representational collapse and maintains task-relevant information even under aggressive pruning. Experimental results demonstrate that our HoloV achieves superior performance across various tasks, MLLM architectures, and pruning ratios compared to SOTA methods. For instance, LLaVA-1.5 equipped with HoloV preserves 95.8% of the original performance after pruning 88.9% of visual tokens, achieving superior efficiency-accuracy trade-offs.
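The core idea of spreading a pruning budget across spatial crops, rather than taking a global attention top-k, can be sketched as follows. This is an illustrative toy, not HoloV: the norm-based token score, the equal per-crop split, and the grid sizes are placeholder assumptions.

```python
import torch

def holistic_prune(tokens, grid=24, crops=4, keep=0.2):
    """Spread a global token budget evenly over spatial crops (toy sketch).

    tokens: (grid*grid, dim) visual tokens in row-major grid order.
    Instead of a global top-k (which can cluster in one salient region),
    each of crops x crops sub-regions gets an equal share of the budget.
    """
    N, _ = tokens.shape
    assert N == grid * grid
    per_crop = max(1, int(keep * N) // (crops * crops))
    side = grid // crops
    kept = []
    for cy in range(crops):
        for cx in range(crops):
            # Flat indices of this crop's tokens within the full grid.
            rows = torch.arange(cy * side, (cy + 1) * side)
            cols = torch.arange(cx * side, (cx + 1) * side)
            idx = (rows[:, None] * grid + cols[None, :]).reshape(-1)
            scores = tokens[idx].norm(dim=-1)   # placeholder saliency score
            top = scores.topk(min(per_crop, idx.numel())).indices
            kept.append(idx[top])
    kept = torch.cat(kept).sort().values        # restore spatial order
    return tokens[kept], kept

tokens = torch.randn(576, 1024)                 # e.g. a 24x24 ViT token grid
pruned, kept_idx = holistic_prune(tokens)       # ~112 tokens spread evenly
```

Because every crop is guaranteed a share, no region of the image is entirely dropped, which is the "holistic context retention" the title refers to.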

ECAI · 2023 · Conference Paper

FATRER: Full-Attention Topic Regularizer for Accurate and Robust Conversational Emotion Recognition

  • Yuzhao Mao
  • Di Lu
  • Yang Zhang
  • Xiaojie Wang 0006

This paper concentrates on the understanding of interlocutors’ emotions evoked in conversational utterances. Previous studies in this literature mainly focus on more accurate emotional predictions, while ignoring model robustness when the local context is corrupted by adversarial attacks. To maintain robustness while ensuring accuracy, we propose an emotion recognizer augmented by a full-attention topic regularizer, which enables an emotion-related global view when modeling the local context in a conversation. A joint topic modeling strategy is introduced to implement regularization from both representation and loss perspectives. To avoid over-regularization, we drop the constraints on prior distributions that exist in traditional topic modeling and perform probabilistic approximations based entirely on attention alignment. Experiments show that our models obtain more favorable results than state-of-the-art models, and gain convincing robustness under three types of adversarial attacks. Code: https://github.com/ludybupt/FATRER.
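The general shape of such a scheme, a topic distribution obtained by attention alignment and trained jointly with the emotion loss, can be approximated in a few lines. The PyTorch sketch below is a generic stand-in under stated assumptions (vector sizes, the MSE reconstruction term, the fixed regularization weight), not FATRER itself; see the linked repository for the real model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicRegularizedClassifier(nn.Module):
    """Toy emotion classifier with an attention-based topic regularizer.

    Utterance vectors attend over K learned topic embeddings; a
    reconstruction term encourages the topic mixture to explain the
    utterance representation, acting as a global regularizer alongside
    the per-utterance emotion loss. All names and sizes are assumptions.
    """

    def __init__(self, dim=256, n_topics=10, n_emotions=7):
        super().__init__()
        self.topics = nn.Parameter(torch.randn(n_topics, dim) * 0.02)
        self.cls = nn.Linear(dim, n_emotions)

    def forward(self, h, labels, reg_weight=0.1):
        # h: (batch, dim) utterance representations from any encoder.
        attn = F.softmax(h @ self.topics.t(), dim=-1)  # (batch, K) mixture
        recon = attn @ self.topics                     # (batch, dim)
        reg_loss = F.mse_loss(recon, h.detach())       # topic regularization
        emo_loss = F.cross_entropy(self.cls(h), labels)
        return emo_loss + reg_weight * reg_loss

model = TopicRegularizedClassifier()
h = torch.randn(8, 256)                  # assumed encoder outputs
labels = torch.randint(0, 7, (8,))
loss = model(h, labels)
loss.backward()
```

Note there is no explicit prior on the topic mixture; as in the abstract, the only pressure comes from the attention alignment and reconstruction, which is what keeps the regularizer from dominating the emotion objective.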