Author name cluster

Yi Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

70 papers

2 author rows

EAAI Journal 2026 Journal Article

DawnNet: Domain-augmented multi-weighting network for endometrial histopathological image classification

Fengjun Zhao
Lin Wu
Yi Li
Xuelei He
Hongyan Du
Yanrong Chen
Xiaowei He
Yuqing Hou

Histopathological examination is the gold standard for diagnosing endometrial tissues, including normal endometrium, endometrial polyps, endometrial hyperplasia, and endometrial adenocarcinoma. However, subtle variations in gland-to-stroma ratios and nuclear morphology make the diagnosis subjective and dependent on pathologist expertise. Computer-aided diagnosis systems using deep learning-based approaches can improve diagnostic efficiency by automatically extracting representative features. However, their performance often degrades when encountering data variations from different institutes, a domain shift issue that violates the independent and identically distributed assumption between training and testing data. This out-of-distribution challenge is not fully addressed by existing domain generalization methods, which often overlook key morphological features essential for histopathological interpretation. To address this issue, we propose DawnNet, a domain-augmented multi-weighting network for robust endometrial histopathological image classification. DawnNet incorporates a domain augmentation module to improve generalization, a spatial–channel weighting attention module to enhance discriminative features while suppressing domain-specific ones, a sample weighting module to reduce spurious correlations, and a hybrid objective function to learn domain-invariant and diagnosis-relevant features. Experiments on publicly available datasets demonstrate that DawnNet outperforms state-of-the-art methods, showing promising generalization for both in-distribution and out-of-distribution cases. Code is available at https://github.com/aliy-ali/DawnNet.

AAAI Conference 2026 Conference Paper

Doubly Debiased Test-Time Prompt Tuning for Vision-Language Models

Fei Song
Yi Li
Rui Wang
Jiahuan Zhou
Changwen Zheng
Jiangmeng Li

Test-time prompt tuning for vision-language models has demonstrated impressive generalization capabilities under zero-shot settings. However, tuning the learnable prompts solely based on unlabeled test data may induce prompt optimization bias, ultimately leading to suboptimal performance on downstream tasks. In this work, we analyze the underlying causes of prompt optimization bias from both the model and data perspectives. In terms of the model, the entropy minimization objective typically focuses on reducing the entropy of model predictions while overlooking their correctness. This can result in overconfident yet incorrect outputs, thereby compromising the quality of prompt optimization. On the data side, prompts affected by optimization bias can introduce misalignment between visual and textual modalities, which further aggravates the prompt optimization bias. To this end, we propose a Doubly Debiased Test-Time Prompt Tuning method, abbreviated as D2TPT. Specifically, we first introduce a dynamic retrieval-augmented modulation module that retrieves high-confidence knowledge from a dynamic knowledge base using the test image feature as a query, and uses the retrieved knowledge to modulate the predictions. Guided by the refined predictions, we further develop a reliability-aware prompt optimization module that incorporates a confidence-based weighted ensemble and cross-modal consistency distillation to impose regularization constraints during prompt tuning. Extensive experiments across 15 benchmark datasets involving both natural distribution shifts and cross-dataset generalization demonstrate that D2TPT outperforms baselines, validating its effectiveness in mitigating prompt optimization bias.
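
The abstract's point that entropy minimization rewards confidence rather than correctness can be illustrated with a small sketch. This is not the D2TPT method; the logits, the class index, and the helper names `softmax`, `entropy`, and `argmax` are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical 3-class example; the true class is assumed to be 0.
true_class = 0
hesitant_logits = [1.0, 0.8, 0.6]       # unsure, but the top class is correct
overconfident_logits = [0.0, 9.0, 0.0]  # sharply peaked on the wrong class

h_hesitant = entropy(softmax(hesitant_logits))
h_overconfident = entropy(softmax(overconfident_logits))

# An entropy-only objective prefers the second prediction even though it is wrong.
print(h_hesitant, h_overconfident)
```

The entropy term contains no label information, so a lower value says nothing about whether the peak sits on the right class.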

AAAI Conference 2026 Conference Paper

Good Gradients Poison Your Model: Evading Defenses in Federated Learning via Boundary-adaptive Perturbation

Xiaojie Zhao
Jinqiao Shi
Yi Li
Junmin Huang
Chongru Fan

Federated learning (FL) allows for collaborative model training while preserving data privacy, but its distributed nature makes it vulnerable to poisoning attacks. Existing defense methods typically rely on using gradients from multiple clients to define a trusted region, selecting only the trustworthy updates (good gradients) within this region for aggregation. Mainstream defense boundaries are categorized as hard boundaries, soft boundaries, and semi-soft boundaries. However, we argue that even good gradients within these boundaries can still be exploited by attackers to poison the model. To tackle this challenge, we introduce a boundary-adaptive attack method that leverages the directional properties of optimization techniques to derive baseline poisoned gradients. Through iterative perturbation, it generates seemingly innocent gradients that subtly deviate from the global model. Our extensive study on benchmark datasets and mainstream defensive mechanisms confirms that the proposed attack poses a significant threat to the integrity and security of FL practices, despite the proliferation of robust FL methods.

TAAS Journal 2026 Journal Article

Graph Unlearning System with Subgraph De-Isolation Measures

Yi Li
Debo Cheng
Guixian Zhang
Chengyu Li
Shichao Zhang

Graph unlearning systems offer a promising solution for securely erasing specific data points and their associated influences from Graph Neural Networks (GNNs). However, existing approaches often treat the problem as multiple isolated and disjoint sub-problems by partitioning graph data into isolated subgraphs, which overlooks the native graph structure information between subgraphs. This results in biased representations that hinder the accurate modeling of key connections and relationships within the data, leading to a notable reduction in model utility due to this loss of information. To address these issues, we propose an innovative framework called Non-Isolated Graph Eraser (NIGEraser) that decomposes the unlearning task into multiple non-isolated, intersecting sub-problems. Specifically, a novel non-isolated graph partitioning strategy is proposed for NIGEraser that mitigates isolation by replicating key nodes across multiple neighboring subgraphs, along with an attention-based sub-model aggregation technique in which global graph structure information is employed. By this design, a broader natural neighborhood is explored, capturing and effectively utilizing the critical graph structure features lost between subgraphs during partitioning, thereby reducing information loss during task decomposition and aggregation. Additionally, it is demonstrated that graph unlearning methods can overcome the limitations of traditional isolated partitioning strategies, providing an effective theoretical constraint on time consumption. Extensive experiments on four real-world graph-structured datasets show that NIGEraser consistently outperforms existing unlearning methods, offering superior model utility while ensuring efficient and deterministic data removal.

AAAI Conference 2026 Conference Paper

PLaST: Towards Paralinguistic-aware Speech Translation

Yi Li
Rui Zhao
Ruiquan Zhang
Jinsong Su
Daimeng Wei
Min Zhang
Yidong Chen

Speech translation (ST) aims to translate speech from a source language into text in the target language. Naturally, speech signals contain paralinguistic cues beyond linguistic content, which could influence or even alter the interpretation of a lexically identical sentence, thereby yielding distinct translations. However, existing ST models lack direct and sufficient modeling of paralinguistic information, which limits their ability to perceive paralinguistic cues and understand speech comprehensively, leading to degraded translation performance. In response, we propose Paralinguistic-aware Speech Translation (PLaST), a novel dual-branch framework which directly leverages paralinguistic cues beyond the linguistic content. Specifically, PLaST employs a speech encoder and a style extractor to independently generate linguistic and paralinguistic representations, respectively. To obtain a purified linguistic representation aligned with the text representation, a hierarchical Optimal Transport (OT) is applied on the layer-wise outputs from an LLM decoder. Then, the paralinguistic information is retrieved and refined with an Attention-based Retrieval (AR) module, with the linguistic representation serving as queries to enable joint guidance for semantic understanding and translation generation. PLaST outperforms the strong baseline with an average of 5.0 directional and 4.5 global contrastive likelihood scores on the paralinguistic-sensitive benchmark ContraProST, demonstrating its superior capability in paralinguistic perception. Further experiments on the standard speech translation benchmark CoVoST-2 show that PLaST generalizes well to typical ST scenarios.

AAAI Conference 2026 Conference Paper

Trimming the Fat: Redundancy-Aware Acceleration Framework for DGNNs

Renhong Huang
Yuxuan Cao
Yi Li
Junwei Hu
Zihua Xiong
Shuai Fang
Sheng Guo
Bo Zheng

Temporal graphs are essential for modeling complex real-world systems, such as social interactions, financial transactions, and recommendation systems, but the high computational cost and model complexity of dynamic graph neural networks (DGNNs) pose significant challenges for practical deployment. Although various pruning and sampling techniques have proven effective in accelerating static GNNs, they fall short in dynamic settings due to temporal dependencies in evolving graph structures. To address these challenges, we propose TrimDG, a general framework that accelerates DGNNs by eliminating both static and runtime redundancies. For static redundancy, we introduce a novel node influence metric, Temporal Personalized PageRank (TPP), to prune less informative nodes, and employ temporal binning to remove redundant events. For runtime redundancy during training, we develop an adaptive sampling strategy guided by the graph information bottleneck and further reduce sampling frequency through a temporal batch selector and a sampling cache. Theoretical analysis supports our design, and experiments on real-world datasets show that TrimDG reduces runtime by an average of 83.49% across diverse DGNN backbones, while maintaining strong predictive performance, demonstrating both its efficiency and generalizability.

EAAI Journal 2025 Journal Article

A flow rate estimation method for gas–liquid two-phase flow based on filter-enhanced convolutional neural network

Yuxiao Jiang
Yinyan Liu
Lihui Peng
Yi Li

Accurate estimation of flow rate in gas–liquid two-phase flow is crucial for various industrial processes, and it remains a challenging problem. Previous deep learning-based methods focused on a few human-set points with single-task learning, and the data were not denoised. In this study, a flow rate estimation method based on a filter-enhanced convolutional neural network (FECNN) is proposed for gas–liquid two-phase flow. The method takes multimodal data from a Venturi tube and an electrical capacitance tomography (ECT) sensor as input and uses a multilayer perceptron (MLP) to fuse the data. Subsequently, a learnable filter module is employed to attenuate noise adaptively, followed by a multiscale convolutional neural network (MSCNN) that extracts flow rate features at different scales. Finally, the method estimates each single-phase flow rate simultaneously through multi-task learning (MTL). Multiple comparative experiments demonstrate the adaptive noise attenuation capabilities of the learnable filter module and the ability of the proposed MSCNN to capture multiscale flow rate features. Additionally, a qualitative comparison with recent flow rate estimation methods is provided. Overall, this study demonstrates the effectiveness and superiority of the proposed FECNN in flow rate estimation.

TAAS Journal 2025 Journal Article

Adaptive Scheduling of High-Availability Drone Swarms for Congestion Alleviation in Connected Automated Vehicles

Shengye Pang
Yi Li
Zhen Qin
Xinkui Zhao
Jintao Chen
Fan Wang
Jianwei Yin

The Intelligent Transportation System (ITS) serves as a pivotal element within urban networks, offering decision support to users and connected automated vehicles through comprehensive information gathering, sensing, device control, and data processing. Presently, ITS predominantly relies on sensors embedded in fixed infrastructure, notably Roadside Units (RSUs). However, RSUs are confined by coverage limitations and may encounter challenges in prompt emergency responses. On-demand resources, such as drones, present a viable option to supplement these deficiencies effectively. This article introduces an approach where Software-Defined Networking and Mobile Edge Computing technologies are integrated to formulate a high-availability drone swarm control and communication infrastructure framework comprising the cloud layer, edge layer, and device layer. Drones confront limitations in flight duration attributed to battery limitations, posing a challenge in sustaining continuous monitoring of road conditions over extended periods. Effective drone scheduling stands as a promising solution to overcome these constraints. To tackle this issue, we initially utilized Graph WaveNet, a specialized graph neural network structure tailored for spatial-temporal graph modeling, for training a congestion prediction model using real-world dataset inputs. Building upon this, we further propose an algorithm for drone scheduling based on congestion prediction. Our simulation experiments using real-world data demonstrate that, compared to the baseline method, the proposed scheduling algorithm not only yielded superior scheduling gains but also mitigated drone idle rates.

TAAS Journal 2025 Journal Article

Chameleon Hash based Collaborative Time-Series Data Integrity Monitoring

Yi Li
Jian Shen
Mohammad S. Obaidat
Pandi Vijayakumar
Sendhilkumar Selvaradjou
Kuei-Fang Hsiao

The importance of the ocean to humanity is undeniable, whether in terms of ecology, climate, or resources. Utilizing collected ocean data combined with AI to achieve adaptive and automated processing and prediction is a current research focus. The effectiveness of AI applications largely depends on the integrity of ocean data. Ocean data has three characteristics: vast spatial coverage, long temporal duration, and large volume. Traditional cloud-based data integrity verification methods are no longer suitable. Ocean data should be processed on edge servers located closer to the data collection points and then sent to the appropriate data storage servers. The data processing methods should be lightweight to accommodate the sequential characteristics of data. Moreover, the data integrity monitoring process should be collaboratively completed on the data storage servers without the need for a central third party. To this end, we propose an ocean data integrity monitoring protocol. It generates data for different storage servers, using sensor sampling periods and data masks, and utilizes chameleon hash with ephemeral trapdoors to generate validators, thus supporting mutual integrity monitoring among storage servers. Experiments demonstrate that, compared to the latest solutions, our scheme not only meets security requirements but also offers advantages in computational overhead.
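
For readers unfamiliar with chameleon hashes, the following is a minimal sketch of the classic discrete-log construction, in which the trapdoor holder can find a second (message, randomness) pair with the same hash value. It does not model the ephemeral-trapdoor variant or the protocol described in the abstract. The group parameters are toy values chosen only so the arithmetic can be checked by hand, and the function names are hypothetical. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
# Toy group parameters (assumption: far too small for any real use).
P = 23  # prime modulus
Q = 11  # prime order of the subgroup generated by G
G = 4   # generator of the order-Q subgroup of the integers modulo P

def keygen(trapdoor):
    """Public key h = G^trapdoor mod P for a secret trapdoor in [1, Q-1]."""
    return pow(G, trapdoor, P)

def chameleon_hash(public_key, message, randomness):
    """CH(m, r) = G^m * h^r mod P, with exponents reduced modulo Q."""
    return (pow(G, message % Q, P) * pow(public_key, randomness % Q, P)) % P

def find_collision(trapdoor, message, randomness, new_message):
    """Use the trapdoor to find r' such that CH(m', r') == CH(m, r).

    Solves m + x*r = m' + x*r' (mod Q) for r'.
    """
    inverse = pow(trapdoor, -1, Q)  # modular inverse, Python 3.8+
    return (randomness + (message - new_message) * inverse) % Q

trapdoor = 7
public_key = keygen(trapdoor)

message, randomness = 3, 5
new_message = 9
new_randomness = find_collision(trapdoor, message, randomness, new_message)

original_digest = chameleon_hash(public_key, message, randomness)
collided_digest = chameleon_hash(public_key, new_message, new_randomness)
print(original_digest == collided_digest)
```

Without the trapdoor, finding such a collision is as hard as computing discrete logarithms in the group, which is what makes the hash usable as a validator that only an authorized party can update.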

AAAI Conference 2025 Conference Paper

Community-Centric Graph Unlearning

Yi Li
Shichao Zhang
Guixian Zhang
Debo Cheng

Graph unlearning technology has become increasingly important since the advent of the 'right to be forgotten' and the growing concerns about the privacy and security of artificial intelligence. Graph unlearning aims to quickly eliminate the effects of specific data on graph neural networks (GNNs). However, most existing deterministic graph unlearning frameworks follow a balanced partition-submodel training-aggregation paradigm, resulting in a lack of structural information between subgraph neighborhoods and redundant unlearning parameter calculations. To address this issue, we propose a novel Graph Structure Mapping Unlearning paradigm (GSMU) and a novel method based on it named Community-centric Graph Eraser (CGE). CGE maps community subgraphs to nodes, thereby enabling the reconstruction of a node-level unlearning operation within a reduced mapped graph. CGE exponentially reduces both the amount of training data and the number of unlearning parameters. Extensive experiments conducted on five real-world datasets and three widely used GNN backbones have verified the high performance and efficiency of our CGE method, highlighting its potential in the field of graph unlearning.

AAAI Conference 2025 Conference Paper

Complex-Cycle-Consistent Diffusion Model for Monaural Speech Enhancement

Yi Li
Yang Sun
Plamen P Angelov

In this paper, we present a novel diffusion model-based monaural speech enhancement method. Our approach incorporates the separate estimation of speech spectra's magnitude and phase in two diffusion networks. Throughout the diffusion process, noise clips from real-world noise interferences are added gradually to the clean speech spectra and a noise-aware reverse process is proposed to learn how to generate both clean speech spectra and noise spectra. Furthermore, to fully leverage the intrinsic relationship between magnitude and phase, we introduce a complex-cycle-consistent (CCC) mechanism that uses the estimated magnitude to map the phase, and vice versa. We implement this algorithm within a phase-aware speech enhancement diffusion model (SEDM). We conduct extensive experiments on public datasets to demonstrate the effectiveness of our method, highlighting the significant benefits of exploiting the intrinsic relationship between phase and magnitude information to enhance speech. The comparison to conventional diffusion models demonstrates the superiority of SEDM.

IJCAI Conference 2025 Conference Paper

Efficient Hi-Fi Style Transfer via Statistical Attention and Modulation

Zhirui Fang
Yi Li
Xin Xie
Chengyan Li
Yanqing Guo

Style transfer is a challenging task in computer vision, aiming to blend the stylistic features of one image with the content of another while preserving the content details. Traditional methods often face challenges in terms of computational efficiency and fine-grained content preservation. In this paper, we propose a novel feature modulation mechanism based on parameterized normalization, where the modulation parameters for content and style features are learned using a dual convolution network (BiConv). These parameters adjust the mean and standard deviation of the features, improving both the stability and quality of the style transfer process. To achieve fast inference, we introduce an efficient acceleration technique by leveraging a row and column weighted attention matrix. In addition, we incorporate a contrastive learning scheme to align the local features of the content and the stylized images, improving the fidelity of the generated output. Experimental results demonstrate that our method significantly improves the inference speed and the quality of style transfer while preserving content details, outperforming existing approaches based on both convolution and diffusion.
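
The idea of modulation parameters that adjust the mean and standard deviation of features can be sketched on a single feature channel. This is a generic statistics-matching normalization, not the paper's BiConv network: here the target mean and standard deviation are passed in by hand rather than learned, and `modulate`, `target_mean`, `target_std`, and `eps` are hypothetical names.

```python
import statistics

def modulate(channel, target_mean, target_std, eps=1e-5):
    """Normalize one feature channel, then rescale it to the target statistics.

    channel: list of activations for one channel (illustrative data).
    target_mean, target_std: modulation parameters; in a learned system these
    would be predicted by a network, here they are supplied directly.
    """
    mean = statistics.fmean(channel)
    std = statistics.pstdev(channel)  # population standard deviation
    return [target_std * (x - mean) / (std + eps) + target_mean for x in channel]

content_channel = [0.2, 0.4, 0.6, 0.8]
stylized_channel = modulate(content_channel, target_mean=2.0, target_std=0.5)

print(statistics.fmean(stylized_channel), statistics.pstdev(stylized_channel))
```

The relative ordering of activations is preserved, which is one intuition for why this kind of modulation can change style statistics while keeping content structure.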

ICML Conference 2025 Conference Paper

Efficiently Serving Large Multimodal Models Using EPD Disaggregation

Gursimran Singh
Xinglu Wang
Yifan Hu
Timothy Tin Long Yu
Linzi Xing
Wei Jiang
Zhefeng Wang
Xiaolong Bai

Large Multimodal Models (LMMs) extend Large Language Models (LLMs) by handling diverse inputs such as images, audio, and video, but at the cost of adding a multimodal encoding stage that increases both computational and memory overhead. This step negatively affects key Service Level Objectives (SLOs), such as time to first token (TTFT) and time per output token (TPOT). We introduce Encode-Prefill-Decode (EPD) Disaggregation, a novel framework that separates the encoding, prefill, and decode stages onto dedicated resources. Unlike current systems, which bundle encoding and prefill together, our approach decouples these steps, unlocking new opportunities and optimizations. These include a mechanism to cache multimedia tokens for efficient transfer, a novel way to parallelize the encoding load within a request, a module for optimal resource allocation for disaggregated serving, and a novel role-switching method to handle changing workload characteristics. Experimental evaluations with popular LMMs show substantial gains in memory efficiency (up to 15$\times$ lower peak memory utilization), batch sizes (up to 22$\times$ larger), 10$\times$ more images per request, and 2.2$\times$ larger KV caches. Furthermore, it leads to significant improvements in SLO attainment (up to 90–100% improvement) and TTFT (up to 71% reduction), compared to systems that do not disaggregate. The code is available at https://github.com/vbdi/epdserve.
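
The two latency metrics named in the abstract can be computed from per-token timestamps. The sketch below uses the common definitions (TTFT as the delay from request arrival to the first output token, TPOT as the average gap between subsequent output tokens). The timestamps are invented, and the function names are hypothetical rather than taken from the EPD codebase.

```python
def time_to_first_token(request_time, token_times):
    """Seconds from request arrival until the first output token is emitted."""
    if not token_times:
        raise ValueError("at least one output token is required")
    return token_times[0] - request_time

def time_per_output_token(token_times):
    """Average seconds between consecutive output tokens after the first."""
    if len(token_times) < 2:
        raise ValueError("at least two output tokens are required")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)

# Hypothetical trace: request arrives at t=0.0 s, four tokens are emitted.
request_time = 0.0
token_times = [0.5, 0.75, 1.0, 1.25]

ttft = time_to_first_token(request_time, token_times)
tpot = time_per_output_token(token_times)
print(ttft, tpot)
```

In this framing, encoding and prefill work contributes to TTFT, while decode-stage throughput determines TPOT, which is why the stages can be tuned separately once they are disaggregated.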