Arrow Research search

Author name cluster

Zhiwei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

16 papers
2 author rows

Possible papers


YNICL Journal 2026 Journal Article

Association of MRI indexes of glymphatic system with brain atrophy and cognitive impairment in cerebral small vessel disease

  • Lulu Ai
  • Zhiwei Li
  • Chaojuan Huang
  • Xia Zhou
  • Xiaoqun Zhu
  • Qiaoqiao Xu
  • Zhongwu Sun

BACKGROUND AND OBJECTIVE: The glymphatic system constitutes a brain-wide perivascular network responsible for brain metabolic waste removal, which may underlie pathogenesis in cerebral small vessel disease (CSVD). This study aimed to explore the associations between glymphatic function, assessed using multi-modal MRI indices, and both brain atrophy and cognitive impairment in CSVD. METHODS: The study included 160 participants comprising 120 patients with CSVD, including 52 without cognitive impairment (CSVD-NCI) and 68 with mild cognitive impairment (CSVD-MCI), alongside 40 healthy controls (HCs). All participants underwent neuropsychological and multi-modal neuroimaging assessments. Glymphatic function was assessed using four complementary MRI indices: choroid plexus (CP) volume, perivascular space (PVS) volume fraction, free water in white matter (FW-WM) fraction, and the diffusion tensor image analysis along the perivascular space (DTI-ALPS) index. Gray matter volume (GMV) was evaluated via voxel-based morphometry (VBM) analysis. Partial correlation and mediation analyses explored the relationships among glymphatic function, brain structure, and cognitive performance. RESULTS: Compared to HCs, CSVD-MCI patients showed increased CP volume, FW-WM fraction, and BG/putamen-PVS volume, a reduced DTI-ALPS index, and multifocal gray matter atrophy involving temporal and frontal regions. Advanced age was associated with increased CP and BG-PVS volume but a decreased DTI-ALPS index. A main effect of sex was observed: males exhibited larger BG-PVS volume and FW-WM fraction, with a lower DTI-ALPS index, compared to females. Impaired glymphatic function was linked to both GMV loss and cognitive deficits, with right superior temporal and left postcentral GMV mediating glymphatic-cognitive associations, particularly in executive function and processing speed.
CONCLUSION: Glymphatic dysfunction in CSVD, particularly at the cognitive impairment stage, is closely related to brain atrophy and cognitive decline, supporting the potential utility of glymphatic metrics as clinical imaging biomarkers for assessing cognitive impairment risk and monitoring disease progression in CSVD.
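The mediation analysis described above follows the standard product-of-coefficients logic (glymphatic index to regional GMV to cognition), which can be sketched in a few lines of numpy; the synthetic variables and effect sizes below are hypothetical illustrations, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Hypothetical stand-ins: x = a glymphatic index (e.g. DTI-ALPS),
# m = regional gray matter volume (mediator), y = a cognitive score.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(scale=0.5, size=n)            # path a (true 0.5)
y = 0.8 * m + 0.2 * x + rng.normal(scale=0.5, size=n)  # path b (true 0.8)

def ols(X, y):
    """Least-squares slope estimates, with an intercept column added."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]  # drop the intercept

a = ols(x[:, None], m)[0]               # x -> m
b = ols(np.column_stack([m, x]), y)[0]  # m -> y, controlling for x
indirect = a * b                        # mediated (indirect) effect
total = ols(x[:, None], y)[0]           # total effect of x on y
print(indirect, total)  # indirect is close to 0.5 * 0.8 = 0.4
```

With these synthetic effect sizes, the indirect effect accounts for roughly two thirds of the total effect; the study's reported mediation is the same decomposition applied to its imaging and neuropsychological measures.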

AAAI Conference 2026 Conference Paper

Federated Vision-Language-Recommendation with Personalized Fusion

  • Zhiwei Li
  • Guodong Long
  • Jing Jiang
  • Chengqi Zhang
  • Qiang Yang

Applying large pre-trained Vision-Language Models to recommendation is a burgeoning field, a direction we term Vision-Language-Recommendation (VLR). Bringing VLR to user-oriented on-device intelligence within a federated learning framework is a crucial step for enhancing user privacy and delivering personalized experiences. This paper introduces FedVLR, a federated VLR framework specially designed for user-specific personalized fusion of vision-language representations. At its core is a novel bi-level fusion mechanism: the server-side multi-view fusion module first generates a diverse set of pre-fused multimodal views; each client then employs a user-specific mixture-of-experts mechanism to adaptively integrate these views based on the individual user's interaction history. This lightweight personalized fusion module provides an efficient way to implement a federated VLR system. The effectiveness of the proposed FedVLR has been validated on seven benchmark datasets.

EAAI Journal 2026 Journal Article

Lightweight method of foreign matter detection in coal conveying based on improved you only look once version 8 and embedded equipment

  • Guanfeng Du
  • Hongzheng Zhang
  • Yupeng Luo
  • Zhibo Bao
  • Zhiwei Li
  • Mingxin Zhou
  • Zhelin Liu
  • Shengxian Cao

During the process of conveying pulverized coal, mixed foreign matter not only reduces the combustion efficiency of the pulverized coal but can also cause safety accidents in coal conveying equipment. Therefore, it is very important to monitor foreign matter during coal conveying. Because space at the actual conveying site is limited, embedded equipment is needed for inspection. To address the computing and memory constraints of embedded equipment, an improved lightweight YOLOv8 (you only look once version 8) algorithm is proposed. In the backbone of the algorithm, a cross stage partial with 2 convolutions and lightweight PoolFormer (C2f_LPF) module is used to extract lightweight features, and foreign matter information is extracted using the multi-scale design of the cross stage partial with deformable convolution (CSPDC) module. The part of the feature aggregation (PFA) module in the neck is then used for lightweight feature fusion. The proposed C2f_LPF+CSPDC+PFA combination achieves a more balanced optimization of lightweight performance and detection accuracy, offering a solution to the tension between the limited computing resources of embedded equipment and the demand for accurate real-time detection in coal conveying. A self-made dataset containing scrap iron, stones, wooden sticks, and branches is used to train the model, which is compared with faster region-based convolutional neural network (Faster R-CNN) and YOLO (you only look once) series algorithms on a computer and on embedded equipment. The dataset consists of 612 images with 4413 instances of foreign matter, collected on a lab-scale self-made coal conveying platform. The mean average precision (mAP), Giga floating-point operations per second (GFLOPS), parameters, and frames per second (FPS) on embedded equipment are 0.963, 6.1, 2.44 million, and 37.04, respectively. Compared with the original YOLOv8, the computation and parameters are reduced by 24.7% and 18.9%, respectively, and the FPS is improved by 29.6%. It also outperforms the other algorithms, is well matched to the configuration requirements of embedded equipment, and achieves a good balance between precision and speed. This shows promising performance on a lab-scale platform and may extend to real industrial lines after further validation.

AAAI Conference 2026 Conference Paper

The Structure-Equivalent Prior: Unifying Temporal Dynamics and 3D Evolution in 4D Latent Space

  • Jingyuan Gao
  • Tianyu Shen
  • Ruosen Hao
  • Te Guo
  • Zhiwei Li
  • Kunfeng Wang

Recent advances in deep learning-based 3D representation have achieved remarkable success, particularly in modeling static high-fidelity geometries. However, the extension of these techniques to dynamic 3D scenes introduces a critical challenge of effectively representing spatio-temporal dependencies, i.e., jointly modeling detailed spatial structures within frames and temporal dynamics across frames. To address this challenge, this paper proposes that the temporal evolution observed in dynamic 3D scenes is fundamentally attributable to the deformation of underlying spatial structures. To capture this relationship, we introduce a unified continuous 4D latent space representation incorporating a structure-equivalence prior, named SEP-4D. The core of SEP-4D is an efficient 4D tensor decomposition-fusion approach. This method fuses decomposed learnable 2D feature planes via a plane-wise spatio-temporal fusion mechanism of planar distributions, explicitly enforcing the principle that temporal evolution originates from geometric deformations of the 3D structure. To mitigate the associated computational demands, we sample the 3D probability volumes generated by VAE-based fusion into a spatio-temporally consistent 4D latent representation. The efficacy of our approach is validated through experiments on the fundamental task of 4D occupancy reconstruction. Extensive results demonstrate that, by leveraging the inherent equivalence of temporal dynamics and structural deformation, our method achieves high-quality reconstruction across various sequence lengths. Notably, for 4-frame scenes, we attain an impressive 91.68% mIoU, significantly outperforming state-of-the-art baselines on standard benchmarks.

AAAI Conference 2026 Conference Paper

TransFR: Transferable Federated Recommendation with Adapter Tuning on Pre-trained Language Models

  • Honglei Zhang
  • Zhiwei Li
  • Haoxuan Li
  • Xin Zhou
  • Jie Zhang
  • Yidong Li

Federated recommendations (FRs), facilitating multiple local clients to collectively learn a global model without disclosing user private data, have emerged as a prevalent on-device service. In conventional FRs, a dominant paradigm is to utilize discrete identities to represent clients and items, which are then mapped to domain-specific embeddings to participate in model training. Despite their considerable performance, we reveal three inherent limitations that cannot be ignored in federated settings, i.e., non-transferability across domains, ineffectiveness in cold-start settings, and potential privacy violations during federated training. To this end, we propose a transferable federated recommendation model, TransFR, which delicately incorporates the general capabilities empowered by pre-trained models and the personalized abilities obtained by fine-tuning on local private data. Specifically, it first learns domain-agnostic representations of items by exploiting pre-trained models with public textual corpora. To tailor these for FR tasks, we further introduce efficient federated adapter-tuning and post-adaptation personalization, which produce personalized adapters for each client by fitting local private data. We theoretically prove the advantages of incorporating adapter tuning in FRs regarding both effectiveness and privacy. Through extensive experiments, we show that our TransFR surpasses state-of-the-art FRs in transferability.

NeurIPS Conference 2025 Conference Paper

Compress Large Language Models via Collaboration Between Learning and Matrix Approximation

  • Yuesen Liao
  • Zhiwei Li
  • Binrui Wu
  • Zihao Cheng
  • Su Zhao
  • Shuai Chen
  • Weizhong Zhang

Sparse and low-rank matrix composite approximation has emerged as a promising paradigm for compressing large language models (LLMs), offering a more flexible pruning structure than conventional methods based solely on sparse matrices. The significant variation in weight redundancy across layers, along with the differing rank and sparsity structures of weight matrices, makes identifying the globally optimal pruning structure extremely challenging. Existing methods often depend on uniform or manually designed heuristic rules to allocate weight sparsity across layers, subsequently compressing each matrix using matrix approximation techniques. Given the above theoretical difficulty in global compression of LLMs and the limited computational and data resources available compared to the training phase, we argue that a collaboration between learning and matrix approximation is essential for effective compression. In this paper, we propose a novel LLM compression framework based on generalized bilevel optimization that naturally formulates an effective collaborative mechanism. Specifically, the outer loop frames the weight allocation task as a probabilistic optimization problem, enabling the automatic learning of both layer-wise sparsities and matrix-wise retained ranks, while the inner loop solves the corresponding sparsity- and rank-constrained model compression problem via matrix approximation. Our main technical contributions include two key innovations for efficiently solving this bilevel optimization problem. First, we introduce a truncated Gaussian prior-based probabilistic parameterization integrated with a policy gradient estimator, which avoids expensive backpropagation and stabilizes the optimization process. Second, we design an adapted QR-based matrix approximation algorithm that significantly accelerates inner-loop computations. Extensive experiments on Phi-3 and the Llama-2/3 family demonstrate the effectiveness of our method. Notably, it maintains over 95% zero-shot accuracy under 50% sparsity and achieves up to 2× inference speedup.
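The inner loop's sparse-plus-low-rank composite can be illustrated with a minimal numpy sketch: a truncated SVD supplies the low-rank part, and magnitude selection on the residual supplies the sparse part. The rank and sparsity values below are arbitrary, and this generic construction stands in for, rather than reproduces, the paper's QR-based solver and learned allocations:

```python
import numpy as np

def sparse_plus_lowrank(W, rank, sparsity):
    """Approximate W by L + S: L a truncated SVD of W, S keeping the
    largest-magnitude entries of the residual W - L (generic stand-in)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    R = W - L
    k = int(sparsity * R.size)  # number of residual entries to keep
    thresh = np.partition(np.abs(R), -k, axis=None)[-k]
    S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))  # toy weight matrix
L, S = sparse_plus_lowrank(W, rank=8, sparsity=0.1)

err_full = np.linalg.norm(W - L - S) / np.linalg.norm(W)
err_lr = np.linalg.norm(W - L) / np.linalg.norm(W)
print(err_full < err_lr)  # the sparse part strictly improves the fit
```

The paper's contribution is learning the per-matrix `rank` and `sparsity` budgets globally via the outer probabilistic loop rather than fixing them by hand as done here.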

NeurIPS Conference 2025 Conference Paper

Computation and Memory-Efficient Model Compression with Gradient Reweighting

  • Zhiwei Li
  • Yuesen Liao
  • Binrui Wu
  • Yuquan Zhou
  • Xupeng Shi
  • Dongsheng Jiang
  • Yin Li
  • Weizhong Zhang

Pruning is a commonly employed technique for deep neural networks (DNNs) aimed at compressing the model size to reduce computational and memory costs during inference. In contrast to conventional neural networks, large language models (LLMs) pose a unique challenge regarding pruning efficiency due to their substantial computational and memory demands. Existing methods, particularly optimization-based ones, often require considerable computational resources for gradient estimation because they cannot effectively leverage the weight sparsity of the intermediate pruned network to lower computation and memory costs in each iteration. The fundamental challenge lies in the need to frequently instantiate intermediate pruned sub-models to achieve these savings, a task that becomes infeasible even for moderately sized neural networks. To this end, this paper proposes a novel pruning method for DNNs that is both computationally and memory-efficient. Our key idea is to develop an effective reweighting mechanism that enables us to estimate the gradient of the pruned network in the current iteration by reweighting the gradient estimated on an outdated intermediate sub-model instantiated at an earlier stage, thereby significantly reducing the model instantiation frequency. We further develop a series of techniques, e.g., clipping and a preconditioning matrix, to reduce the variance of the gradient estimation and stabilize the optimization process. We conducted extensive experimental validation across various domains. Our approach achieves 50% sparsity and a 1.58× speedup in the forward pass on the Llama2-7B model with only 6 GB of memory usage, outperforming state-of-the-art methods with respect to both perplexity and zero-shot performance. As a by-product, our method is highly suited for distributed sparse training and can achieve a 2× speedup over dense distributed baselines.

NeurIPS Conference 2025 Conference Paper

Efficient Representativeness-Aware Coreset Selection

  • Zihao Cheng
  • Binrui Wu
  • Zhiwei Li
  • Yuesen Liao
  • Su Zhao
  • Shuai Chen
  • Yuan Gao
  • Weizhong Zhang

Dynamic coreset selection is a promising approach for improving the training efficiency of deep neural networks by periodically selecting a small subset of the most representative or informative samples, thereby avoiding the need to train on the entire dataset. However, it remains inherently challenging due not only to the complex interdependencies among samples and the evolving nature of model training, but also to a critical coreset representativeness degradation issue identified and explored in depth in this paper: the representativeness or information content of the coreset degrades over time as training progresses. Therefore, we argue that, in addition to designing accurate selection rules, it is equally important to endow the algorithms with the ability to assess the quality of the current coreset. Such awareness enables timely re-selection, mitigating the risk of overfitting to stale subsets, a limitation often overlooked by existing methods. To this end, this paper proposes an Efficient Representativeness-Aware Coreset Selection method for deep neural networks, a lightweight framework that enables dynamic tracking and maintenance of coreset quality during training. While the ideal criterion, the gradient discrepancy between the coreset and the full dataset, is computationally prohibitive, we introduce a scalable surrogate based on the signal-to-noise ratio (SNR) of gradients within the coreset, which is the main technical contribution of this paper and is also supported by our theoretical analysis. Intuitively, a decline in SNR indicates overfitting to the subset and declining representativeness. Leveraging this observation, our method triggers coreset updates without requiring costly Hessian or full-batch gradient computations, maintaining minimal computational overhead. Experiments on multiple datasets confirm the effectiveness of our approach. Notably, compared with existing gradient-based dynamic coreset selection baselines, our method achieves up to a 5.4% improvement in test accuracy across multiple datasets.
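A minimal sketch of the gradient-SNR intuition: when per-sample gradients within the coreset agree, their mean (signal) dominates their spread (noise); when the model overfits a stale coreset, gradients shrink and conflict, the ratio falls, and re-selection is triggered. The exact SNR definition and the toy gradients below are assumptions, not the paper's formulation:

```python
import numpy as np

def grad_snr(per_sample_grads):
    """Signal-to-noise ratio of per-sample gradients: norm of the mean
    gradient divided by the mean norm of deviations from it."""
    g = np.asarray(per_sample_grads)
    mean = g.mean(axis=0)
    signal = np.linalg.norm(mean)
    noise = np.linalg.norm(g - mean, axis=1).mean()
    return signal / (noise + 1e-12)

rng = np.random.default_rng(1)
# Aligned gradients (representative coreset, early training): high SNR.
aligned = rng.normal(loc=[1.0, 1.0], scale=0.1, size=(64, 2))
# Small, conflicting gradients (overfit to a stale coreset): low SNR.
conflicting = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(64, 2))

print(grad_snr(aligned) > grad_snr(conflicting))  # True
```

A training loop would monitor this quantity on the current coreset and re-run selection whenever it drops below a threshold, rather than re-selecting on a fixed schedule.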

AAAI Conference 2025 Conference Paper

Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

  • Zhiwei Li
  • Guodong Long
  • Tianyi Zhou
  • Jing Jiang
  • Chengqi Zhang

Federated Collaborative Filtering (FedCF) is an emerging field focused on developing new recommendation frameworks that preserve privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms and encode personalized information in a user embedding vector. However, a user embedding is usually insufficient to capture the rich, fine-grained personalization across heterogeneous clients. This paper proposes a novel personalized FedCF method that preserves users' personalized information in a latent variable and a neural model simultaneously. Specifically, we decompose the modeling of user knowledge into two encoders, one designed to capture shared knowledge and the other personalized knowledge. A personalized gating network is then applied to balance personalization and generalization between the global and local encoders. Moreover, to effectively train the proposed framework, we model the CF problem as a specialized Variational AutoEncoder (VAE) task by integrating user interaction vector reconstruction with missing value prediction. The decoder is trained to reconstruct the implicit feedback from items the user has interacted with, while also predicting items the user might be interested in but has not yet interacted with. Experimental results on benchmark datasets demonstrate that the proposed method outperforms baseline methods, showcasing superior performance.

NeurIPS Conference 2025 Conference Paper

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

  • Yuhao Zhou
  • Yiheng Wang
  • Xuming He
  • Ruoyao Xiao
  • Zhiwei Li
  • Qiantai Feng
  • Zijie Guo
  • Yuejin Yang

Scientific discoveries increasingly rely on complex multimodal reasoning based on information-intensive scientific data and domain-specific expertise. Empowered by expert-level scientific benchmarks, scientific Multimodal Large Language Models (MLLMs) hold the potential to significantly enhance this discovery process in realistic workflows. However, current scientific benchmarks mostly focus on evaluating the knowledge understanding capabilities of MLLMs, leading to an inadequate assessment of their perception and reasoning abilities. To address this gap, we present the Scientists’ First Exam (SFE) benchmark, designed to evaluate the scientific cognitive capacities of MLLMs through three interconnected levels: scientific signal perception, scientific attribute understanding, and scientific comparative reasoning. Specifically, SFE comprises 830 expert-verified VQA pairs across three question types, spanning 66 multimodal tasks in five high-value disciplines. Extensive experiments reveal that the current state-of-the-art GPT-o3 and InternVL-3 achieve only 34.08% and 26.52% on SFE, highlighting significant room for MLLMs to improve in scientific realms. We hope the insights obtained from SFE will facilitate further developments in AI-enhanced scientific discoveries.

IROS Conference 2025 Conference Paper

TEM3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving

  • Wenzhuo Liu
  • Yicheng Qiao
  • Zhen Wang
  • Qiannan Guo
  • Zilong Chen
  • Meihua Zhou
  • Xinran Li
  • Letian Wang

Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints that limit comprehensive scene understanding, and inefficient architectures that impede real-time deployment. This paper proposes TEM3-Learning (Time-Efficient Multimodal Multi-task Learning), a novel framework that jointly optimizes driver emotion recognition, driver behavior recognition, traffic context recognition, and vehicle behavior recognition through a two-stage architecture. The first component, the Mamba-based multi-view temporal-spatial feature extraction subnetwork (MTS-Mamba), introduces a forward-backward temporal scanning mechanism and global-local spatial attention to efficiently extract low-cost temporal-spatial features from multi-view sequential images. The second component, the MTL-based gated multimodal feature integrator (MGMI), employs task-specific multi-gating modules to adaptively highlight the most relevant modality features for each task, effectively alleviating the negative transfer problem in MTL. Evaluated on the AIDE dataset, the proposed model achieves state-of-the-art accuracy across all four tasks while maintaining a lightweight architecture with fewer than 6 million parameters and delivering an impressive 142.32 FPS inference speed. Rigorous ablation studies further validate the effectiveness of the proposed framework and the independent contributions of each module. The code is available at https://github.com/Wenzhuo-Liu/TEM3-Learning.

NeurIPS Conference 2024 Conference Paper

Low Precision Local Training is Enough for Federated Learning

  • Zhiwei Li
  • Yiqiu Li
  • Binbin Lin
  • Zhongming Jin
  • Weizhong Zhang

Federated Learning (FL) is a prevalent machine learning paradigm designed to address challenges posed by heterogeneous client data while preserving data privacy. Unlike distributed training, it typically orchestrates resource-constrained edge devices that communicate with a central server over a low-bandwidth network. This urges the development of more computation- and communication-efficient training algorithms. In this paper, we propose an efficient FL paradigm, where the local models on the clients are trained with low-precision operations and communicated to the server in a low-precision format, while only the model aggregation on the server is performed with high-precision computation. We surprisingly find that high-precision models can be recovered from the low-precision local models with proper aggregation on the server. In this way, both the client-side workload and the communication cost can be significantly reduced. We theoretically show that our proposed paradigm converges to the optimal solution as training goes on, which demonstrates that low-precision local training is enough for FL. Our paradigm can be integrated with existing FL algorithms flexibly. Experiments across extensive benchmarks are conducted to showcase the effectiveness of our proposed method. Notably, models trained by our method with precision as low as 8 bits are comparable to those from full-precision training. As a by-product, we show that low-precision local training can relieve the over-fitting issue in local training, which, under heterogeneous client data, can cause the client models to drift further away from each other and lead to failure in model aggregation. Code is released at https://github.com/digbangbang/LPT-FL.
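The recovery effect reported above (high-precision aggregates emerging from low-precision local models) can be sketched with generic uniform 8-bit quantization; the quantizer, client count, and toy weights below are illustrative assumptions, not the paper's exact low-precision format:

```python
import numpy as np

def quantize(w, bits=8):
    """Uniform symmetric quantization (a generic stand-in scheme)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale).astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
clients = [rng.normal(size=1000) for _ in range(20)]  # toy local models

# Clients train and communicate in low precision ...
low = [dequantize(*quantize(w)) for w in clients]
# ... while the server aggregates in high precision.
server_low = np.mean(low, axis=0)
server_full = np.mean(clients, axis=0)  # full-precision reference

err = np.abs(server_low - server_full).max()
single_err = np.abs(low[0] - clients[0]).max()
print(err, single_err)  # averaging shrinks per-client quantization noise
```

Because each client's quantization error is roughly independent, averaging across clients drives the aggregate much closer to the full-precision mean than any single dequantized model is to its original, which is the intuition behind the recovery result.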

IJCAI Conference 2023 Conference Paper

Generalized Discriminative Deep Non-Negative Matrix Factorization Based on Latent Feature and Basis Learning

  • Zijian Yang
  • Zhiwei Li
  • Lu Sun

As a powerful tool for data representation, deep NMF has attracted much attention in recent years. Current deep NMF builds its multi-layer structure by decomposing either the basis matrix or the feature matrix into multiple factors, which can complicate the learning process when data is insufficient or exhibits simple structure. To overcome these limitations, a novel method called Generalized Deep Non-negative Matrix Factorization (GDNMF) is proposed, which generalizes several NMF and deep NMF methods in a unified framework. GDNMF simultaneously performs decomposition on both features and bases, learning a hierarchical data representation based on multi-level bases. To further improve the latent representation and enhance its flexibility, GDNMF mutually reinforces a shallow linear model and a deep non-linear model. Moreover, semi-supervised GDNMF is proposed by treating partial label information as soft constraints in the multi-layer structure. An efficient two-phase optimization algorithm is developed, and experiments on five real-world datasets verify its superior performance compared with state-of-the-art methods.
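GDNMF builds on classical NMF applied to both the basis and feature sides; as a grounding example, here is plain single-layer NMF with Lee-Seung multiplicative updates (the textbook building block, not the paper's two-phase GDNMF optimizer; shapes and rank are arbitrary):

```python
import numpy as np

def nmf(X, r, iters=1000, seed=0, eps=1e-9):
    """Single-layer NMF, X ~= W @ H, via multiplicative updates.
    Non-negativity is preserved because updates only multiply by ratios
    of non-negative quantities."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))
    H = rng.random((r, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update features
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update bases
    return W, H

rng = np.random.default_rng(1)
X = rng.random((30, 4)) @ rng.random((4, 40))  # exactly rank-4, non-negative
W, H = nmf(X, r=4)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(err)  # small relative reconstruction error
```

Deep variants stack further factorizations on W (bases) or H (features); GDNMF's point is to do both at once within one framework.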

ICRA Conference 2022 Conference Paper

IPS300+: a Challenging multi-modal data sets for Intersection Perception System

  • Huanan Wang
  • Xinyu Zhang 0001
  • Zhiwei Li
  • Jun Li 0082
  • Kun Wang
  • Zhu Lei
  • Haibing Ren

Due to high complexity and occlusion, insufficient perception at crowded urban intersections can be a serious safety risk for both human drivers and autonomous algorithms, and CVIS (Cooperative Vehicle Infrastructure System) is a proposed solution for full-participant perception in this scenario. However, research on roadside multi-modal perception is still in its infancy, and there is no open-source dataset for such scenes. Accordingly, this paper fills the gap. Using an IPS (Intersection Perception System) installed at the diagonal of the intersection, this paper proposes a high-quality multi-modal dataset for the intersection perception task. The center of the experimental intersection covers an area of 3,000 m², and the extended distance reaches 300 m, which is typical for CVIS. The first batch of open-source data includes 14198 frames, and each frame has an average of 319.84 labels, 9.6 times more than the most crowded dataset to date (the H3D dataset, 2019). Our dataset is available at: http://www.openmpd.com/column/IPS300.

AAAI Conference 2020 Conference Paper

Adaptive Unimodal Cost Volume Filtering for Deep Stereo Matching

  • Youmin Zhang
  • Yimin Chen
  • Xiao Bai
  • Suihanjin Yu
  • Kun Yu
  • Zhiwei Li
  • Kuiyuan Yang

State-of-the-art deep learning based stereo matching approaches treat disparity estimation as a regression problem, where the loss function is directly defined on true disparities and their estimates. However, disparity is just a byproduct of a matching process modeled by the cost volume, and indirectly learning the cost volume through disparity regression is prone to overfitting since the cost volume is under-constrained. In this paper, we propose to directly constrain the cost volume by filtering it with a unimodal distribution peaked at the true disparity. In addition, the variance of the unimodal distribution for each pixel is estimated to explicitly model matching uncertainty under different contexts. The proposed architecture achieves state-of-the-art performance on Scene Flow and two KITTI stereo benchmarks. In particular, our method ranked 1st on the KITTI 2012 evaluation and 4th on the KITTI 2015 evaluation (recorded on 2019.8.20). The code for AcfNet is available at: https://github.com/youmi-zym/AcfNet.
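The unimodal filtering idea can be sketched directly: construct a target distribution over disparity candidates peaked at the ground-truth disparity, with a per-pixel sigma encoding matching uncertainty, and penalize cost-volume softmax outputs that deviate from it. The Laplacian-shaped target, disparity range, and cross-entropy loss below are illustrative assumptions, not necessarily AcfNet's exact formulation:

```python
import numpy as np

def unimodal_target(d_true, n_disp, sigma=1.0):
    """Distribution over disparity candidates, peaked at the true disparity;
    sigma models per-pixel matching uncertainty (larger = flatter peak)."""
    d = np.arange(n_disp)
    logits = -np.abs(d - d_true) / sigma
    e = np.exp(logits - logits.max())
    return e / e.sum()

def cross_entropy(target, prob, eps=1e-12):
    return -(target * np.log(prob + eps)).sum()

n_disp = 64
t = unimodal_target(d_true=20.0, n_disp=n_disp)

# A predicted distribution peaked at the right disparity is penalized
# less than one peaked at a wrong disparity.
good = unimodal_target(20.0, n_disp, sigma=2.0)
bad = unimodal_target(30.0, n_disp, sigma=2.0)
print(cross_entropy(t, good) < cross_entropy(t, bad))  # True
```

Supervising the cost volume against such targets constrains every disparity candidate per pixel, rather than only the regressed expectation, which is what curbs the overfitting described above.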

IJCAI Conference 2016 Conference Paper

Semi-Supervised Multimodal Deep Learning for RGB-D Object Recognition

  • Yanhua Cheng
  • Xin Zhao
  • Rui Cai
  • Zhiwei Li
  • Kaiqi Huang
  • Yong Rui

This paper studies the problem of RGB-D object recognition. Inspired by the great success of deep convolutional neural networks (DCNNs) in AI, researchers have tried to apply them to improve the performance of RGB-D object recognition. However, a DCNN always requires a large-scale annotated dataset to supervise its training. Manually labeling such a large RGB-D dataset is expensive and time-consuming, which prevents DCNNs from quickly advancing this research area. To address this problem, we propose a semi-supervised multimodal deep learning framework to train DCNNs effectively based on very limited labeled data and massive unlabeled data. The core of our framework is a novel diversity-preserving co-training algorithm, which can successfully guide the DCNN to learn from the unlabeled RGB-D data by making full use of the complementary cues of the RGB and depth data in object representation. Experiments on the benchmark RGB-D dataset demonstrate that, with only 5% labeled training data, our approach achieves competitive performance for object recognition compared with state-of-the-art results reported by fully-supervised methods.