Author name cluster

Ziwei Li

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

AAAI Conference 2026 Conference Paper

Factorization-in-Loop:Proximal Fill-in Minimization for Sparse Matrix Reordering

Ziwei Li
Shuzi Niu
Tao Yuan
Huiyuan Li
Wenjia Wu

Fill-ins are new nonzero elements in the summation of the upper and lower triangular factors generated during LU factorization. For large sparse matrices, they will increase the memory usage and computational time, and be reduced through proper row or column arrangement, namely matrix reordering. Finding a row or column permutation with the minimal fill-ins is NP-hard, and surrogate objectives are designed to derive fill-in reduction permutations or learn a reordering function. However, there is no theoretical guarantee between the golden criterion and these surrogate objectives. Here we propose to learn a reordering network by minimizing l1 norm of triangular factors of the reordered matrix to approximate the exact number of fill-ins. The reordering network utilizes a graph encoder to predict row or column node scores. For inference, it is easy and fast to derive the permutation from sorting algorithms for matrices. For gradient based optimization, there is a large gap between the predicted node scores and resultant triangular factors in the optimization objective. To bridge the gap, we first design two reparameterization techniques to obtain the permutation matrix from node scores. The matrix is reordered by multiplying the permutation matrix. Then we introduce the factorization process into the objective function to arrive at target triangular factors. The overall objective function is optimized with the alternating direction method of multipliers and proximal gradient descent. Experimental results on benchmark sparse matrix collection SuiteSparse show the fill-in and LU factorization time reduction of our proposed method is 0.2% and 17.8% compared with state-of-the-art baselines.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Combinatorial Ski Rental Problem: Robust and Learning-Augmented Algorithms

Ziwei Li
Bo Sun
Zhiqiu Zhang
Mohammad Hajiesmaili
Binghan Wu
Lin Yang
Yang Gao

We introduce and study the Combinatorial Ski Rental (CSR) problem, which involves multiple items that can be rented or purchased, either individually or in combination. At each time step, a decision-maker must make an irrevocable buy-or-rent decision for items that have not yet been purchased, without knowing the end of the time horizon. We propose a randomized online algorithm, Sorted Optimal Amortized Cost (SOAC), that achieves the optimal competitive ratio. Moreover, SOAC can be extended to address various well-known ski rental variants, including the multi-slope, multi-shop, multi-commodity ski rental and CSR with upgrading problems. Building on the proposed SOAC algorithm, we further develop a learning-augmented algorithm that leverages machine-learned predictions to improve the performance of CSR. This algorithm is capable of recovering or improving upon existing results of learning-augmented algorithms in both the classic ski rental and multi-shop ski rental problems. Experimental results validate our theoretical analysis and demonstrate the advantages of our algorithms over baseline methods for ski rental problems.

PDF Details

NeurIPS Conference 2024 Conference Paper

FedNE: Surrogate-Assisted Federated Neighbor Embedding for Dimensionality Reduction

Ziwei Li
Xiaoqi Wang
Hong-You Chen
Han-Wei Shen
Wei-Lun Chao

Federated learning (FL) has rapidly evolved as a promising paradigm that enables collaborative model training across distributed participants without exchanging their local data. Despite its broad applications in fields such as computer vision, graph learning, and natural language processing, the development of a data projection model that can be effectively used to visualize data in the context of FL is crucial yet remains heavily under-explored. Neighbor embedding (NE) is an essential technique for visualizing complex high-dimensional data, but collaboratively learning a joint NE model is difficult. The key challenge lies in the objective function, as effective visualization algorithms like NE require computing loss functions among pairs of data. In this paper, we introduce \textsc{FedNE}, a novel approach that integrates the \textsc{FedAvg} framework with the contrastive NE technique, without any requirements of shareable data. To address the lack of inter-client repulsion which is crucial for the alignment in the global embedding space, we develop a surrogate loss function that each client learns and shares with each other. Additionally, we propose a data-mixing strategy to augment the local data, aiming to relax the problems of invisible neighbors and false neighbors constructed by the local $k$NN graphs. We conduct comprehensive experiments on both synthetic and real-world datasets. The results demonstrate that our \textsc{FedNE} can effectively preserve the neighborhood data structures and enhance the alignment in the global embedding space compared to several baseline methods.

PDF Details DOI

ICLR Conference 2023 Conference Paper

On the Importance and Applicability of Pre-Training for Federated Learning

Hong-You Chen
Cheng-Hao Tu 0001
Ziwei Li
Han-Wei Shen
Wei-Lun Chao

Pre-training is prevalent in nowadays deep learning to improve the learned model's performance. However, in the literature on federated learning (FL), neural networks are mostly initialized with random weights. These attract our interest in conducting a systematic study to explore pre-training for FL. Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL, but also close its accuracy gap to the counterpart centralized learning, especially in the challenging cases of non-IID clients' data. To make our findings applicable to situations where pre-trained models are not directly available, we explore pre-training with synthetic data or even with clients' data in a decentralized manner, and found that they can already improve FL notably. Interestingly, many of the techniques we explore are complementary to each other to further boost the performance, and we view this as a critical result toward scaling up deep FL for real-world applications. We conclude our paper with an attempt to understand the effect of pre-training on FL. We found that pre-training enables the learned global models under different clients' data conditions to converge to the same loss basin, and makes global aggregation in FL more stable. Nevertheless, pre-training seems to not alleviate local model drifting, a fundamental problem in FL under non-IID data.

Details

IJCAI Conference 2020 Conference Paper

Partial Multi-Label Learning via Multi-Subspace Representation

Ziwei Li
Gengyu Lyu
Songhe Feng

Partial Multi-Label Learning (PML) aims to learn from the training data where each instance is associated with a set of candidate labels, among which only a part of them are relevant. Existing PML methods mainly focus on label disambiguation, while they lack the consideration of noise in the feature space. To tackle the problem, we propose a novel framework named partial multi-label learning via MUlti-SubspacE Representation (MUSER), where the redundant labels together with noisy features are jointly taken into consideration during the training process. Specifically, we first decompose the original label space into a latent label subspace and a label correlation matrix to reduce the negative effects of redundant labels, then we utilize the correlations among features to project the original noisy feature space to a feature subspace to resist the noisy feature information. Afterwards, we introduce a graph Laplacian regularization to constrain the label subspace to keep intrinsic structure among features and impose an orthogonality constraint on the correlations among features to guarantee discriminability of the feature subspace. Extensive experiments conducted on various datasets demonstrate the superiority of our proposed method.

PDF Details DOI