Arrow Research search

Author name cluster

Su Zhao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers
2 author rows

Possible papers (3)

NeurIPS 2025 · Conference Paper

Compress Large Language Models via Collaboration Between Learning and Matrix Approximation

  • Yuesen Liao
  • Zhiwei Li
  • Binrui Wu
  • Zihao Cheng
  • Su Zhao
  • Shuai Chen
  • Weizhong Zhang

Sparse and low-rank matrix composite approximation has emerged as a promising paradigm for compressing large language models (LLMs), offering a more flexible pruning structure than conventional methods based solely on sparse matrices. The significant variation in weight redundancy across layers, along with the differing rank and sparsity structures of weight matrices, makes identifying the globally optimal pruning structure extremely challenging. Existing methods often depend on uniform or manually designed heuristic rules to allocate weight sparsity across layers, subsequently compressing each matrix using matrix approximation techniques. Given this theoretical difficulty in global compression of LLMs and the limited computational and data resources available compared to the training phase, we argue that a collaboration between learning and matrix approximation is essential for effective compression. In this paper, we propose a novel LLM compression framework based on generalized bilevel optimization that naturally formulates an effective collaborative mechanism. Specifically, the outer loop frames the weight allocation task as a probabilistic optimization problem, enabling the automatic learning of both layer-wise sparsities and matrix-wise retained ranks, while the inner loop solves the corresponding sparsity- and rank-constrained model compression problem via matrix approximation. Our main technical contributions include two key innovations for efficiently solving this bilevel optimization problem. First, we introduce a truncated Gaussian prior-based probabilistic parameterization integrated with a policy gradient estimator, which avoids expensive backpropagation and stabilizes the optimization process. Second, we design an adapted QR-based matrix approximation algorithm that significantly accelerates inner-loop computations. Extensive experiments on Phi-3 and the Llama-2/3 family demonstrate the effectiveness of our method. Notably, it maintains over 95% zero-shot accuracy under 50% sparsity and achieves up to 2× inference speedup.
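The composite approximation the abstract describes, decomposing a weight matrix into a low-rank part plus a sparse part under rank and sparsity budgets, can be sketched with a simple alternating SVD/thresholding loop. This is a minimal illustration of the general technique only: the paper's adapted QR-based solver and learned layer-wise budgets are not reproduced here, and the `rank` and `keep_frac` values are hypothetical.

```python
import numpy as np

def sparse_plus_lowrank(W, rank, keep_frac, iters=5):
    """Approximate W ≈ L + S with rank(L) <= rank and S keeping a
    fraction keep_frac of entries. Illustrative alternating scheme
    (truncated SVD + magnitude thresholding), not the paper's solver."""
    S = np.zeros_like(W)
    for _ in range(iters):
        # Low-rank step: truncated SVD of the residual W - S.
        U, sv, Vt = np.linalg.svd(W - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        # Sparse step: keep the largest-magnitude residual entries.
        R = W - L
        k = int(keep_frac * R.size)
        thresh = np.partition(np.abs(R).ravel(), -k)[-k]
        S = np.where(np.abs(R) >= thresh, R, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
L, S = sparse_plus_lowrank(W, rank=8, keep_frac=0.1)
err = np.linalg.norm(W - L - S) / np.linalg.norm(W)
```

The inner loop of the paper's bilevel formulation would solve a problem of this shape for each weight matrix, with the outer loop learning which `rank` and `keep_frac` each layer should receive.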

NeurIPS 2025 · Conference Paper

Efficient Representativeness-Aware Coreset Selection

  • Zihao Cheng
  • Binrui Wu
  • Zhiwei Li
  • Yuesen Liao
  • Su Zhao
  • Shuai Chen
  • Yuan Gao
  • Weizhong Zhang

Dynamic coreset selection is a promising approach for improving the training efficiency of deep neural networks by periodically selecting a small subset of the most representative or informative samples, thereby avoiding the need to train on the entire dataset. However, it remains inherently challenging, not only because of the complex interdependencies among samples and the evolving nature of model training, but also because of a critical coreset representativeness degradation issue identified and explored in depth in this paper: the representativeness or information content of the coreset degrades as training progresses. Therefore, we argue that, in addition to designing accurate selection rules, it is equally important to endow the algorithms with the ability to assess the quality of the current coreset. Such awareness enables timely re-selection, mitigating the risk of overfitting to stale subsets, a limitation often overlooked by existing methods. To this end, this paper proposes an Efficient Representativeness-Aware Coreset Selection method for deep neural networks, a lightweight framework that enables dynamic tracking and maintenance of coreset quality during training. While the ideal criterion, the gradient discrepancy between the coreset and the full dataset, is computationally prohibitive, we introduce a scalable surrogate based on the signal-to-noise ratio (SNR) of gradients within the coreset, which is the main technical contribution of this paper and is also supported by our theoretical analysis. Intuitively, a decline in SNR indicates overfitting to the subset and declining representativeness. Leveraging this observation, our method triggers coreset updates without requiring costly Hessian or full-batch gradient computations, maintaining minimal computational overhead. Experiments on multiple datasets confirm the effectiveness of our approach. Notably, compared with existing gradient-based dynamic coreset selection baselines, our method achieves up to a 5.4% improvement in test accuracy across multiple datasets.
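The SNR surrogate described above can be illustrated as a ratio between the norm of the mean per-sample gradient (the coherent "signal") and the average deviation of individual gradients from that mean (the "noise"). This is a minimal sketch of the general idea, not the paper's exact statistic or theoretical criterion; the `drop_ratio` re-selection threshold is a hypothetical choice for illustration.

```python
import numpy as np

def gradient_snr(per_sample_grads):
    """SNR of per-sample gradients (one flattened gradient per row).
    Signal: norm of the mean gradient; noise: mean deviation from it.
    An illustrative surrogate, not the paper's exact formulation."""
    g_bar = per_sample_grads.mean(axis=0)
    signal = np.linalg.norm(g_bar)
    noise = np.linalg.norm(per_sample_grads - g_bar, axis=1).mean()
    return signal / (noise + 1e-12)

def should_reselect(snr_history, drop_ratio=0.5):
    """Trigger coreset re-selection once SNR falls below a fraction
    of its observed peak (drop_ratio is a hypothetical threshold)."""
    return snr_history[-1] < drop_ratio * max(snr_history)
```

When the model starts to overfit the current coreset, its per-sample gradients become small and mutually inconsistent, the mean gradient shrinks faster than the spread, and the monitored SNR drops, signaling that re-selection is due.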

ICRA 2012 · Conference Paper

A compact 3-DOF compliant serial mechanism for trajectory tracking with flexures made by rapid prototyping

  • Su Zhao
  • Yan Naing Aye
  • Cheng Yap Shee
  • I-Ming Chen 0001
  • Wei Tech Ang

To fulfill the need for accurate trajectory tracking with large displacement in a handheld instrument, a 3-DOF serial compliant mechanism is developed. The mechanism is compact, with a total length of less than 150 mm and a maximum diameter of 22 mm. Two flexures are developed using different rapid prototyping techniques: a 3-DOF flexural lever made of Vero-Gray by Polyjet and a 1-DOF translational flexure made of stainless steel by Direct Metal Laser Sintering (DMLS). Analytical and Finite Element (FE) models are developed for the proposed flexural mechanisms, and experiments are conducted on a prototype. To improve the tracking accuracy, the hysteretic nonlinearities of the system are modeled using the Prandtl-Ishlinskii model, and an inverse feedforward controller is implemented to linearize the relationship between input and output. The tracking errors are reduced while maintaining a fast system response. The total tracking errors are identified individually for each axis and then compensated. Tracking performance of the tool tip is evaluated experimentally with different inputs. The RMS tracking error of the proposed mechanism is lower than 1 µm in all axes, an improvement of more than four times over the previous systems.
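The Prandtl-Ishlinskii hysteresis model used above superposes weighted play (backlash) operators. A minimal discrete-time sketch of the textbook form follows; the thresholds and weights are hypothetical illustrative values, whereas in practice (and in the paper) they are identified from measured input/output data of the actuator.

```python
import numpy as np

def prandtl_ishlinskii(x, thresholds, weights):
    """Discrete Prandtl-Ishlinskii model: a weighted superposition of
    play (backlash) operators with dead-band radii `thresholds`.
    Illustrative textbook form; parameters are not the paper's."""
    y = np.zeros(len(thresholds))  # play-operator internal states
    out = np.empty(len(x))
    for t, u in enumerate(x):
        # Each play operator tracks u within a dead band of width ±r_i.
        y = np.maximum(u - thresholds, np.minimum(u + thresholds, y))
        out[t] = weights @ y
    return out
```

Because each play operator remembers which side of its dead band the input last pushed it to, the same input value maps to different outputs on rising and falling sweeps, which is exactly the loop behavior an inverse feedforward controller must cancel.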