Arrow Research search

Author name cluster

Yun Zhang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

25 papers
2 author rows

Possible papers

25

ICLR Conference 2025 Conference Paper

AugKD: Ingenious Augmentations Empower Knowledge Distillation for Image Super-Resolution

  • Yun Zhang
  • Wei Li 0002
  • Simiao Li
  • Hanting Chen
  • Zhijun Tu
  • Bingyi Jing
  • Shaohui Lin
  • Jie Hu 0021

Knowledge distillation (KD) compresses deep neural networks by transferring task-related knowledge from cumbersome pre-trained teacher models to more compact student models. However, vanilla KD for image super-resolution (SR) networks yields only limited improvements due to the inherent nature of SR tasks, where the outputs of teacher models are noisy approximations of high-quality label images. In this work, we show that the potential of vanilla KD has been underestimated and demonstrate that the ingenious application of data augmentation methods can close the gap between it and more complex, well-designed methods. Unlike conventional training pipelines, which typically apply image augmentations simultaneously to both low-quality inputs and high-quality labels, we propose AugKD, which utilizes unpaired data augmentations to 1) generate auxiliary distillation samples and 2) impose label consistency regularization. Comprehensive experiments show that AugKD significantly outperforms existing state-of-the-art KD methods across a range of SR tasks.
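
A rough sketch of the unpaired-augmentation idea follows. It is hypothetical PyTorch code, not the authors' implementation: the flip augmentation, loss weights, and toy 2x networks are assumptions made only to keep the example runnable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def augkd_step(student, teacher, lr, hr, w_kd=1.0, w_lc=1.0):
    """One training step: supervised SR loss, distillation on an augmented
    auxiliary input, and a label-consistency regularizer (illustrative only)."""
    loss_rec = F.l1_loss(student(lr), hr)       # reconstruction against the HQ label

    lr_aug = torch.flip(lr, dims=[3])           # augmentation applied to the input only
    with torch.no_grad():
        t_aug = teacher(lr_aug)                 # teacher output as auxiliary KD target
    sr_aug = student(lr_aug)
    loss_kd = F.l1_loss(sr_aug, t_aug)

    hr_aug = torch.flip(hr, dims=[3])           # same geometric transform on the label
    loss_lc = F.l1_loss(sr_aug, hr_aug)         # label-consistency regularization

    return loss_rec + w_kd * loss_kd + w_lc * loss_lc

# toy 2x SR networks just to make the sketch runnable
student = nn.Sequential(nn.Conv2d(3, 12, 3, padding=1), nn.PixelShuffle(2))
teacher = nn.Sequential(nn.Conv2d(3, 12, 3, padding=1), nn.PixelShuffle(2))
loss = augkd_step(student, teacher, torch.rand(1, 3, 24, 24), torch.rand(1, 3, 48, 48))
```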

NeurIPS Conference 2025 Conference Paper

AutoVLA: A Vision-Language-Action Model for End-to-End Autonomous Driving with Adaptive Reasoning and Reinforcement Fine-Tuning

  • Zewei Zhou
  • Tianhui Cai
  • Seth Zhao
  • Yun Zhang
  • Zhiyu Huang
  • Bolei Zhou
  • Jiaqi Ma

Recent advancements in Vision-Language-Action (VLA) models have shown promise for end-to-end autonomous driving by leveraging world knowledge and reasoning capabilities. However, current VLA models often struggle with physically infeasible action outputs, complex model structures, or unnecessarily long reasoning. In this paper, we propose AutoVLA, a novel VLA model that unifies reasoning and action generation within a single autoregressive generation model for end-to-end autonomous driving. AutoVLA performs semantic reasoning and trajectory planning directly from raw visual inputs and language instructions. We tokenize continuous trajectories into discrete, feasible actions, enabling direct integration into the language model. For training, we employ supervised fine-tuning to equip the model with dual thinking modes: fast thinking (trajectory-only) and slow thinking (enhanced with chain-of-thought reasoning). To further enhance planning performance and efficiency, we introduce a reinforcement fine-tuning method based on Group Relative Policy Optimization (GRPO), reducing unnecessary reasoning in straightforward scenarios. Extensive experiments across real-world and simulated datasets and benchmarks, including nuPlan, nuScenes, Waymo, and CARLA, demonstrate the competitive performance of AutoVLA in both open-loop and closed-loop settings. Qualitative results showcase the adaptive reasoning and accurate planning capabilities of AutoVLA in diverse scenarios.
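
The trajectory tokenization step can be pictured with a small sketch: continuous waypoints are reduced to per-step displacements and binned onto a fixed grid of discrete tokens. The bin size, vocabulary layout, and helper names below are illustrative assumptions, not AutoVLA's actual action vocabulary or feasibility checks.

```python
import numpy as np

def tokenize_trajectory(waypoints, step=0.25, n_bins=64):
    """Map each (dx, dy) displacement to one token id on a uniform grid."""
    deltas = np.diff(np.asarray(waypoints, dtype=float), axis=0)   # per-step motion
    idx = np.clip(np.round(deltas / step) + n_bins // 2, 0, n_bins - 1).astype(int)
    return idx[:, 0] * n_bins + idx[:, 1]                          # combine dx/dy bins

def detokenize(tokens, step=0.25, n_bins=64, origin=(0.0, 0.0)):
    """Invert the binning and integrate displacements back into waypoints."""
    ix, iy = np.divmod(np.asarray(tokens), n_bins)
    deltas = (np.stack([ix, iy], axis=1) - n_bins // 2) * step
    return np.vstack([origin, origin + np.cumsum(deltas, axis=0)])

traj = [(0.0, 0.0), (0.5, 0.1), (1.1, 0.3), (1.8, 0.6)]
print(detokenize(tokenize_trajectory(traj)))   # approximate reconstruction of the input
```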

ICLR Conference 2025 Conference Paper

CBQ: Cross-Block Quantization for Large Language Models

  • Xin Ding
  • Xiaoyu Liu 0006
  • Zhijun Tu
  • Yun Zhang
  • Wei Li 0002
  • Jie Hu 0021
  • Hanting Chen
  • Yehui Tang 0001

Post-training quantization (PTQ) has played a pivotal role in compressing large language models (LLMs) at ultra-low costs. Although current PTQ methods have achieved promising results by addressing outliers and employing layer- or block-wise loss optimization techniques, they still suffer from significant performance degradation at ultra-low bit precision. To dissect this issue, we conducted an in-depth analysis of quantization errors specific to LLMs and surprisingly discovered that, unlike traditional sources of quantization errors, the growing number of model parameters, combined with the reduction in quantization bits, intensifies inter-layer and intra-layer dependencies, which severely impact quantization accuracy. This finding highlights a critical challenge in quantizing LLMs. To address this, we propose CBQ, a cross-block reconstruction-based PTQ method for LLMs. CBQ leverages cross-block dependency to establish long-range dependencies across multiple blocks and integrates an adaptive LoRA-Rounding technique to manage intra-layer dependencies. To further enhance performance, CBQ incorporates a coarse-to-fine pre-processing mechanism for processing weights and activations. Extensive experiments show that CBQ achieves superior low-bit quantization (W4A4, W4A8, W2A16) and outperforms existing state-of-the-art methods across various LLMs and datasets. Notably, CBQ takes only 4.3 hours to complete weight-only 4-bit quantization of the LLAMA1-65B model, achieving a commendable trade-off between performance and efficiency.
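
A minimal sketch of the cross-block reconstruction idea, assuming PyTorch-style block modules; the window size, the choice of MSE, and the stand-in blocks are assumptions, and the adaptive LoRA-Rounding and pre-processing steps are omitted entirely.

```python
import torch
import torch.nn.functional as F

def cross_block_loss(fp_blocks, q_blocks, x, k=2):
    """Reconstruction error accumulated over windows of k consecutive blocks,
    so dependencies spanning block boundaries enter the objective."""
    loss, fp_x, q_x = 0.0, x, x
    for i in range(0, len(fp_blocks), k):
        for fp_b, q_b in zip(fp_blocks[i:i + k], q_blocks[i:i + k]):
            with torch.no_grad():
                fp_x = fp_b(fp_x)                    # full-precision reference path
            q_x = q_b(q_x)                           # path through quantized blocks
        loss = loss + F.mse_loss(q_x, fp_x)          # compared once per window
    return loss

fp_blocks = [torch.nn.Linear(8, 8) for _ in range(4)]
q_blocks = [torch.nn.Linear(8, 8) for _ in range(4)]  # stand-ins for quantized blocks
print(cross_block_loss(fp_blocks, q_blocks, torch.rand(2, 8)))
```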

ICLR Conference 2025 Conference Paper

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

  • Simiao Li
  • Yun Zhang
  • Wei Li 0002
  • Hanting Chen
  • Wenjia Wang
  • Bingyi Jing
  • Shaohui Lin
  • Jie Hu 0021

Knowledge distillation (KD) is a promising yet challenging model compression approach that transmits rich learning representations from robust but resource-demanding teacher models to efficient student models. Previous methods for image super-resolution (SR) are often tailored to specific teacher-student architectures, limiting their potential for improvement and hindering broader applications. This work presents a novel KD framework for SR models, the multi-granularity Mixture of Priors Knowledge Distillation (MiPKD), which can be universally applied to a wide range of architectures at both feature and block levels. The teacher’s knowledge is effectively integrated with the student's feature via the Feature Prior Mixer, and the reconstructed feature propagates dynamically in the training phase with the Block Prior Mixer. Extensive experiments illustrate the significance of the proposed MiPKD technique.
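
One plausible way to picture feature-level prior mixing is to stochastically swap student feature channels with the teacher's during training; the masking scheme below is a hypothetical illustration, not MiPKD's actual Feature Prior Mixer or Block Prior Mixer.

```python
import torch

def mix_features(student_feat, teacher_feat, p=0.5):
    """Replace a random subset of student channels with the teacher's, so the
    mixed feature acts as a prior the student is trained to be consistent with."""
    mask = (torch.rand(student_feat.shape[1]) < p).float().view(1, -1, 1, 1)
    return mask * teacher_feat + (1.0 - mask) * student_feat

s, t = torch.rand(2, 16, 8, 8), torch.rand(2, 16, 8, 8)
mixed = mix_features(s, t)    # same shape as the inputs, channels mixed at random
```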

NeurIPS Conference 2025 Conference Paper

PANTHER: Generative Pretraining Beyond Language for Sequential User Behavior Modeling

  • Guilin Li
  • Yun Zhang
  • Xiuyuan Chen
  • Chengqi Li
  • Bo Wang
  • Linghe Kong
  • Wenjia Wang
  • Weiran Huang

Large language models (LLMs) have shown that generative pretraining can distill vast world knowledge into compact token representations. While LLMs encapsulate extensive world knowledge, they remain limited in modeling the behavioral knowledge contained within user interaction histories. User behavior forms a distinct modality, where each action, defined by multi-dimensional attributes such as time, context, and transaction type, constitutes a behavioral token. Modeling these high-cardinality, sparse, and irregular sequences is challenging, and discriminative models often falter under limited supervision. To bridge this gap, we extend generative pretraining to user behavior, learning transferable representations from unlabeled behavioral data analogous to how LLMs learn from text. We present PANTHER, a hybrid generative-discriminative framework that unifies user behavior pretraining and downstream adaptation, enabling large-scale sequential user representation learning and real-time inference. PANTHER introduces: (1) Structured Tokenization to compress multi-dimensional transaction attributes into an interpretable vocabulary; (2) a Sequence Pattern Recognition Module (SPRM) for modeling periodic transaction motifs; (3) a Unified User-Profile Embedding that fuses static demographics with dynamic transaction histories, enabling both personalized predictions and population-level knowledge transfer; and (4) real-time scalability enabled by offline caching of pre-trained embeddings for millisecond-level inference. Fully deployed and operational online at WeChat Pay, PANTHER delivers a 25.6% boost in next-transaction prediction HitRate@1 and a 38.6% relative improvement in fraud detection recall over baselines. Cross-domain evaluations on public benchmarks (CCT, MBD, MovieLens-1M, Yelp) show strong generalization, achieving up to 21% HitRate@1 gains over transformer baselines, establishing PANTHER as a scalable, high-performance framework for industrial sequential user behavior modeling.
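
The structured-tokenization idea can be illustrated by bucketing each transaction's attributes into one compact, human-readable token; the field names, bucket sizes, and log-scale amount binning below are hypothetical, not the deployed vocabulary.

```python
from datetime import datetime

def to_behavior_token(txn):
    """Compress (type, time, amount) into a single interpretable token string."""
    hour_bucket = datetime.fromisoformat(txn["time"]).hour // 6   # four day segments
    amount_bucket = min(int(txn["amount"]).bit_length(), 12)      # log-scale amount bucket
    return f'{txn["type"]}|h{hour_bucket}|a{amount_bucket}'

history = [
    {"time": "2024-05-01T09:30:00", "type": "transfer", "amount": 120},
    {"time": "2024-05-01T21:05:00", "type": "purchase", "amount": 8},
]
print([to_behavior_token(t) for t in history])   # ['transfer|h1|a7', 'purchase|h3|a4']
```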

IROS Conference 2025 Conference Paper

PC-SRIF: Preconditioned Cholesky-based Square Root Information Filter for Vision-aided Inertial Navigation

  • Tong Ke
  • Parth Agrawal
  • Yun Zhang
  • Weikun Zhen
  • Chao X. Guo
  • Toby Sharp
  • Ryan C. DuToit

In this paper, we introduce a novel estimator for vision-aided inertial navigation systems (VINS), the Preconditioned Cholesky-based Square Root Information Filter (PC-SRIF). When solving linear systems, employing Cholesky decomposition offers superior efficiency but can compromise numerical stability. Due to this, existing VINS utilizing (Square Root) Information Filters often opt for QR decomposition on platforms where single precision is preferred, avoiding the numerical challenges associated with Cholesky decomposition. While these issues are often attributed to the ill-conditioned information matrix in VINS, our analysis reveals that this is not an inherent property of VINS but rather a consequence of specific parameterizations. We identify several factors that contribute to an ill-conditioned information matrix and propose a preconditioning technique to mitigate these conditioning issues. Building on this analysis, we present PC-SRIF, which exhibits remarkable stability in performing Cholesky decomposition in single precision when solving linear systems in VINS. Consequently, PC-SRIF achieves superior theoretical efficiency compared to alternative estimators. To validate the efficiency advantages and numerical stability of PC-SRIF-based VINS, we have conducted well-controlled experiments, which provide empirical evidence in support of our theoretical findings. Remarkably, in our VINS implementation, PC-SRIF's runtime is 41% faster than QR-based SRIF.
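
The numerical point behind the preconditioning can be reproduced in a few lines: diagonal (Jacobi) preconditioning shrinks the condition number of a badly scaled information matrix enough for single-precision Cholesky factorization to be safe. The toy Jacobian and scale choices below are assumptions for illustration, not the paper's VINS parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)
# Jacobian columns with wildly different scales (e.g. position vs. bias states)
J = rng.standard_normal((200, 6)) * np.array([1e4, 1e4, 1e4, 1e-3, 1e-3, 1e-3])
A = J.T @ J                                        # ill-conditioned information matrix

d = 1.0 / np.sqrt(np.diag(A))
A_pre = (A * d).T * d                              # D^{-1/2} A D^{-1/2}, unit diagonal
print(np.linalg.cond(A), np.linalg.cond(A_pre))    # condition number drops by many orders

L = np.linalg.cholesky(A_pre.astype(np.float32))   # now factors safely in single precision
```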

AILAW Journal 2023 Journal Article

Methods of incorporating common element characteristics for law article prediction

  • Yifan Hou
  • Ge Cheng
  • Yun Zhang
  • Dongliang Zhang

Law article prediction is the task of predicting the relevant laws and regulations involved in a case from the description text of the case, and it has broad application prospects for improving judicial efficiency. Existing research typically considers only a single case, employing neural network methods to extract features for prediction, and thus neglects the mining of related and common-element information across different cases. To solve this problem, we propose a law article prediction method that integrates the characteristics of common elements. It can effectively utilize the co-occurrence information in the training data, fully mine the common elements shared between cases, and fuse local features. Experiments show that our method performs well.

JBHI Journal 2021 Journal Article

Attribute Supervised Probabilistic Dependent Matrix Tri-Factorization Model for the Prediction of Adverse Drug-Drug Interaction

  • Jiajing Zhu
  • Yongguo Liu
  • Yun Zhang
  • Dongxiao Li

Adverse drug-drug interaction (ADDI) poses a significant threat to public health. Although the detection of ADDIs is experimentally carried out in the early development phase of drug design, many potential ADDIs are still discovered by accident in clinical practice, leading to substantial morbidity and mortality. Several computational models have been designed for ADDI prediction. However, they take no account of drug dependency, even though many drugs produce synergistic effects and exhibit high mutual dependency in treatments, which contains underlying information about ADDIs and benefits ADDI prediction. In this paper, we design a dependent network to model drug dependency and propose an attribute-supervised learning model, Probabilistic Dependent Matrix Tri-Factorization (PDMTF), for ADDI prediction. In particular, PDMTF incorporates two drug attributes, molecular structure and side effect, and their correlation to model the adverse interactions among drugs. The dependent network is represented by a dependent matrix, which is first formulated from the row precision matrix of the predicted attribute matrices and then regularized by the molecular structure similarities among drugs. Meanwhile, an efficient alternating algorithm is designed to solve the optimization problem of PDMTF. Experiments demonstrate the superior performance of the proposed model compared with eight baselines and its two variants.
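
As a toy illustration of the tri-factorization backbone only (PDMTF's probabilistic priors, attribute supervision, and dependent matrix are all omitted), one can fit X ≈ F S Fᵀ to a symmetric interaction matrix by gradient descent:

```python
import torch

n, k = 30, 5
X = (torch.rand(n, n) < 0.15).float()
X = torch.triu(X, 1); X = X + X.T                   # symmetric toy interaction matrix

F_ = torch.rand(n, k, requires_grad=True)           # per-drug latent factors
S = torch.rand(k, k, requires_grad=True)            # interaction pattern among factors
opt = torch.optim.Adam([F_, S], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = ((F_ @ S @ F_.T - X) ** 2).mean()        # reconstruction objective
    loss.backward()
    opt.step()

scores = (F_ @ S @ F_.T).detach()                   # predicted interaction scores
```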

JBHI Journal 2017 Journal Article

Order Statistics Concordance Coefficient With Applications to Multichannel Biosignal Analysis

  • Weichao Xu
  • Zhaoguo Chen
  • Yun Zhang
  • Lianglun Cheng

In this paper, we propose a novel concordance coefficient, called the order statistics concordance coefficient (OSCOC), to quantify the association among multichannel biosignals. To uncover its properties, we compare OSCOC with three other similar indexes, i.e., the average Pearson's product moment correlation coefficient (APPMCC), Kendall's concordance coefficient (KCC), and the average Kendall's tau (AKT), under a multivariate normal model (MNM), a linear model (LM), and a nonlinear model. To further demonstrate its usefulness, we present an example of atrial arrhythmia analysis based on real-world multichannel cardiac signals. Theoretical derivations as well as numerical results suggest that 1) under the MNM and LM, OSCOC performs equally well with APPMCC and outperforms the other two methods, 2) in the nonlinear case, OSCOC even outperforms KCC and AKT, which are well known to be robust under increasing nonlinear transformations, and 3) OSCOC performs best in the case study of arrhythmia analysis in terms of the volume under the surface.
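
OSCOC itself is not specified in the abstract, so it is not reproduced here, but two of the baselines it is compared against are standard statistics and easy to sketch on a channels-by-samples signal matrix:

```python
import numpy as np
from scipy.stats import kendalltau

def appmcc(x):
    """Average pairwise Pearson correlation across channels (rows of x)."""
    r = np.corrcoef(x)
    iu = np.triu_indices_from(r, k=1)
    return r[iu].mean()

def akt(x):
    """Average pairwise Kendall's tau across channels."""
    m = x.shape[0]
    taus = [kendalltau(x[i], x[j])[0] for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(taus))

sig = np.cumsum(np.random.default_rng(1).standard_normal((4, 500)), axis=1)
print(appmcc(sig), akt(sig))
```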