Arrow Research search

Author name cluster

Zhen Peng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers · 1 author row

Possible papers (5)

NeurIPS 2025 · Conference Paper

LoTA-QAF: Lossless Ternary Adaptation for Quantization-Aware Fine-Tuning

  • Junyu Chen
  • Junzhuo Li
  • Zhen Peng
  • Wenjie Wang
  • Yuxiang Ren
  • Long Shi
  • Xuming Hu

Quantization and fine-tuning are crucial for deploying large language models (LLMs) on resource-constrained edge devices. However, fine-tuning quantized models presents significant challenges, primarily stemming from: First, the mismatch in data types between the low-precision quantized weights (e.g., 4-bit) and the high-precision adaptation weights (e.g., 16-bit). This mismatch limits the computational efficiency advantage offered by quantized weights during inference. Second, potential accuracy degradation when merging these high-precision adaptation weights into the low-precision quantized weights, as the adaptation weights often necessitate approximation or truncation. Third, as far as we know, no existing methods support the lossless merging of adaptation while adjusting all quantized weights. To address these challenges, we introduce lossless ternary adaptation for quantization-aware fine-tuning (LoTA-QAF). This is a novel fine-tuning method specifically designed for quantized LLMs, enabling the lossless merging of ternary adaptation weights into quantized weights and the adjustment of all quantized weights. LoTA-QAF operates through a combination of: i) a custom-designed ternary adaptation (TA) that aligns ternary weights with the quantization grid and uses these ternary weights to adjust quantized weights; ii) a TA-based mechanism that enables the lossless merging of adaptation weights; iii) ternary signed gradient descent (t-SignSGD) for updating the TA weights. We apply LoTA-QAF to the Llama-3.1/3.3 and Qwen-2.5 model families and validate its effectiveness on several downstream tasks. On the MMLU benchmark, our method effectively recovers performance for quantized models, surpassing 16-bit LoRA by up to 5.14%. For task-specific fine-tuning, 16-bit LoRA achieves superior results, but LoTA-QAF still outperforms other methods. Code is available at github.com/KingdalfGoodman/LoTA-QAF.
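The abstract's core mechanism, ternary adaptation weights updated by signed gradients and merged without rounding, can be caricatured in a toy sketch. This is not the authors' implementation; the `threshold` gate, the shapes, and the one-grid-step merge rule are all assumptions made for illustration:

```python
import numpy as np

def t_signsgd_step(ta, grad, threshold=0.1):
    """Toy t-SignSGD step: move each ternary adaptation weight opposite
    to the sign of its gradient, but only where the gradient magnitude
    exceeds a threshold, and keep all weights in {-1, 0, +1}."""
    step = -np.sign(grad) * (np.abs(grad) > threshold)
    return np.clip(ta + step, -1, 1).astype(int)

def merge_ternary(q, ta):
    """Toy lossless merge: each ternary weight nudges the corresponding
    quantized integer by one grid step, so no rounding or truncation of
    high-precision values is ever needed."""
    return q + ta
```

In this sketch the merged weights stay on the integer quantization grid, which is the sense in which the merge is "lossless" compared with rounding 16-bit LoRA updates into low-bit weights.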

AAAI 2025 · Conference Paper

Out-of-Distribution Generalization on Graphs via Progressive Inference

  • Yiming Xu
  • Bin Shi
  • Zhen Peng
  • Huixiang Liu
  • Bo Dong
  • Chen Chen

The development and evaluation of graph neural networks (GNNs) generally follow the independent and identically distributed (i.i.d.) assumption. Yet this assumption is often untenable in practice due to the uncontrollable data generation mechanism. In particular, when the data distribution shows a significant shift, most GNNs would fail to produce reliable predictions and may even make decisions randomly. One of the most promising solutions to improve model generalization is to pick out causal invariant parts in the input graph. Nonetheless, we observe a significant distribution gap between the causal parts learned by existing methods and the ground truth, leading to undesirable performance. In response to the above issues, this paper presents GPro, a model that learns graph causal invariance with progressive inference. Specifically, the complicated graph causal invariant learning is decomposed into multiple intermediate inference steps from easy to hard, and the perception of GPro is continuously strengthened through a progressive inference process to extract causal features that are stable to distribution shifts. We also enlarge the training distribution by creating counterfactual samples to enhance the capability of GPro in capturing the causal invariant parts. Extensive experiments demonstrate that our proposed GPro outperforms the state-of-the-art methods by 4.91% on average. For datasets with more severe distribution shifts, the performance improvement can be up to 6.86%.
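The easy-to-hard decomposition described above can be pictured as a cumulative curriculum. The following is a generic sketch of that scheduling idea, not GPro's actual training loop; the per-sample difficulty scores are assumed to be given by some upstream estimate:

```python
def progressive_stages(difficulty, n_stages=3):
    """Generic easy-to-hard curriculum: sort sample indices by an assumed
    per-sample difficulty score and release them cumulatively, so each
    stage trains on everything the previous stage saw plus harder cases."""
    order = sorted(range(len(difficulty)), key=difficulty.__getitem__)
    n = len(order)
    # ceil(k * n / n_stages) samples are available at stage k
    return [order[: -(-k * n // n_stages)] for k in range(1, n_stages + 1)]
```

Each stage's index list is a superset of the previous one, mirroring how a progressive process strengthens perception step by step rather than confronting the hardest shifts immediately.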

AAAI 2025 · Conference Paper

Revisiting Graph Contrastive Learning on Anomaly Detection: A Structural Imbalance Perspective

  • Yiming Xu
  • Zhen Peng
  • Bin Shi
  • Xu Hua
  • Bo Dong
  • Song Wang
  • Chen Chen

The superiority of graph contrastive learning (GCL) has prompted its application to anomaly detection tasks for more powerful risk warning systems. Unfortunately, existing GCL-based models tend to excessively prioritize overall detection performance while neglecting robustness to structural imbalance, which can be problematic for many real-world networks following power-law degree distributions. In particular, GCL-based methods may fail to capture tail anomalies (abnormal nodes with low degrees). This raises concerns about the security and robustness of current anomaly detection algorithms and therefore hinders their applicability in a variety of realistic high-risk scenarios. To the best of our knowledge, the robustness of graph anomaly detection to structural imbalance has received little attention. To address the above issues, this paper presents a novel GCL-based framework named AD-GCL. It devises a neighbor pruning strategy that filters noisy edges for head nodes and facilitates the detection of genuine tail nodes by aligning head nodes to forged tail nodes. Moreover, AD-GCL actively explores potential neighbors to enlarge the receptive field of tail nodes through anomaly-guided neighbor completion. We further introduce intra- and inter-view consistency losses over the original and augmented graphs for enhanced representation. The performance evaluation of the whole, head, and tail nodes on multiple datasets validates the comprehensive superiority of the proposed AD-GCL in detecting both head anomalies and tail anomalies.
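The neighbor pruning idea can be sketched as a degree-capped top-k filter. This is an illustrative guess at the mechanism, not AD-GCL's actual procedure; the `degree_cap` parameter and the precomputed similarity scores are assumptions:

```python
def prune_neighbors(neighbors, sims, degree_cap=5):
    """Sketch of neighbor pruning for head nodes: if a node's degree
    exceeds degree_cap, keep only its degree_cap most similar neighbors,
    treating the rest as likely noisy edges; low-degree (tail) nodes
    pass through untouched."""
    if len(neighbors) <= degree_cap:
        return list(neighbors)
    ranked = sorted(range(len(neighbors)), key=sims.__getitem__, reverse=True)
    keep = sorted(ranked[:degree_cap])  # preserve original neighbor order
    return [neighbors[i] for i in keep]
```

Anomaly-guided neighbor completion would be the symmetric operation, adding plausible neighbors for tail nodes instead of removing them, which this sketch does not attempt.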

EAAI 2024 · Journal Article

Growth threshold for pseudo labeling and pseudo label dropout for semi-supervised medical image classification

  • Shaofeng Zhou
  • Shengwei Tian
  • Long Yu
  • Weidong Wu
  • Dezhi Zhang
  • Zhen Peng
  • Zhicheng Zhou

Semi-supervised learning (SSL) provides methods to improve model performance through unlabeled samples. In medical image analysis, the challenges of multi-category classification and imbalance learning must be addressed effectively. Pseudo labeling is not specifically designed for multi-category and category imbalance problems. In this paper, we propose the Growth Threshold for Pseudo Labeling (GTPL) and Pseudo Label Dropout (PLD), which can be used separately or in combination. GTPL changes the threshold value of each category by combining the confidence of labeled and unlabeled samples. PLD alleviates category imbalance by randomly discarding some of the pseudo labels. We apply GTPL and PLD to FixMatch and CoMatch and effectively improve their semi-supervised classification performance. We validate the effectiveness of our approach in skin lesion diagnosis on two long-tailed public medical image datasets, the ISIC 2018 and ISIC 2019 challenge datasets, obtaining AUCs of 89.19%, 92.71%, 94.71%, and 94.76%, respectively, on four scales of labeled data from ISIC 2018.
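The two components can be caricatured in a few lines. This sketch captures only the stated intent, per-class thresholds that grow with confidence and class-dependent pseudo-label dropout, not the paper's actual formulas; `base`, `cap`, and the drop probabilities are invented for illustration:

```python
import random

def growth_threshold(class_confidence, base=0.5, cap=0.95):
    """Sketch of GTPL: the pseudo-labeling threshold for a class rises
    from `base` toward `cap` as the model's confidence on that class
    grows, so already-easy classes face a stricter bar."""
    return base + (cap - base) * class_confidence

def pseudo_label_dropout(pseudo_labels, drop_prob, rng):
    """Sketch of PLD: discard each pseudo label independently with a
    class-dependent probability (higher for over-represented classes)."""
    return [y for y in pseudo_labels if rng.random() >= drop_prob[y]]
```

Thinning pseudo labels of majority classes rebalances the pseudo-labeled pool, which is the imbalance-mitigation effect the abstract attributes to PLD.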

IJCAI 2018 · Conference Paper

ANOMALOUS: A Joint Modeling Approach for Anomaly Detection on Attributed Networks

  • Zhen Peng
  • Minnan Luo
  • Jundong Li
  • Huan Liu
  • Qinghua Zheng

The key point of anomaly detection on attributed networks lies in the seamless integration of network structure information and attribute information. A vast majority of existing works are mainly based on the homophily assumption, which implies attribute similarity between connected nodes. Nonetheless, this assumption is untenable in practice as the existence of noisy and structurally irrelevant attributes may adversely affect anomaly detection performance. Despite the fact that recent attempts perform subspace selection to address this issue, these algorithms treat subspace selection and anomaly detection as two separate steps, which often leads to suboptimal solutions. In this paper, we investigate how to fuse attribute and network structure information more synergistically to avoid the adverse effects brought by noisy and structurally irrelevant attributes. Methodologically, we propose a novel joint framework to conduct attribute selection and anomaly detection as a whole based on CUR decomposition and residual analysis. By filtering out noisy and irrelevant node attributes, we perform anomaly detection with the remaining representative attributes. Experimental results on both synthetic and real-world datasets corroborate the effectiveness of the proposed framework.
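Residual-based anomaly scoring can be illustrated with a plain low-rank reconstruction. The paper's joint CUR-based formulation differs; here a truncated SVD merely stands in to show the residual-analysis idea of flagging rows the low-rank model cannot explain:

```python
import numpy as np

def residual_scores(X, rank=2):
    """Score each node (row of the attribute matrix X) by how poorly a
    rank-`rank` reconstruction explains it; large residual norms suggest
    anomalies. Truncated SVD is used here as a simple substitute for the
    paper's CUR decomposition."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return np.linalg.norm(X - X_hat, axis=1)
```

A row lying outside the dominant attribute subspace keeps its full norm as residual, while rows well explained by the subspace score near zero.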