Arrow Research search

Author name cluster

Qingqing Ye

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

9 papers
1 author row

Possible papers

9

AAAI Conference 2026 Conference Paper

Class-feature Watermark: A Resilient Black-box Watermark Against Model Extraction Attacks

  • Yaxin Xiao
  • Qingqing Ye
  • Zi Liang
  • Haoyang Li
  • Ronghua Li
  • Huadi Zheng
  • Haibo Hu

Machine learning models constitute valuable intellectual property, yet remain vulnerable to model extraction attacks (MEA), where adversaries replicate their functionality through black-box queries. Model watermarking counters MEAs by embedding forensic markers for ownership verification. Current black-box watermarks prioritize MEA survival through representation entanglement, yet inadequately explore resilience against sequential MEAs and removal attacks. Our study reveals that this risk is underestimated because existing removal methods are weakened by entanglement. To address this gap, we propose Watermark Removal attacK (WRK), which circumvents entanglement constraints by exploiting decision boundaries shaped by prevailing sample-level watermark artifacts. WRK effectively reduces watermark success rates by ≥88.79% across existing watermarking benchmarks. For robust protection, we propose Class-Feature Watermarks (CFW), which improve resilience by leveraging class-level artifacts. CFW constructs a synthetic class using out-of-domain samples, eliminating vulnerable decision boundaries between original domain samples and their artifact-modified counterparts (watermark samples). CFW concurrently optimizes both MEA transferability and post-MEA stability. Experiments across multiple domains show that CFW consistently outperforms prior methods in resilience, maintaining a watermark success rate of ≥70.15% in extracted models even under the combined MEA and WRK distortion, while preserving the utility of protected models.
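
The class-level construction described in the abstract can be pictured with a short sketch. This is a minimal PyTorch-style illustration under assumed interfaces (train_set, ood_images, and the extra label index are placeholders, and the classifier head is assumed to already have one extra output); it is not the authors' implementation and it omits the transferability and post-MEA stability optimization.

    import torch
    from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

    def embed_class_feature_watermark(model, train_set, ood_images, num_classes,
                                      epochs=3, lr=1e-4, device="cpu"):
        # Hypothetical sketch only: dataset interfaces, label index, and the
        # training loop are assumptions, not the authors' released code.
        # The classifier head is assumed to already have num_classes + 1 outputs.
        wm_label = num_classes  # brand-new class reserved for the watermark
        wm_set = TensorDataset(ood_images,
                               torch.full((len(ood_images),), wm_label))
        loader = DataLoader(ConcatDataset([train_set, wm_set]),
                            batch_size=64, shuffle=True)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = torch.nn.CrossEntropyLoss()
        model.to(device).train()
        for _ in range(epochs):      # joint fine-tuning: task data keeps utility,
            for x, y in loader:      # out-of-domain samples teach the synthetic class
                x, y = x.to(device), y.to(device)
                optimizer.zero_grad()
                loss_fn(model(x), y).backward()
                optimizer.step()
        return model, wm_label

Ownership would then be claimed by checking whether a suspect (extracted) model assigns the reserved class to the out-of-domain watermark samples.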

AAAI Conference 2026 Conference Paper

DIFT: Protecting Contrastive Learning Against Data Poisoning Backdoor Attacks

  • Jiang Zhu
  • Yulin Jin
  • Qingqing Ye
  • Zhibiao Guo
  • Kun Fang
  • Ruochen Du
  • Yingnan Zhao
  • Haibo Hu

Contrastive learning (CL) is a popular learning paradigm that excels in extracting meaningful representations from unlabeled data. Recent studies have shown that CL is highly vulnerable to backdoor attacks. Current defenses against backdoor attacks in CL are primarily reactive and post-training. That is, the detection and elimination of backdoors are executed in the deployment phase of a given well-trained model. However, these post-training defenses are usually prone to degrading model utility and are resource-intensive, making backdoor detection and elimination from a fully-trained model quite challenging. To address this issue, we argue for a fundamental shift in perspective, i.e., integrating the defense into the model's training phase, and propose a novel framework to mitigate backdoors in CL, namely Density-Based Identification and Fine-Tuning (DIFT). Specifically, DIFT identifies potential poisoned samples during the early training phase by detecting embeddings with abnormal poisoning characteristics in the feature space. Then, to remove backdoors and preserve model utility, the detected poisoned samples are leveraged to fine-tune the model, and the remaining clean samples are further used to train the model after the fine-tuning. DIFT, as a proactive training-time defense, avoids the problematic backdoor removal and the high computational cost associated with reactive post-training methods. We empirically evaluate DIFT on various CL algorithms against backdoor attacks. Experimental results demonstrate that our method exhibits promising defense effectiveness while maintaining the model's clean-data accuracy.
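
The density-based identification step can be illustrated with a small sketch. It is an assumption-laden reading of "detecting embeddings with abnormal poisoning characteristics": samples sharing a backdoor trigger often collapse into an unusually tight cluster in the encoder's feature space, so unusually small k-NN distances are treated as suspicious. The k, the quantile threshold, and the use of scikit-learn are illustrative choices, not the paper's settings.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def flag_suspicious_by_density(embeddings, k=10, quantile=0.05):
        # embeddings: (n_samples, dim) array from the partially trained encoder
        nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
        dists, _ = nn.kneighbors(embeddings)       # column 0 is the point itself
        mean_knn_dist = dists[:, 1:].mean(axis=1)  # small value => very dense region
        threshold = np.quantile(mean_knn_dist, quantile)
        return np.where(mean_knn_dist <= threshold)[0]  # indices of suspected poisons

The flagged indices would then feed the fine-tuning stage described in the abstract, while the remaining clean samples continue normal training.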

AAAI Conference 2026 Conference Paper

How Much Do Large Language Models Cheat on Evaluation? Benchmarking Overestimation Under the One-Time-Pad-Based Framework

  • Zi Liang
  • Liantong Yu
  • Zhang Shiyu
  • Qingqing Ye
  • Haibo Hu

Overestimation in evaluating large language models (LLMs) has become an increasing concern. Due to the contamination of public benchmarks or imbalanced model training, LLMs may achieve inflated evaluation results on public benchmarks, either intentionally or unintentionally, which leads to unfair comparisons among LLMs and undermines realistic assessments of their capabilities. Existing benchmarks attempt to address these issues by keeping test cases permanently secret, mitigating contamination through human evaluation, or repeatedly collecting and constructing new samples. However, these approaches fail to ensure reproducibility, transparency, and high efficiency simultaneously. Moreover, the extent of overestimation in current LLMs remains unquantified. To address these issues, we propose ArxivRoll, a dynamic evaluation framework inspired by one-time pad encryption in cryptography. ArxivRoll comprises two key components: i) SCP (Sequencing, Cloze, and Prediction), an automated generator for private test cases, and ii) Rugged Scores (RS), metrics that measure the proportion of public benchmark contamination and training bias. Leveraging SCP, ArxivRoll constructs a new benchmark every six months using recent articles from ArXiv and uses it for one-time evaluations of LLM performance. Extensive experiments demonstrate the high quality of our benchmark, and we provide a systematic evaluation of current LLMs.
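
A toy sketch of how the Cloze part of an SCP-style generator might turn fresh, post-cutoff text into private test cases. The function, masking rule, and case format are hypothetical; sentence splitting, quality filtering, and the Sequencing and Prediction task types are omitted.

    import random

    def make_cloze_cases(article_sentences, n_cases=100, mask_token="<MASK>", seed=0):
        # article_sentences: sentences drawn from articles published after the
        # evaluated model's training cutoff, so they cannot have been memorized
        rng = random.Random(seed)
        pool = rng.sample(article_sentences, min(n_cases, len(article_sentences)))
        cases = []
        for sent in pool:
            words = sent.split()
            if len(words) < 6:        # skip sentences too short to be informative
                continue
            i = rng.randrange(1, len(words) - 1)
            answer = words[i]
            words[i] = mask_token
            cases.append({"prompt": " ".join(words), "answer": answer})
        return cases

Because a new pool of articles is rolled in every six months and each benchmark is used only once, a contaminated or over-fitted model gains no advantage on the next round.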

TMLR Journal 2026 Journal Article

Incorporating New Knowledge into Federated Learning: Advances, Insights, and Future Directions

  • Lixu Wang
  • Sun Yinggang
  • Yang Zhao
  • Jiaqi Wu
  • Jiahua Dong
  • Ating Yin
  • Qinbin Li
  • Qingqing Ye

Federated Learning (FL) is a distributed learning approach that allows participants to collaboratively train machine learning models without sharing the raw data. It is rapidly developing in an era where privacy protection is increasingly valued. It is this rapid development trend, along with the continuous emergence of new real-world demands for FL, that prompts us to focus on a very important problem: How to Incorporate New Knowledge into Federated Learning? The primary challenge here is to effectively and promptly incorporate various new knowledge into existing FL systems and evolve these systems to reduce costs, upgrade functionalities, and facilitate sustainable development. Meanwhile, established FL systems should preserve existing functionalities during the incorporation of new knowledge. In this paper, we systematically define the main sources of new knowledge in FL, including new features, tasks, models, and algorithms. For each source, we thoroughly analyze and discuss the technical approaches for incorporating new knowledge into existing FL systems and examine the impact of the form and timing of new knowledge arrival on the incorporation process. Unlike prior surveys that primarily catalogue FL techniques under a fixed system specification, we adopt a lifecycle evolution perspective and synthesize methods that enable time-varying integration of new features, tasks, models, and aggregation algorithms while preserving existing functionality. Furthermore, we comprehensively discuss potential future directions for incorporating new knowledge into FL, considering a variety of factors including scenario setups, security and privacy threats, and incentives.

AAAI Conference 2025 Conference Paper

A Sample-Level Evaluation and Generative Framework for Model Inversion Attacks

  • Haoyang Li
  • Li Bai
  • Qingqing Ye
  • Haibo Hu
  • Yaxin Xiao
  • Huadi Zheng
  • Jianliang Xu

Model Inversion (MI) attacks, which reconstruct the training dataset of neural networks, pose significant privacy concerns in machine learning. Recent MI attacks have managed to reconstruct realistic label-level private data, such as the general appearance of a target person from all training images labeled with that identity. Beyond label-level privacy, in this paper we show that sample-level privacy, the private information of a single target sample, is also important but under-explored in the MI literature due to the limitations of existing evaluation metrics. To address this gap, this study introduces a novel metric tailored for training-sample analysis, namely, the Diversity and Distance Composite Score (DDCS), which evaluates the reconstruction fidelity of each training sample by encompassing various MI attack attributes. This, in turn, enhances the precision of sample-level privacy assessments. Leveraging DDCS as a new evaluative lens, we observe that many training samples remain resilient against even the most advanced MI attacks. As such, we further propose a transfer learning framework that augments the generative capabilities of MI attackers through the integration of entropy loss and natural gradient descent. Extensive experiments verify the effectiveness of our framework in improving state-of-the-art MI attacks over various metrics, including DDCS, coverage, and FID. Finally, we demonstrate that DDCS can also be useful for MI defense by identifying samples susceptible to MI attacks in an unsupervised manner.
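
The exact DDCS formula is defined in the paper; the snippet below is only an illustrative per-sample score in the same spirit, combining a distance term (how close the nearest reconstruction gets) with a diversity term (how many distinct reconstructions land near the sample). The weighting alpha and the neighbourhood radius are assumptions.

    import numpy as np

    def ddcs_like_score(train_feats, recon_feats, radius=1.0, alpha=0.5):
        # train_feats: (n_train, d) features of private training samples
        # recon_feats: (n_recon, d) features of samples produced by the MI attack
        scores = []
        for t in train_feats:
            d = np.linalg.norm(recon_feats - t, axis=1)
            closeness = 1.0 / (1.0 + d.min())       # distance term: best reconstruction
            diversity = float((d < radius).mean())  # fraction of reconstructions nearby
            scores.append(alpha * closeness + (1.0 - alpha) * diversity)
        return np.array(scores)                     # low score => sample resisted the attack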

AAAI Conference 2025 Conference Paper

Exploring Intrinsic Alignments Within Text Corpus

  • Zi Liang
  • Pinghui Wang
  • Ruofei Zhang
  • Haibo Hu
  • Shuo Zhang
  • Qingqing Ye
  • Nuo Xu
  • Yaxin Xiao

Recent years have witnessed rapid advancements in the safety alignment of large language models (LLMs). Methods such as supervised instruction fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) have thus emerged as vital components in constructing LLMs. While these methods achieve robust and fine-grained alignment to human values, their practical application is still hindered by high annotation costs and incomplete human alignment. Moreover, the intrinsic human values within training corpora have not been fully exploited. To address these issues, we propose ISAAC (Intrinsically Supervised Alignments by Assessing Corpus), a primary and coarse-grained safety alignment strategy for LLMs. ISAAC relies only on a prior assumption about the text corpus, and does not require preference data as in RLHF or human response selection as in SFT. Specifically, it assumes a long-tail distribution of the text corpus and employs a specialized sampling strategy to automatically sample high-quality responses. Theoretically, we prove that this strategy can improve the safety of LLMs under our assumptions. Empirically, our evaluations on mainstream LLMs show that ISAAC achieves a safety score comparable to current SFT solutions. Moreover, we conduct experiments on ISAAC for some RLHF-based LLMs, where we find that ISAAC can even improve the safety of these models in specific safety domains. These findings demonstrate that ISAAC can provide preliminary alignment to LLMs, thereby reducing the construction costs of existing human-feedback-based methods.

NeurIPS Conference 2025 Conference Paper

Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts

  • Li Bai
  • Qingqing Ye
  • Xinwei Zhang
  • Sen Zhang
  • Zi Liang
  • Jianliang Xu
  • Haibo Hu

Machine learning models are often vulnerable to inference attacks that expose sensitive information from their training data. The shadow model technique is commonly employed in such attacks, for example membership inference. However, the need for a large number of shadow models leads to high computational costs, limiting their practical applicability. Such inefficiency mainly stems from the independent training and use of these shadow models. To address this issue, we present SHAPOOL, a novel shadow pool training framework that constructs multiple shared models and trains them jointly within a single process. In particular, we leverage the Mixture-of-Experts mechanism as the shadow pool to interconnect individual models, enabling them to share some sub-networks and thereby improving efficiency. To ensure the shared models closely resemble independent models and serve as effective substitutes, we introduce three novel modules: path-choice routing, pathway regularization, and pathway alignment. These modules guarantee random data allocation for pathway learning, promote diversity among shared models, and maintain consistency with target models. We evaluate SHAPOOL in the context of various membership inference attacks and show that it significantly reduces the computational cost of shadow model construction while maintaining comparable attack performance.
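
A minimal sketch of the shared-pathway idea, assuming a toy MLP backbone: each "shadow model" is a fixed path through a small grid of shared expert layers, so many substitute models are trained inside one network. The architecture, sizes, and routing table are illustrative; the pathway regularization and pathway alignment modules from the paper are omitted.

    import torch
    import torch.nn as nn

    class ShadowPoolSketch(nn.Module):
        def __init__(self, in_dim, hidden, n_classes,
                     n_experts=4, n_layers=2, n_paths=8, seed=0):
            super().__init__()
            # a grid of shared experts: n_layers stages, n_experts choices per stage
            self.stages = nn.ModuleList(
                nn.ModuleList(nn.Linear(in_dim if l == 0 else hidden, hidden)
                              for _ in range(n_experts))
                for l in range(n_layers))
            self.head = nn.Linear(hidden, n_classes)
            g = torch.Generator().manual_seed(seed)
            # path-choice routing table: shadow i owns one expert per stage
            self.register_buffer(
                "paths", torch.randint(n_experts, (n_paths, n_layers), generator=g))

        def forward(self, x, shadow_idx):
            # route the whole batch through the pathway of one shadow model
            for l, experts in enumerate(self.stages):
                x = torch.relu(experts[int(self.paths[shadow_idx, l])](x))
            return self.head(x)

    # e.g. pool = ShadowPoolSketch(in_dim=32, hidden=64, n_classes=10)
    #      logits = pool(torch.randn(16, 32), shadow_idx=3)

Training would then feed each sample only through its randomly assigned pathway, mirroring the random data allocation that the abstract attributes to path-choice routing.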

NeurIPS Conference 2025 Conference Paper

Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data

  • Zi Liang
  • Qingqing Ye
  • Xuan Liu
  • Yanyun Wang
  • Jianliang Xu
  • Haibo Hu

Synthetic data refers to artificial samples generated by models. While it has been validated to significantly enhance the performance of large language models (LLMs) during training and has been widely adopted in LLM development, the potential security risks it may introduce remain uninvestigated. This paper systematically evaluates the resilience of the synthetic-data-integrated training paradigm for LLMs against mainstream poisoning and backdoor attacks. We reveal that such a paradigm exhibits strong resistance to existing attacks, primarily due to the different distribution patterns between poisoning data and the queries used to generate synthetic samples. To enhance the effectiveness of these attacks and further investigate the security risks introduced by synthetic data, we introduce a novel and universal attack framework, namely, Virus Infection Attack (VIA), which enables the propagation of current attacks through synthetic data even under purely clean queries. Inspired by the principles of virus design in cybersecurity, VIA conceals the poisoning payload within a protective “shell” and strategically searches for optimal hijacking points in benign samples to maximize the likelihood of generating malicious content. Extensive experiments on both data poisoning and backdoor attacks show that VIA significantly increases the presence of poisoning content in synthetic data and correspondingly raises the attack success rate (ASR) on downstream models to levels comparable to those observed in the poisoned upstream models.

NeurIPS Conference 2022 Conference Paper

MExMI: Pool-based Active Model Extraction Crossover Membership Inference

  • Yaxin Xiao
  • Qingqing Ye
  • Haibo Hu
  • Huadi Zheng
  • Chengfang Fang
  • Jie Shi

With the increasing popularity of Machine Learning as a Service (MLaaS), ML models trained from public and proprietary data are deployed in the cloud and deliver prediction services to users. However, as the prediction API becomes a new attack surface, growing concerns have arisen over the confidentiality of ML models. Existing literature shows their vulnerability to model extraction (ME) attacks, while their private training data is vulnerable to another type of attack, namely, membership inference (MI). In this paper, we show that ME and MI can reinforce each other through a chained and iterative reaction, which can significantly boost ME attack accuracy and improve MI by saving the query cost. As such, we build MExMI, a framework for pool-based active model extraction (PAME) that exploits MI through three modules: “MI Pre-Filter”, “MI Post-Filter”, and “semi-supervised boosting”. Experimental results show that MExMI can improve fidelity by up to 11.14% over the best known PAME attack and reach 94.07% fidelity with only 16k queries. Furthermore, the precision and recall of the MI attack in MExMI are on par with a state-of-the-art MI attack that needs 150k queries.
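
The chained reaction can be summarized as a control-flow sketch. All helper names (query_api, mi_score, surrogate.fit) and the selection rule are assumptions used for illustration; the actual MI pre-/post-filter designs and the semi-supervised boosting module are specified in the paper.

    def mexmi_loop_sketch(query_api, surrogate, pool, mi_score,
                          rounds=5, budget_per_round=2000):
        # pool: unlabeled candidate inputs; query_api: the victim's prediction API;
        # mi_score: membership-inference score under the current surrogate
        remaining = list(range(len(pool)))
        labeled = []
        for _ in range(rounds):
            # MI pre-filter: prioritize pool items that look like victim training data
            remaining.sort(key=lambda i: mi_score(surrogate, pool[i]), reverse=True)
            picked, remaining = remaining[:budget_per_round], remaining[budget_per_round:]
            labeled += [(pool[i], query_api(pool[i])) for i in picked]  # spend query budget
            surrogate.fit(labeled)            # model-extraction step on the new labels
            # an MI post-filter and semi-supervised boosting would refine this set here
        return surrogate                      # a better surrogate sharpens MI next round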