Arrow Research search

Author name cluster

Pin-Yu Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

137 papers
2 author rows

Possible papers

TMLR Journal 2026 Journal Article

ADAPT: Adaptive Prompt Tuning for Vision-Language Models

  • Zhenhan Huang
  • Tejaswini Pedapati
  • Pin-Yu Chen
  • Jianxi Gao

Prompt tuning has emerged as an effective way to perform parameter-efficient fine-tuning. Conventional deep prompt tuning inserts continuous prompts of a fixed context length into the input to each layer. When a pre-trained model is tailored to a specific downstream task, different layers initialized with pre-trained weights might deviate from the optimal weights to different degrees. Inserted prompts with a fixed context length might therefore contain redundant context tokens or have an insufficient context length. To address this issue, we propose a deep continuous prompting method dubbed Adapt that encourages heterogeneous context lengths. In this method, context lengths are automatically determined by iteratively pruning context tokens. We use the saliency criterion for neural network pruning to compute the importance scores of context tokens in order to determine which tokens to prune. To avoid forgetting during fine-tuning, we apply angular knowledge distillation, which forces the model to learn the angular separation between pairs of classes and between pairs of instances from the pre-trained model. We examine the proposed method on the pre-trained vision-language model CLIP. 16-shot experiments on 11 downstream datasets reveal the advantage of Adapt: the average test accuracy is competitive, and the highest performance gain on an individual dataset is 7.44%. We release the code at https://github.com/Zhenhan-Huang/Adapt-Public.
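
The pruning step can be pictured with a small sketch. It assumes a SNIP-style saliency score (|weight x gradient| summed over each context token's embedding dimensions) and a fixed keep budget per round; the function names, the list-of-lists prompt representation, and the exact scoring rule are illustrative assumptions, not the released Adapt code.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def token_saliency(prompt: Matrix, grad: Matrix) -> List[float]:
    """Hypothetical per-token saliency: sum of |weight * gradient| over embedding dims."""
    return [
        sum(abs(w * g) for w, g in zip(tok, tok_grad))
        for tok, tok_grad in zip(prompt, grad)
    ]


def prune_context_tokens(prompt: Matrix, grad: Matrix, keep: int) -> Tuple[Matrix, List[int]]:
    """Keep the `keep` most salient context tokens, preserving their original order."""
    scores = token_saliency(prompt, grad)
    ranked = sorted(range(len(prompt)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [prompt[i] for i in kept], kept


# Toy layer prompt: three 2-d context tokens and their loss gradients.
prompt = [[1.0, 0.0], [0.1, 0.1], [2.0, -1.0]]
grad = [[0.5, 0.5], [0.1, 0.1], [1.0, 1.0]]
pruned, kept = prune_context_tokens(prompt, grad, keep=2)
```

Applied separately to each layer and repeated over several rounds, a rule like this ends up with a different number of context tokens per layer, which is the heterogeneity the abstract describes.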

TMLR Journal 2026 Journal Article

Diversity Boosts AI-Generated Text Detection

  • Advik Raj Basani
  • Pin-Yu Chen

Detecting AI-generated text is an increasing necessity to combat misuse of LLMs in education, business compliance, journalism, and social media, where synthetic fluency can mask misinformation or deception. While prior detectors often rely on token-level likelihoods or opaque black-box classifiers, these approaches struggle against high-quality generations and offer little interpretability. In this work, we propose DivEye, a novel detection framework that captures how unpredictability fluctuates across a text using surprisal-based features. Motivated by the observation that human-authored text exhibits richer variability in lexical and structural unpredictability than LLM outputs, DivEye captures this signal through a set of interpretable statistical features. Our method outperforms existing zero-shot detectors by up to 33.2% and achieves competitive performance with fine-tuned baselines across multiple benchmarks. DivEye is robust to paraphrasing and adversarial attacks, generalizes well across domains and models, and improves the performance of existing detectors by up to 18.7% when used as an auxiliary signal. Beyond detection, DivEye provides interpretable insights into why a text is flagged, pointing to rhythmic unpredictability as a powerful and underexplored signal for LLM detection.
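
As a rough illustration of surprisal-based features, the sketch below turns per-token probabilities from some scoring language model into three summary statistics: mean surprisal, its overall spread, and the spread of token-to-token changes. The abstract does not list DivEye's exact feature set; these three features and the function name are assumptions.

```python
import math
import statistics
from typing import Dict, List


def surprisal_features(token_probs: List[float]) -> Dict[str, float]:
    """Summarize how unpredictability fluctuates across a text.

    token_probs[i] is the probability a scoring LM assigned to the i-th observed
    token (hypothetical input; at least two tokens are required).
    """
    surprisal = [-math.log(p) for p in token_probs]  # in nats
    deltas = [b - a for a, b in zip(surprisal, surprisal[1:])]
    return {
        "mean": statistics.fmean(surprisal),
        "std": statistics.pstdev(surprisal),     # overall variability
        "delta_std": statistics.pstdev(deltas),  # variability of local changes
    }


flat = surprisal_features([0.5, 0.5, 0.5, 0.5])
bursty = surprisal_features([0.9, 0.05, 0.8, 0.01])
```

Features like these would then feed a lightweight classifier; the abstract's premise is that human-written text tends to show more of this variability than LLM output.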

AAAI Conference 2026 Conference Paper

MegaCoin: Enhancing Medium-Grained Color Perception for Vision-Language Models

  • Ming-Chang Chiu
  • Shicheng Wen
  • Pin-Yu Chen
  • Xuezhe Ma

In vision-language models (VLMs), the ability to perceive and interpret color and physical environment is crucial for achieving contextually accurate understanding and interaction. However, despite advances in multimodal modeling, there remains a significant lack of specialized datasets that rigorously evaluate a model's capacity to discern subtle color variations and spatial context, which are critical for situational comprehension and reliable deployment in real-world applications. Toward that goal, we curate MegaCoin, a high-quality, human-labeled dataset based on real images with various contextual attributes. MegaCoin consists of two parts: MegaCoin-Instruct, which serves as a supervised fine-tuning (SFT) dataset for VLMs; and MegaCoin-Bench, an annotated test set that can be used as a stand-alone QA dataset. MegaCoin provides three annotated features for 220,000 real images: foreground color, background color, and a description of an object's physical environment, constituting 660k human annotations. In addition, MegaCoin can be applied to benchmark domain generalization (DG) algorithms. We explore benchmarking DG methods in the linear probing setup for VLMs and report new insights. Last but not least, we show that VLMs, including GPT-4o, have subpar color recognition capabilities, and fine-tuning with MegaCoin can result in improved performance on visual evaluation tasks. In certain cases, small-scale open-source models such as LLaVA and Bunny, once fine-tuned with MegaCoin, can outperform closed-source GPT-4o. We hope the utilities of MegaCoin can shed light on the directions in which VLMs can improve and provide a more complex platform for domain generalization algorithms.

NeurIPS Conference 2025 Conference Paper

Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search

  • Yanbo Wang
  • Zixiang Xu
  • Yue Huang
  • Gao Chujie
  • Siyuan Wu
  • Jiayi Ye
  • Pin-Yu Chen
  • Xiuying Chen

Large Language Models (LLMs) often struggle to maintain their original performance when faced with semantically coherent but task-irrelevant contextual information. Although prior studies have explored this issue using fixed-template or retrieval-based distractions, such static methods show limited effectiveness against contemporary models. To address this problem, we propose a dynamic distraction generation framework based on tree search, where the generation process is guided by model behavior. Without modifying the original question or answer, the method efficiently produces challenging adaptive distractions across multiple datasets, enabling systematic stress testing of LLMs’ contextual robustness. Experiments on four benchmarks demonstrate that the generated distractions lead to an average performance drop of over 45% for mainstream models. Further comparisons of mitigation strategies show that prompt-based optimization methods yield limited gains, whereas post-training approaches (e.g., DPO) significantly enhance the model's contextual robustness. The results indicate that these issues do not stem from knowledge deficits in LLMs, but from a fundamental inability to maintain consistent reasoning under contextual distraction, posing a major challenge to the reliability of LLMs in real-world applications.

NeurIPS Conference 2025 Conference Paper

CoP: Agentic Red-teaming for Large Language Models using Composition of Principles

  • Chen Xiong
  • Pin-Yu Chen
  • Tsung-Yi Ho

Recent advances in Large Language Models (LLMs) have spurred transformative applications in various domains, ranging from open-source to proprietary LLMs. However, jailbreak attacks, which aim to break safety alignment and user compliance by tricking the target LLMs into producing harmful and risky responses, are becoming an urgent concern. The practice of red-teaming for LLMs is to proactively explore potential risks and error-prone instances before the release of frontier AI technology. This paper proposes an agentic workflow to automate and scale the red-teaming process of LLMs through the Composition-of-Principles (CoP) framework, where human users provide a set of red-teaming principles as instructions to an AI agent to automatically orchestrate effective red-teaming strategies and generate jailbreak prompts. Distinct from existing red-teaming methods, our CoP framework provides a unified and extensible framework to encompass and orchestrate human-provided red-teaming principles to enable the automated discovery of new red-teaming strategies. When tested against leading LLMs, CoP reveals unprecedented safety risks by finding novel jailbreak prompts and improving the best-known single-turn attack success rate by up to 19.0 times.

IJCAI Conference 2025 Conference Paper

Differentiable Prompt Learning for Vision Language Models

  • Zhenhan Huang
  • Tejaswini Pedapati
  • Pin-Yu Chen
  • Jianxi Gao

Prompt learning is an effective way to exploit the potential of large-scale pre-trained foundational models. Continuous prompts parameterize context tokens in prompts by turning them into differentiable vectors. Deep continuous prompts insert prompts not only in the input but also in the intermediate hidden representations. Manually designed deep continuous prompts exhibit a remarkable improvement compared to the zero-shot pre-trained model on downstream tasks. How to automate the continuous prompt design is an underexplored area, and a fundamental question arises: is the manually designed deep prompt strategy optimal? To answer this question, we propose a method dubbed differentiable prompt learning (DPL). The DPL method is formulated as an optimization problem to automatically determine the optimal context length of the prompt to be added to each layer, where the objective is to maximize the performance. We test the DPL method on the pre-trained CLIP. We empirically find that by using only limited data, our DPL method can find a deep continuous prompt configuration with high confidence. The performance on the downstream tasks exhibits the superiority of the automatic design: our method boosts the average test accuracy by 2.60% on 11 datasets compared to baseline methods. Besides, our method focuses only on the prompt configuration (i.e., the context length for each layer), which means that it is compatible with baseline methods that have sophisticated designs to boost the performance. We release our code at https://github.com/Zhenhan-Huang/Differentiable-Prompt-Learn.

TMLR Journal 2025 Journal Article

Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective

  • Sahil Verma
  • Gantavya Bhatt
  • Avi Schwarzschild
  • Soumye Singhal
  • Arnav Mohanty Das
  • Chirag Shah
  • John P Dickerson
  • Pin-Yu Chen

Despite the advanced capabilities of contemporary machine learning (ML) models, they remain vulnerable to adversarial and backdoor attacks. This vulnerability is particularly concerning in real-world deployments, where compromised models may exhibit unpredictable behavior in critical scenarios. Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models, as these datasets may harbor backdoors. Various techniques have been proposed to mitigate the effects of backdooring in multimodal models, such as CleanCLIP, which is the current state-of-the-art approach. In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that stronger pre-training objectives, which lead to higher zero-shot classification performance, correlate with backdoor behaviors that are harder to remove. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. We find that CleanCLIP, even with extensive hyperparameter tuning, is ineffective in poison removal when stronger pre-training objectives are used. Our findings underscore critical considerations for ML practitioners who train models using large-scale web-curated data and are concerned about potential backdoor threats.

AAAI Conference 2025 Conference Paper

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers

  • Bharat Runwal
  • Tejaswini Pedapati
  • Pin-Yu Chen

Pretrained Language Models (PLMs) have become the de facto starting point for fine-tuning on downstream tasks. However, as model sizes continue to increase, traditional fine-tuning of all parameters becomes challenging. To address this, parameter-efficient fine-tuning (PEFT) methods have gained popularity as a means to adapt PLMs effectively. In parallel, recent studies have revealed the presence of activation sparsity within the intermediate outputs of the multilayer perceptron (MLP) blocks in transformers. Low activation density enables efficient model inference on sparsity-aware hardware. Building upon this insight, in this work, we propose a novel density loss that encourages higher activation sparsity (equivalently, lower activation density) in the pre-trained models. We demonstrate the effectiveness of our approach by utilizing mainstream PEFT techniques, including QLoRA, LoRA, Adapter, and Prompt/Prefix Tuning, to facilitate efficient model adaptation across diverse downstream tasks. Experiments show that our proposed method, DEFT (Density-Efficient Fine-Tuning), can consistently reduce activation density by up to 44.94% on RoBERTa (Large) and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5-XXL (11B) compared to PEFT, using the GLUE and QA (SQuAD) benchmarks respectively, while maintaining competitive performance on downstream tasks. We also introduce ADA-DEFT, an adaptive variant of our DEFT approach, which achieves significant memory and runtime savings during inference for large models. For instance, ADA-DEFT reduces runtime by 8.75% and memory usage by 16.78% in Flan-T5-XL and by 2.79% and 2.54%, respectively, in Flan-T5-XXL. Additionally, we show that DEFT works complementarily with quantized and pruned models.
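
Activation density here means the fraction of MLP intermediate activations that are non-zero. Because that count is not differentiable, a training loss needs a smooth surrogate. The sketch below measures density exactly and pairs it with one possible surrogate, tanh(beta * relu(x)); the surrogate, beta, and the function names are assumptions for illustration and may differ from DEFT's actual density loss.

```python
import math
from typing import List


def relu(xs: List[float]) -> List[float]:
    return [max(x, 0.0) for x in xs]


def activation_density(pre_acts: List[float]) -> float:
    """Exact density: fraction of post-ReLU activations that are non-zero."""
    acts = relu(pre_acts)
    return sum(1 for a in acts if a > 0.0) / len(acts)


def density_surrogate(pre_acts: List[float], beta: float = 10.0) -> float:
    """Hypothetical smooth proxy in [0, 1]: tanh(beta * a) is 0 at a = 0 and near 1 for a >> 0."""
    acts = relu(pre_acts)
    return sum(math.tanh(beta * a) for a in acts) / len(acts)


pre = [-1.0, 0.5, -0.2, 2.0]
exact = activation_density(pre)   # 2 of 4 activations survive ReLU
smooth = density_surrogate(pre)
```

In this sketch a weighted copy of the surrogate would be added to the task loss of a PEFT run, so gradients push activations toward zero, while the exact count is what sparsity-aware hardware would exploit at inference.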

TMLR Journal 2025 Journal Article

Group Fair Federated Learning via Stochastic Kernel Regularization

  • Huzaifa Arif
  • Pin-Yu Chen
  • Keerthiram Murugesan
  • Alex Gittens

Ensuring group fairness in federated learning (FL) presents unique challenges due to data heterogeneity and communication constraints. We propose Kernel Fair Federated Learning (KFFL), a novel framework that incorporates group fairness into FL models using the Kernel Hilbert-Schmidt Independence Criterion (KHSIC) as a fairness regularizer. To address scalability, KFFL approximates KHSIC with Random Feature Maps (RFMs), significantly reducing computational and communication overhead while achieving group fairness. To address the resulting non-convex optimization problem, we propose FedProxGrad, a federated proximal gradient algorithm that guarantees convergence. Through experiments on standard benchmark datasets across both IID and Non-IID settings for regression and classification tasks, KFFL demonstrates its ability to balance accuracy and fairness effectively, outperforming existing methods by comprehensively exploring the Pareto Frontier. Furthermore, we introduce KFFL-TD, a time-delayed variant that further reduces communication rounds, enhancing efficiency in decentralized environments.
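
HSIC measures statistical dependence between two variables; a value near zero suggests the model's outputs carry little information about the sensitive attribute. The sketch below estimates HSIC between scalar predictions and a scalar sensitive attribute using random Fourier features for an RBF kernel, so the cost grows linearly with the number of samples instead of quadratically. The RBF kernel, bandwidth, feature count, and function names are assumptions; KFFL's exact estimator and its federated aggregation are not reproduced here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def random_fourier_features(xs: List[float], num_features: int, sigma: float, seed: int) -> Matrix:
    """Map scalars to features whose inner products approximate an RBF kernel of bandwidth sigma."""
    rng = random.Random(seed)
    ws = [rng.gauss(0.0, 1.0 / sigma) for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(w * x + b) for w, b in zip(ws, bs)] for x in xs]


def centered(rows: Matrix) -> Matrix:
    """Subtract the per-feature mean across samples."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def hsic_rff(preds: List[float], sensitive: List[float],
             num_features: int = 32, sigma: float = 1.0, seed: int = 0) -> float:
    """Squared Frobenius norm of the empirical cross-covariance of the two feature maps."""
    n = len(preds)
    zf = centered(random_fourier_features(preds, num_features, sigma, seed))
    zs = centered(random_fourier_features(sensitive, num_features, sigma, seed + 1))
    total = 0.0
    for j in range(num_features):
        for k in range(num_features):
            cov = sum(zf[i][j] * zs[i][k] for i in range(n)) / n
            total += cov * cov
    return total


group = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
biased_preds = [0.1, 0.2, 0.9, 0.8, 0.15, 0.95]  # tracks the group label
constant_preds = [0.5] * 6                       # ignores the group label
```

In a training loop, a term such as lam * hsic_rff(preds, group) (lam being a hypothetical weight) would be added to each client's loss; raising lam trades accuracy for fairness, which is the trade-off the Pareto-frontier comparison explores.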

ICLR Conference 2025 Conference Paper

Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge

  • Jiayi Ye
  • Yanbo Wang 0005
  • Yue Huang 0001
  • Dongping Chen
  • Qihui Zhang
  • Nuno Moniz
  • Tian Gao
  • Werner Geyer

LLM-as-a-Judge has been widely utilized as an evaluation method in various benchmarks and has served as a source of supervised rewards in model training. However, despite the strong performance of LLM judges in many domains, their potential issues remain under-explored, undermining their reliability and the scope of their utility. Therefore, we identify 12 key potential biases and propose a new automated bias quantification framework, CALM, which systematically quantifies and analyzes each type of bias in LLM-as-a-Judge by using automated and principle-guided modification. Our experiments cover multiple popular language models, and the results indicate that while advanced models have achieved commendable overall performance, significant biases persist in certain specific tasks. Empirical results suggest that there remains room for improvement in the reliability of LLM-as-a-Judge. Moreover, we also discuss the explicit and implicit influence of these biases and give some suggestions for the reliable application of LLM-as-a-Judge. Our work highlights the need for stakeholders to address these issues and reminds users to exercise caution in LLM-as-a-Judge applications.

TMLR Journal 2025 Journal Article

Language Models Are Good Tabular Learners

  • Zhenhan Huang
  • Kavitha Srinivas
  • Horst Samulowitz
  • Niharika S. D'Souza
  • Charu C. Aggarwal
  • Pin-Yu Chen
  • Jianxi Gao

Transformer-based language models have become the de facto standard in natural language processing. However, they underperform in the tabular data domain compared to traditional tree-based methods. We posit that current models fail to achieve the full potential of language models due to (i) the heterogeneity of tabular data; and (ii) the challenges faced by the model in interpreting numerical values. Based on this hypothesis, we propose the Tabular Domain Transformer (TDTransformer) framework. TDTransformer has distinct embedding processes for different types of columns. The alignment layers for different column types transform these embeddings to a common space. Besides, TDTransformer adopts piece-wise linear encoding for numerical values for better performance. We test the proposed method on 76 real-world tabular classification datasets from the OpenML benchmark. Extensive experiments indicate that TDTransformer significantly improves upon state-of-the-art methods.
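
Piece-wise linear encoding replaces a scalar feature with a vector that records how far the value has progressed through each of a set of bins. The sketch below uses a common clamped form and assumes the bin edges have already been chosen (for example from training-set quantiles); TDTransformer's exact binning and edge handling are not specified in the abstract.

```python
from typing import List


def piecewise_linear_encode(x: float, edges: List[float]) -> List[float]:
    """Encode x against sorted bin edges b_0 < b_1 < ... < b_T.

    Component t is 0 below bin t, 1 above it, and the fractional position inside it.
    """
    encoding = []
    for left, right in zip(edges, edges[1:]):
        frac = (x - left) / (right - left)
        encoding.append(min(max(frac, 0.0), 1.0))
    return encoding


edges = [0.0, 1.0, 2.0, 4.0]
piecewise_linear_encode(2.5, edges)   # [1.0, 1.0, 0.25]
piecewise_linear_encode(-1.0, edges)  # [0.0, 0.0, 0.0]
```

Under this assumption, the resulting vector, rather than the raw number, would be projected into the model's embedding space, which gives the model an explicit notion of magnitude.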

ICLR Conference 2025 Conference Paper

Large Language Models can Become Strong Self-Detoxifiers

  • Ching-Yun Ko
  • Pin-Yu Chen
  • Payel Das
  • Youssef Mroueh
  • Soham Dan
  • Georgios Kollias
  • Subhajit Chaudhury
  • Tejaswini Pedapati

Reducing the likelihood of generating harmful and toxic output is an essential task when aligning large language models (LLMs). Existing methods mainly rely on training an external reward model (i.e., another language model) or fine-tuning the LLM using self-generated data to influence the outcome. In this paper, we show that LLMs have the capability of self-detoxification without external reward model learning or retraining of the LM. We propose Self-disciplined Autoregressive Sampling (SASA), a lightweight controlled decoding algorithm for toxicity reduction of LLMs. SASA leverages the contextual representations from an LLM to learn linear subspaces from labeled data characterizing toxic vs. non-toxic output in analytical forms. When auto-completing a response token by token, SASA dynamically tracks the margin of the current output to steer the generation away from the toxic subspace, by adjusting the autoregressive sampling strategy. Evaluated on LLMs of different scale and nature, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L models with the RealToxicityPrompts, BOLD, and AttaQ benchmarks, SASA markedly enhances the quality of the generated sentences relative to the original models and attains comparable performance to state-of-the-art detoxification techniques, significantly reducing the toxicity level by only using the LLM's internal representations.
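
The steering step can be sketched as exponential tilting of the next-token distribution by each candidate's margin under a linear toxic/non-toxic classifier in embedding space. The tilting rule p(v) * exp(beta * margin(v)), the sign convention (positive margin means the non-toxic side), the toy embeddings, and the function names are assumptions for illustration; SASA's exact update may differ.

```python
import math
from typing import List


def linear_margin(embedding: List[float], w: List[float], b: float) -> float:
    """Signed score of a context embedding under a linear classifier (positive = non-toxic side)."""
    return sum(e * wi for e, wi in zip(embedding, w)) + b


def tilt_next_token_probs(probs: List[float], margins: List[float], beta: float) -> List[float]:
    """Reweight candidate tokens toward the non-toxic side; beta = 0 leaves probs unchanged."""
    weights = [p * math.exp(beta * m) for p, m in zip(probs, margins)]
    total = sum(weights)
    return [wt / total for wt in weights]


# Two candidate continuations; the second would move the context toward the toxic side.
w, b = [1.0, -1.0], 0.0
margins = [linear_margin([0.8, 0.2], w, b), linear_margin([0.1, 0.9], w, b)]
steered = tilt_next_token_probs([0.4, 0.6], margins, beta=2.0)
```

Because the classifier is linear, each margin is a single dot product, which is consistent with the abstract's point that the method needs only the LLM's internal representations and no extra reward model.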

ICLR Conference 2025 Conference Paper

REFINE: Inversion-Free Backdoor Defense via Model Reprogramming

  • Yukun Chen 0002
  • Shuo Shao 0002
  • Enhao Huang
  • Yiming Li 0004
  • Pin-Yu Chen
  • Zhan Qin
  • Kui Ren 0001

Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat, allowing adversaries to implant hidden malicious behaviors during the model training phase. Pre-processing-based defense, which is one of the most important defense paradigms, typically focuses on input transformations or backdoor trigger inversion (BTI) to deactivate or eliminate embedded backdoor triggers during the inference process. However, these methods suffer from inherent limitations: transformation-based defenses often fail to balance model utility and defense performance, while BTI-based defenses struggle to accurately reconstruct trigger patterns without prior knowledge. In this paper, we propose REFINE, an inversion-free backdoor defense method based on model reprogramming. REFINE consists of two key components: (1) an input transformation module that disrupts both benign and backdoor patterns, generating new benign features; and (2) an output remapping module that redefines the model's output domain to guide the input transformations effectively. By further integrating a supervised contrastive loss, REFINE enhances the defense capabilities while maintaining model utility. Extensive experiments on various benchmark datasets demonstrate the effectiveness of REFINE and its resistance to potential adaptive attacks.

AAAI Conference 2025 Conference Paper

Retention Score: Quantifying Jailbreak Risks for Vision Language Models

  • Zaitang LI
  • Pin-Yu Chen
  • Tsung-Yi Ho

The emergence of Vision-Language Models (VLMs) is a significant advancement in integrating computer vision with Large Language Models (LLMs) to enhance multi-modal machine learning capabilities. However, this progress has made VLMs vulnerable to advanced adversarial attacks, raising concerns about reliability. The objective of this paper is to assess the resilience of VLMs against jailbreak attacks that can compromise model safety compliance and result in harmful outputs. To evaluate a VLM's ability to maintain robustness against adversarial input perturbations, we propose a novel metric called Retention Score. Retention Score is a multi-modal evaluation metric that includes Retention-I and Retention-T scores for quantifying jailbreak risks in the visual and textual components of VLMs. Our process involves generating synthetic image-text pairs using a conditional diffusion model. The VLM, together with a toxicity judgment classifier, then predicts a toxicity score for each pair. By calculating the margin in toxicity scores, we can quantify the robustness of the VLM in an attack-agnostic manner. Our work has four main contributions. First, we prove that Retention Score can serve as a certified robustness metric. Second, we demonstrate that most VLMs with visual components are less robust against jailbreak attacks than the corresponding plain VLMs. Additionally, we evaluate black-box VLM APIs and find that the security settings in Google Gemini significantly affect the score and robustness. Moreover, the robustness of GPT-4V is similar to that of Gemini's medium settings. Finally, our approach offers a time-efficient alternative to existing adversarial attack methods and provides consistent model robustness rankings when evaluated on VLMs including MiniGPT-4, InstructBLIP, and LLaVA.

ICLR Conference 2025 Conference Paper

Revisiting Mode Connectivity in Neural Networks with Bezier Surface

  • Jie Ren
  • Pin-Yu Chen
  • Ren Wang 0008

Understanding the loss landscapes of neural networks (NNs) is critical for optimizing model performance. Previous research has identified the phenomenon of mode connectivity on curves, where two well-trained NNs can be connected by a continuous path in parameter space along which the loss remains nearly constant. In this work, we extend the concept of mode connectivity to explore connectivity on surfaces, significantly broadening its applicability and unlocking new opportunities. While initial attempts to connect models via linear surfaces in parameter space were unsuccessful, we propose a novel optimization technique that consistently discovers low-loss, high-accuracy Bézier surfaces connecting multiple NNs in a nonlinear manner. We further demonstrate that even without optimization, mode connectivity exists in certain cases of Bézier surfaces, where the models are carefully selected and combined linearly. This approach provides a deeper and more comprehensive understanding of the loss landscape and offers a novel way to identify models with enhanced performance for model averaging and output ensembling. We demonstrate the effectiveness of our method on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets using VGG16, ResNet18, and ViT architectures.
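
A Bézier surface in parameter space mixes a grid of control models with products of Bernstein polynomials, theta(u, v) = sum_i sum_j B_i^n(u) B_j^m(v) theta_ij, so each (u, v) in the unit square yields one set of weights. The sketch below evaluates that standard formula on flattened weight vectors. Treating the corner control points as the trained models being connected, and the plain-list representation, are assumptions for illustration; the paper's optimization of the control points is not included.

```python
from math import comb
from typing import List

Vector = List[float]


def bernstein(n: int, i: int, t: float) -> float:
    """Bernstein basis polynomial B_i^n(t)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_surface_point(ctrl: List[List[Vector]], u: float, v: float) -> Vector:
    """Evaluate the surface at (u, v) from an (n+1) x (m+1) grid of control weight vectors."""
    n, m = len(ctrl) - 1, len(ctrl[0]) - 1
    out = [0.0] * len(ctrl[0][0])
    for i in range(n + 1):
        bu = bernstein(n, i, u)
        for j in range(m + 1):
            weight = bu * bernstein(m, j, v)
            for k, value in enumerate(ctrl[i][j]):
                out[k] += weight * value
    return out


# A 2 x 2 grid of toy 2-parameter "models" (bilinear patch).
ctrl = [[[0.0, 0.0], [0.0, 2.0]],
        [[2.0, 0.0], [2.0, 2.0]]]
bezier_surface_point(ctrl, 0.5, 0.5)  # [1.0, 1.0]
```

The surface passes through its corner control points exactly, and with only a 2 x 2 grid it reduces to the linear combination of models the abstract mentions; higher-degree grids add interior control points that can bend the surface around high-loss regions.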

ICLR Conference 2025 Conference Paper

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

  • Han Shen
  • Pin-Yu Chen
  • Payel Das
  • Tianyi Chen

Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, though fine-tuning enhances the model performance for specialized applications, previous studies have demonstrated that fine-tuning the models on a few adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker based on bilevel optimization to up-rank the safe and high-quality fine-tuning data and down-rank the unsafe or low-quality ones. Models trained with SEAL demonstrate superior quality over multiple baselines, with 8.5% and 9.7% win rate increases compared to random selection on Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub at https://github.com/hanshen95/SEAL.

NeurIPS Conference 2025 Conference Paper

Shape it Up! Restoring LLM Safety during Finetuning

  • ShengYun Peng
  • Pin-Yu Chen
  • Jianfeng Chi
  • Seongmin Lee
  • Duen Horng Chau

Finetuning large language models (LLMs) enables user-specific customization but introduces important safety risks: even a few harmful examples can compromise safety alignment. A common mitigation strategy is to update the model more strongly on examples deemed safe, while downweighting or excluding those flagged as unsafe. However, because safety context can shift within a single example, updating the model equally on both harmful and harmless parts of a response is suboptimal; we term this atomic treatment static safety shaping. In contrast, we propose dynamic safety shaping (DSS), a dynamic shaping framework that uses fine-grained safety signals to reinforce learning from safe segments of a response while suppressing unsafe content. To enable such fine-grained control during finetuning, we introduce a key insight: guardrail models, traditionally used for filtering, can be repurposed to evaluate partial responses, tracking how safety risk evolves throughout the response, segment by segment. This leads to the Safety Trajectory Assessment of Response (STAR), a token-level signal that enables shaping to operate dynamically over the training sequence. Building on this, we present ★DSS, a DSS method guided by STAR scores that robustly mitigates finetuning risks and delivers substantial safety improvements across diverse threats, datasets, and model families, all without compromising capability on intended tasks. We encourage future safety research to build on dynamic shaping principles for stronger mitigation against evolving finetuning risks. Our code is publicly available at https://github.com/poloclub/star-dss.

IJCAI Conference 2025 Conference Paper

SPARC: An AI-Based Speech Processing and Real-Time Correction System

  • TingRay Chung
  • Pin-Yu Chen

In the world of audio narration and video production, maintaining clear and accurate dialogue is crucial. However, most correction of dubbing mistakes is done in post-production, which is often not an option for live broadcasts. This project aims to develop a real-time voice correction system that automatically detects and corrects speech errors in near real-time while integrating the adjusted audio into ongoing conversations without disrupting the natural flow. This paper uses various AI tools, such as the Nous Hermes 2-Mistral 7B DPO large language model, to first generate the reference script for Coqui's XTTS-V2 zero-shot text-to-speech voice cloning model. After the correction is generated, it passes through a series of filters that replace the mistake and integrate the correction seamlessly. The user survey conducted in the experiment shows that the corrected audio is of high quality.

ICLR Conference 2025 Conference Paper

TabWak: A Watermark for Tabular Diffusion Models

  • Chaoyi Zhu
  • Jiayi Tang
  • Jeroen M. Galjaard
  • Pin-Yu Chen
  • Robert Birke
  • Cornelis Bos
  • Lydia Y. Chen

Synthetic data offers alternatives for data augmentation and sharing. To date, it remains unknown how to use watermarking techniques to trace and audit synthetic tables generated by tabular diffusion models to mitigate potential misuses. In this paper, we design TabWak, the first watermarking method to embed invisible signatures that control the sampling of Gaussian latent codes used to synthesize table rows via the diffusion backbone. TabWak has two key features. Different from existing image watermarking techniques, TabWak uses self-cloning and shuffling to embed the secret key in the positional information of random seeds that control the Gaussian latents, allowing different seeds to be used for each row, which yields high inter-row diversity and enables row-wise detectability. To further boost the robustness of watermark detection against post-editing attacks, TabWak uses a valid-bit mechanism that focuses on the tail of the latent code distribution for superior noise resilience. We provide theoretical guarantees on the row diversity and effectiveness of detectability. We evaluate TabWak on five datasets against baselines to show that the quality of watermarked tables remains nearly indistinguishable from non-watermarked tables while achieving high detectability in the presence of strong post-editing attacks, with a 100% true positive rate at a 0.1% false positive rate on synthetic tables with fewer than 300 rows. Our code is available at https://github.com/chaoyitud/TabWak.

AAAI Conference 2025 Conference Paper

Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models

  • Xiaomeng Hu
  • Pin-Yu Chen
  • Tsung-Yi Ho

Large Language Models (LLMs) are increasingly being integrated into services such as ChatGPT to provide responses to user queries. To mitigate potential harm and prevent misuse, there have been concerted efforts to align the LLMs with human values and legal compliance by incorporating various techniques, such as Reinforcement Learning from Human Feedback (RLHF), into the training of the LLMs. However, recent research has exposed that even aligned LLMs are susceptible to adversarial manipulations known as Jailbreak Attacks. To address this challenge, this paper proposes a method called Token Highlighter to inspect and mitigate the potential jailbreak threats in the user query. Token Highlighter introduces a concept called Affirmation Loss to measure the LLM's willingness to answer the user query. It then uses the gradient of Affirmation Loss for each token in the user query to locate the jailbreak-critical tokens. Further, Token Highlighter exploits our proposed Soft Removal technique to mitigate the jailbreak effects of critical tokens by shrinking their token embeddings. Experimental results on two aligned LLMs (LLaMA-2 and Vicuna-V1.5) demonstrate that the proposed method can effectively defend against a variety of Jailbreak Attacks while maintaining competent performance on benign questions of the AlpacaEval benchmark. In addition, Token Highlighter is a cost-effective and interpretable defense because it only needs to query the protected LLM once to compute the Affirmation Loss and can highlight the critical tokens upon refusal.
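
The Soft Removal step can be pictured as follows: given a per-token score (for Token Highlighter, the size of the Affirmation Loss gradient with respect to each token embedding), scale down the embeddings of the highest-scoring tokens instead of deleting them. In the sketch below the gradient norms are taken as given inputs, and the selection fraction top_frac, the scaling factor beta, and the function name are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def soft_remove(embeddings: Matrix, grad_norms: List[float],
                top_frac: float = 0.25, beta: float = 0.5) -> Tuple[Matrix, List[int]]:
    """Shrink the embeddings of the most jailbreak-critical tokens by a factor beta.

    grad_norms[i] is assumed to be the norm of the Affirmation Loss gradient
    w.r.t. token i's embedding, computed elsewhere. Returns the modified
    embeddings and the highlighted token indices.
    """
    k = max(1, int(top_frac * len(embeddings)))
    ranked = sorted(range(len(embeddings)), key=lambda i: grad_norms[i], reverse=True)
    highlighted = sorted(ranked[:k])
    chosen = set(highlighted)
    shrunk = [
        [beta * v for v in emb] if i in chosen else list(emb)
        for i, emb in enumerate(embeddings)
    ]
    return shrunk, highlighted


emb = [[1.0, 2.0], [4.0, -2.0], [0.5, 0.5], [3.0, 1.0]]
norms = [0.1, 0.9, 0.2, 0.3]
shrunk, highlighted = soft_remove(emb, norms)
```

Returning the highlighted indices alongside the shrunk embeddings reflects the interpretability point in the abstract: the same scores that drive the defense can be shown to a user.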

ICLR Conference 2025 Conference Paper

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

  • Hongkang Li
  • Songtao Lu
  • Pin-Yu Chen
  • Xiaodong Cui
  • Meng Wang

Chain-of-Thought (CoT) is an efficient prompting method that enables the reasoning ability of large language models by augmenting the query using multiple examples with multiple intermediate steps. Despite the empirical success, the theoretical understanding of how to train a Transformer to achieve the CoT ability remains less explored. This is primarily due to the technical challenges involved in analyzing the nonconvex optimization on nonlinear attention models. To the best of our knowledge, this work provides the first theoretical study of training Transformers with nonlinear attention to obtain the CoT generalization capability so that the resulting model can perform inference on unseen tasks when the input is augmented by examples of the new task. We first quantify the required training samples and iterations to train a Transformer model towards CoT ability. We then prove the success of its CoT generalization on unseen tasks with distribution-shifted testing data. Moreover, we theoretically characterize the conditions for an accurate reasoning output by CoT even when the provided reasoning examples contain noise and are not always accurate. In contrast, in-context learning (ICL), which can be viewed as one-step CoT without intermediate steps, may fail to provide an accurate output when CoT does. These theoretical findings are justified through experiments.

ICLR Conference 2025 Conference Paper

When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers

  • Hongkang Li
  • Yihua Zhang
  • Shuai Zhang 0015
  • Pin-Yu Chen
  • Sijia Liu 0001
  • Meng Wang 0003

Task arithmetic refers to editing a pre-trained model by adding a weighted sum of task vectors, each of which is the weight update from the pre-trained model to a model fine-tuned for a certain task. This approach has recently gained attention as a computationally efficient inference method for model editing, e.g., for multi-task learning, forgetting, and out-of-domain generalization. However, the theoretical understanding of why task vectors can execute various conceptual operations remains limited, due to the high non-convexity of training Transformer-based models. To the best of our knowledge, this paper provides the first theoretical characterization of the generalization guarantees of task vector methods on nonlinear Transformers. We consider a conceptual learning setting, where each task is a binary classification problem based on a discriminative pattern. We theoretically prove the effectiveness of task addition in simultaneously learning a set of irrelevant or aligned tasks, as well as the success of task negation in unlearning one task from irrelevant or contradictory tasks. Moreover, we prove that a proper selection of linear coefficients for task arithmetic achieves guaranteed generalization to out-of-domain tasks. All of our theoretical results hold for both dense-weight parameters and their low-rank approximations. Although established in a conceptual setting, our theoretical findings were validated on a practical machine unlearning task using the large language model Phi-1.5 (1.3B).
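A minimal sketch of task arithmetic as defined above, with model weights represented as plain dictionaries of named parameter lists (an assumption made so the example is self-contained). Each task vector is τ_i = θ_ft,i − θ_pre, and the edited model is θ_pre + Σ_i λ_i τ_i; a negative λ_i corresponds to task negation. The helper names are hypothetical.

```python
def task_vector(pretrained, finetuned):
    """tau = theta_finetuned - theta_pretrained, computed per named parameter."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_task_arithmetic(pretrained, task_vectors, coefficients):
    """theta = theta_pre + sum_i lambda_i * tau_i.

    A positive coefficient adds a task; a negative one negates (unlearns) it.
    """
    edited = {name: list(values) for name, values in pretrained.items()}
    for tau, lam in zip(task_vectors, coefficients):
        for name, delta in tau.items():
            edited[name] = [w + lam * d for w, d in zip(edited[name], delta)]
    return edited
```

For example, adding task A with λ = 1.0 while negating half of task B with λ = -0.5 only needs the stored task vectors, with no further training.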

ICML Conference 2024 Conference Paper

A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts

  • Mohammed Nowaz Rabbani Chowdhury
  • Meng Wang 0003
  • Kaoutar El Maghraoui
  • Naigang Wang
  • Pin-Yu Chen
  • Christopher D. Carothers

The sparsely gated mixture of experts (MoE) architecture sends different inputs to different subnetworks (experts) through trainable routers. MoE significantly reduces the training computation for large models, but its deployment can still be memory- and computation-expensive for some downstream tasks. Model pruning is a popular approach to reduce inference computation, but its application to MoE architectures is largely unexplored. To the best of our knowledge, this paper provides the first provably efficient technique for pruning experts in fine-tuned MoE models. We theoretically prove that prioritizing the pruning of the experts with a smaller change of the router’s $l_2$ norm from the pre-trained model guarantees the preservation of test accuracy, while significantly reducing the model size and the computational requirements. Although our theoretical analysis is centered on binary classification tasks on a simplified MoE architecture, our expert pruning method is verified on large vision MoE models such as V-MoE and $\text{E}^3$-MoE fine-tuned on benchmark datasets such as CIFAR-10, CIFAR-100, and ImageNet.
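A hypothetical sketch of the pruning criterion summarized above: rank experts by how much the ℓ2 norm of their router weights changed during fine-tuning, and prune those with the smallest change. Reading "change of the router's l2 norm" as the absolute difference |‖w_ft‖ − ‖w_pre‖| is our assumption; the paper's precise definition and any post-pruning adjustments are not shown.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(v * v for v in vec))


def experts_to_prune(router_pre, router_ft, num_prune):
    """Return the indices of the num_prune experts whose router weight norm
    changed the least between the pre-trained and fine-tuned models.

    router_pre, router_ft: lists where entry j holds expert j's router weights.
    """
    # Assumed criterion: absolute change in the l2 norm of each router vector.
    changes = [
        abs(l2_norm(w_ft) - l2_norm(w_pre))
        for w_pre, w_ft in zip(router_pre, router_ft)
    ]
    order = sorted(range(len(changes)), key=lambda j: changes[j])
    return sorted(order[:num_prune])
```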

ICLR Conference 2024 Conference Paper

AutoVP: An Automated Visual Prompting Framework and Benchmark

  • Hsi-Ai Tsao
  • Lei Hsiung
  • Pin-Yu Chen
  • Si Liu 0001
  • Tsung-Yi Ho

Visual prompting (VP) is an emerging parameter-efficient fine-tuning approach to adapting pre-trained vision models to solve various downstream image-classification tasks. However, there has hitherto been little systematic study of the design space of VP and no clear benchmark for evaluating its performance. To bridge this gap, we propose AutoVP, an end-to-end expandable framework for automating VP design choices, along with 12 downstream image-classification tasks that can serve as a holistic VP-performance benchmark. Our design space covers 1) the joint optimization of the prompts; 2) the selection of pre-trained models, including image classifiers and text-image encoders; and 3) model output mapping strategies, including nonparametric and trainable label mapping. Our extensive experimental results show that AutoVP outperforms the best-known current VP methods by a substantial margin, with up to a 6.7% improvement in accuracy, and attains a maximum performance increase of 27.5% compared to the linear-probing (LP) baseline. AutoVP thus makes a two-fold contribution: serving both as an efficient tool for hyperparameter tuning on VP design choices, and as a comprehensive benchmark that can reasonably be expected to accelerate VP’s development. The source code is available at https://github.com/IBM/AutoVP.

ICML Conference 2024 Conference Paper

Be Your Own Neighborhood: Detecting Adversarial Examples by the Neighborhood Relations Built on Self-Supervised Learning

  • Zhiyuan He
  • Yijun Yang
  • Pin-Yu Chen
  • Qiang Xu 0001
  • Tsung-Yi Ho

Deep Neural Networks (DNNs) are vulnerable to Adversarial Examples (AEs), hindering their use in safety-critical systems. In this paper, we present BEYOND, an innovative AE detection framework designed for reliable predictions. BEYOND identifies AEs by distinguishing an AE’s abnormal relation with its augmented versions, i.e., neighbors, from two perspectives: representation similarity and label consistency. An off-the-shelf Self-Supervised Learning (SSL) model is used to extract the representation and predict the label, owing to its highly informative representation capacity compared to supervised learning models. We found that clean samples maintain a high degree of representation similarity and label consistency relative to their neighbors, in contrast to AEs, which exhibit significant discrepancies. We explain this observation and show that, by leveraging this discrepancy, BEYOND can accurately detect AEs. Additionally, we develop a rigorous justification for the effectiveness of BEYOND. Furthermore, as a plug-and-play model, BEYOND can easily cooperate with an Adversarially Trained Classifier (ATC), achieving state-of-the-art (SOTA) robust accuracy. Experimental results show that BEYOND outperforms baselines by a large margin, especially under adaptive attacks. Empowered by the robust relationship built on SSL, we found that BEYOND outperforms baselines in terms of both detection ability and speed. Project page: https://huggingface.co/spaces/allenhzy/Be-Your-Own-Neighborhood.
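The neighborhood test described above can be sketched as follows, assuming the SSL representations and predicted labels of the input and of its augmented neighbors are already available. The thresholds and the rule of flagging an input when either score is low are illustrative assumptions, not BEYOND's calibrated decision rule.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def looks_adversarial(rep, label, neighbor_reps, neighbor_labels,
                      sim_threshold=0.8, consistency_threshold=0.5):
    """Flag an input whose SSL representation or predicted label disagrees
    with those of its augmented neighbors (thresholds are hypothetical)."""
    # Representation similarity: mean cosine similarity to the neighbors.
    similarity = sum(cosine(rep, r) for r in neighbor_reps) / len(neighbor_reps)
    # Label consistency: fraction of neighbors sharing the input's label.
    consistency = sum(l == label for l in neighbor_labels) / len(neighbor_labels)
    return similarity < sim_threshold or consistency < consistency_threshold
```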

IJCAI Conference 2024 Conference Paper

Computational Complexity of Verifying the Group No-show Paradox

  • Farhad Mohsin
  • Qishen Han
  • Sikai Ruan
  • Pin-Yu Chen
  • Francesca Rossi
  • Lirong Xia

The (group) no-show paradox refers to the undesirable situation where a group of agents has an incentive to abstain from voting to make the winner more favorable to them. To understand whether it is a critical concern in practice, in this paper we take a computational approach by examining the computational complexity of verifying whether the group no-show paradox exists given agents' preferences and the voting rule. We prove that, unfortunately, the verification problem is NP-hard for some commonly studied voting rules, i.e., Copeland, maximin, single transferable vote, and all Condorcetified positional scoring rules such as Black's rule. We propose integer linear programming-based algorithms and a search-based algorithm for the verification problem under different voting rules. Experimental results on synthetic data illustrate that the former are efficient when the number of unique rankings in a profile is not too high, and the latter is efficient for a small number of agents. With the help of these algorithms, we observe that group no-show paradoxes rarely occur in real-world data.
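To make the verification problem concrete, here is a naive exhaustive search; it is not the ILP or search-based algorithms the paper proposes. It assumes the Copeland variant that awards 0.5 per pairwise tie and breaks score ties alphabetically, both assumptions for illustration. Because it enumerates every subset of agents, its cost grows exponentially with the number of agents.

```python
from itertools import combinations


def copeland_winner(profile, candidates):
    """Copeland rule: 1 point per pairwise win, 0.5 per pairwise tie.
    Ties in total score are broken alphabetically (an assumption)."""
    scores = {c: 0.0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in profile)
        b_over_a = len(profile) - a_over_b
        if a_over_b > b_over_a:
            scores[a] += 1
        elif b_over_a > a_over_b:
            scores[b] += 1
        else:
            scores[a] += 0.5
            scores[b] += 0.5
    best = max(scores.values())
    return min(c for c in candidates if scores[c] == best)


def find_group_no_show(profile, rule=copeland_winner):
    """Search for a nonempty group whose abstention changes the winner to a
    candidate that every member strictly prefers to the original winner.
    Returns (group_indices, new_winner), or None if no such group exists."""
    candidates = sorted(profile[0])
    original = rule(profile, candidates)
    n = len(profile)
    for size in range(1, n):
        for group in combinations(range(n), size):
            remaining = [r for i, r in enumerate(profile) if i not in group]
            new_winner = rule(remaining, candidates)
            if all(profile[i].index(new_winner) < profile[i].index(original)
                   for i in group):
                return group, new_winner
    return None
```

For example, in the seven-voter profile with two voters each ranking a > b > c, b > c > a, and c > a > b, plus one voter ranking c > b > a, every candidate wins one pairwise contest, so a wins the tie-break. If a single b > c > a voter abstains, c becomes the Copeland winner, and that voter prefers c to a; the search reports this group.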

AAAI Conference 2024 Conference Paper

Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift

  • Shengwei An
  • Sheng-Yen Chou
  • Kaiyuan Zhang
  • Qiuling Xu
  • Guanhong Tao
  • Guangyu Shen
  • Siyuan Cheng
  • Shiqing Ma

Diffusion models (DMs) have become state-of-the-art generative models because of their capability of generating high-quality images from noise without adversarial training. However, recent studies report that they are vulnerable to backdoor attacks. When a data input (e.g., some Gaussian noise) is stamped with a trigger (e.g., a white patch), the backdoored model always generates the target image (e.g., an improper photo). However, effective defense strategies to mitigate backdoors in DMs are underexplored. To bridge this gap, we propose the first backdoor detection and removal framework for DMs. We evaluate our framework, Elijah, on hundreds of DMs of 3 types, including DDPM, NCSN, and LDM, with 13 samplers against 3 existing backdoor attacks. Extensive experiments show that our approach achieves close to 100% detection accuracy and reduces the backdoor effects to close to zero without significantly sacrificing model utility.

ICLR Conference 2024 Conference Paper

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

  • Xiangyu Qi
  • Yi Zeng 0005
  • Tinghao Xie
  • Pin-Yu Chen
  • Ruoxi Jia 0001
  • Prateek Mittal
  • Peter Henderson 0002

Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open-source release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on customized datasets accelerate this trend. But what are the safety costs associated with such customized fine-tuning? While existing safety alignment techniques restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing: even if a model's initial safety alignment is impeccable, how can it be maintained after customized fine-tuning? We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the customized fine-tuning of aligned LLMs. (This paper contains red-teaming data and model-generated content that can be offensive in nature.)

NeurIPS Conference 2024 Conference Paper

Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes

  • Xiaomeng Hu
  • Pin-Yu Chen
  • Tsung-Yi Ho

Large Language Models (LLMs) are becoming a prominent generative AI tool, where the user enters a query and the LLM generates an answer. To reduce harm and misuse, efforts have been made to align these LLMs to human values using advanced training techniques such as Reinforcement Learning from Human Feedback (RLHF). However, recent studies have highlighted the vulnerability of LLMs to adversarial jailbreak attempts aimed at subverting the embedded safety guardrails. To address this challenge, this paper defines and investigates the Refusal Loss of LLMs and then proposes a method called Gradient Cuff to detect jailbreak attempts. Gradient Cuff exploits the unique properties observed in the refusal loss landscape, including its functional values and smoothness, to design an effective two-step detection strategy. Experimental results on two aligned LLMs (LLaMA-2-7B-Chat and Vicuna-7B-V1.5) and six types of jailbreak attacks (GCG, AutoDAN, PAIR, TAP, Base64, and LRL) show that Gradient Cuff can significantly improve the LLM's rejection capability for malicious jailbreak queries, while maintaining the model's performance for benign user queries by adjusting the detection threshold.
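A rough sketch of the two-step strategy described above, where `refusal_loss` is a hypothetical callable mapping a (perturbed) query embedding to one minus the fraction of sampled responses that are refusals. The 0.5 cutoff in step 1, the random-direction zeroth-order gradient-norm estimate in step 2, and all parameter values are assumptions for illustration, not the paper's exact procedure.

```python
import math
import random


def zo_grad_norm(f, x, num_dirs=10, mu=0.02, rng=None):
    """Zeroth-order estimate of ||grad f(x)|| from forward finite
    differences along random unit directions."""
    rng = rng or random.Random(0)
    d = len(x)
    fx = f(x)
    grad = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        step = (f([xi + mu * ui for xi, ui in zip(x, u)]) - fx) / mu
        grad = [g + step * ui for g, ui in zip(grad, u)]
    # Factor d makes the average unbiased for unit directions on the sphere.
    grad = [d * g / num_dirs for g in grad]
    return math.sqrt(sum(g * g for g in grad))


def gradient_cuff_style_detect(refusal_loss, x, grad_threshold, **kw):
    """Return True if the query should be rejected as a likely jailbreak."""
    # Step 1: a low refusal loss means most sampled responses were refusals.
    if refusal_loss(x) < 0.5:
        return True
    # Step 2: a steep refusal-loss landscape around x is treated as suspicious.
    return zo_grad_norm(refusal_loss, x, **kw) > grad_threshold
```

Raising or lowering `grad_threshold` trades off rejection of malicious queries against false rejections of benign ones, mirroring the threshold adjustment mentioned above.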

NeurIPS Conference 2024 Conference Paper

GREAT Score: Global Robustness Evaluation of Adversarial Perturbation using Generative Models

  • Zaitang LI
  • Pin-Yu Chen
  • Tsung-Yi Ho

Current studies on adversarial robustness mainly focus on aggregating local robustness results from a set of data samples to evaluate and rank different models. However, the local statistics may not well represent the true global robustness of the underlying unknown data distribution. To address this challenge, this paper makes the first attempt to present a new framework, called GREAT Score, for global robustness evaluation of adversarial perturbation using generative models. Formally, GREAT Score carries the physical meaning of a global statistic capturing a mean certified attack-proof perturbation level over all samples drawn from a generative model. For finite-sample evaluation, we also derive a probabilistic guarantee on the sample complexity and the difference between the sample mean and the true mean. GREAT Score has several advantages: (1) Robustness evaluations using GREAT Score are efficient and scalable to large models, because they spare the need to run adversarial attacks. In particular, we show a high correlation with, and a significantly reduced computation cost compared to, the attack-based model ranking on RobustBench (Croce et al., 2021). (2) The use of generative models facilitates the approximation of the unknown data distribution. In our ablation study with different generative adversarial networks (GANs), we observe consistency between global robustness evaluation and the quality of GANs. (3) GREAT Score can be used for remote auditing of privacy-sensitive black-box models, as demonstrated by our robustness evaluation on several online facial recognition services.
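For the finite-sample evaluation mentioned above, one standard way to bound the gap between a sample mean and the true mean is Hoeffding's inequality for bounded scores. The sketch below assumes each per-sample score (a placeholder for the certified perturbation level GREAT Score defines) lies in [0, upper_bound]; the paper's own guarantee may take a different form.

```python
import math


def mean_with_hoeffding(scores, upper_bound, delta=0.05):
    """Return (sample mean, eps) such that, for i.i.d. scores in
    [0, upper_bound], |sample mean - true mean| <= eps with probability
    at least 1 - delta (Hoeffding's inequality)."""
    n = len(scores)
    if any(s < 0 or s > upper_bound for s in scores):
        raise ValueError("scores must lie in [0, upper_bound]")
    mean = sum(scores) / n
    # Solve 2 * exp(-2 n eps^2 / upper_bound^2) = delta for eps.
    eps = upper_bound * math.sqrt(math.log(2 / delta) / (2 * n))
    return mean, eps
```

The half-width shrinks as 1/√n, which gives a rough sense of how many generated samples are needed for a target precision.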

ICML Conference 2024 Conference Paper

How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?

  • Hongkang Li
  • Meng Wang 0003
  • Songtao Lu
  • Xiaodong Cui
  • Pin-Yu Chen

Transformer-based large language models have displayed impressive in-context learning (ICL) capabilities, where a pre-trained model can handle new tasks without fine-tuning by simply augmenting the query with some input-output examples from that task. Despite the empirical success, the mechanics of how to train a Transformer to achieve ICL and the corresponding ICL capacity remain mostly elusive due to the technical challenges of analyzing the nonconvex training problems resulting from the nonlinear self-attention and nonlinear activation in Transformers. To the best of our knowledge, this paper provides the first theoretical analysis of the training dynamics of Transformers with nonlinear self-attention and nonlinear MLP, together with the ICL generalization capability of the resulting model. Focusing on a group of binary classification tasks, we train Transformers using data from a subset of these tasks and quantify the impact of various factors on the ICL generalization performance on the remaining unseen tasks with and without data distribution shifts. We also analyze how different components in the learned Transformers contribute to the ICL performance. Furthermore, we provide the first theoretical analysis of how model pruning affects ICL performance and prove that proper magnitude-based pruning can have a minimal impact on ICL while reducing inference costs. These theoretical findings are justified through numerical experiments.

ICLR Conference 2024 Conference Paper

It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition

  • Chen Chen 0075
  • Ruizhe Li 0001
  • Yuchen Hu
  • Sabato Marco Siniscalchi
  • Pin-Yu Chen
  • Engsiong Chng
  • Chao-Han Huck Yang

Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct mapping from the N-best hypotheses list generated by an ASR system to the predicted output transcription. However, despite its effectiveness, GER introduces extra data uncertainty since the LLM is trained without taking into account the acoustic information available in the speech signal. In this work, we aim to overcome this limitation by infusing acoustic information before generating the predicted transcription through a novel late fusion solution termed Uncertainty-Aware Dynamic Fusion (UADF). UADF is a multimodal fusion approach implemented in an auto-regressive decoding process and works in two stages: (i) it first analyzes and calibrates the token-level LLM decision, and (ii) it then dynamically assimilates the information from the acoustic modality. Experimental evidence collected from various ASR tasks shows that UADF surpasses existing fusion mechanisms in several ways. It yields significant improvements in word error rate (WER) while mitigating data uncertainty issues in the LLM and addressing the poor generalization caused by relying on a single modality during fusion. We also demonstrate that UADF seamlessly adapts to audio-visual speech recognition.

ICLR Conference 2024 Conference Paper

Large Language Models are Efficient Learners of Noise-Robust Speech Recognition

  • Yuchen Hu
  • Chen Chen 0075
  • Chao-Han Huck Yang
  • Ruizhe Li 0001
  • Chao Zhang 0031
  • Pin-Yu Chen
  • Engsiong Chng

Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition results. The latest work proposes a GER benchmark with the "HyPoradise" dataset to learn the mapping from ASR N-best hypotheses to the ground-truth transcription by efficient LLM finetuning, which shows great effectiveness but lacks specificity for noise-robust ASR. In this work, we extend the benchmark to noisy conditions and investigate whether we can teach LLMs to perform denoising for GER just as robust ASR systems do, where one solution is to introduce noise information as a conditioner into the LLM. However, directly incorporating noise embeddings from the audio encoder could harm LLM tuning due to the cross-modality gap. To this end, we propose to extract a language-space noise embedding from the N-best list to represent the noise conditions of the source speech, which can promote the denoising process in GER. Furthermore, to enhance its ability to represent audio noise, we design a knowledge distillation (KD) approach via mutual information estimation to distill the real noise information in audio embeddings into our language embedding. Experiments on various recent LLMs demonstrate that our approach achieves a new breakthrough, with up to 53.9% correction improvement in terms of word error rate, with limited training data. Analysis shows that our language-space noise embedding can well represent the noise conditions of source speech, under which off-the-shelf LLMs show a strong ability for language-space denoising.

ICML Conference 2024 Conference Paper

Larimar: Large Language Models with Episodic Memory Control

  • Payel Das
  • Subhajit Chaudhury
  • Elliot Nelson
  • Igor Melnyk
  • Sarathkrishna Swaminathan
  • Sihui Dai
  • Aurélie C. Lozano
  • Georgios Kollias

Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar, a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar’s memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Larimar not only attains accuracy comparable to most competitive baselines, even in the challenging sequential editing setup, but also excels in speed (yielding speed-ups of 8-10x depending on the base LLM) as well as flexibility, since the proposed architecture is simple, LLM-agnostic, and hence general. We further provide mechanisms for selective fact forgetting, information leakage prevention, and input context length generalization with Larimar and show their effectiveness. Our code is available at https://github.com/IBM/larimar.

ICML Conference 2024 Conference Paper

Learning Optimal Projection for Forecast Reconciliation of Hierarchical Time Series

  • Asterios Tsiourvas
  • Wei Sun 0031
  • Georgia Perakis
  • Pin-Yu Chen
  • Yada Zhu

Hierarchical time series forecasting requires not only prediction accuracy but also coherency, i.e., forecasts add up appropriately across the hierarchy. Recent literature has shown that reconciliation via projection outperforms prior methods such as top-down or bottom-up approaches. Unlike existing work that pre-specifies a projection matrix (e.g., orthogonal), we study the problem of learning the optimal oblique projection from data for coherent forecasting of hierarchical time series. In addition to the unbiasedness-preserving property, oblique projection implicitly accounts for the hierarchy structure and assigns different weights to individual time series, providing significant adaptability over orthogonal projection which treats base forecast errors equally. We examine two broad classes of projections, namely Euclidean projection and general oblique projections. We propose to model the reconciliation step as a learnable, structured, projection layer in the neural forecaster architecture. The proposed approach allows for the efficient learning of the optimal projection in an end-to-end framework where both the neural forecaster and the projection layer are learned simultaneously. An empirical evaluation of real-world hierarchical time series datasets demonstrates the superior performance of the proposed method over existing state-of-the-art approaches.
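A minimal sketch of projection-based reconciliation for a two-level hierarchy (total = A + B), using a summing matrix S and a fixed diagonal weight matrix W. The formula S(SᵀWS)⁻¹SᵀW gives an oblique projection onto the coherent subspace (orthogonal when W = I) and leaves already-coherent forecasts unchanged. The paper instead learns the projection end to end inside a neural forecaster, which this sketch does not attempt; the helper names and the diagonal form of W are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def reconcile(S, y_hat, weights=None):
    """y_tilde = S (S^T W S)^{-1} S^T W y_hat with W = diag(weights).
    weights=None gives the orthogonal projection (W = I)."""
    m, k = len(S), len(S[0])
    w = weights or [1.0] * m
    StW = [[S[i][j] * w[i] for i in range(m)] for j in range(k)]  # k x m
    StWS = [[sum(StW[a][i] * S[i][b] for i in range(m)) for b in range(k)]
            for a in range(k)]
    StWy = [sum(StW[a][i] * y_hat[i] for i in range(m)) for a in range(k)]
    bottom = solve(StWS, StWy)  # reconciled bottom-level forecasts
    return [sum(S[i][j] * bottom[j] for j in range(k)) for i in range(m)]
```

With S = [[1, 1], [1, 0], [0, 1]] and incoherent base forecasts [10, 4, 5], the orthogonal projection returns [29/3, 13/3, 16/3], whose total equals the sum of its parts.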

AAAI Conference 2024 Conference Paper

Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning

  • Pin-Yu Chen

In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper provides an overview of model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation of the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities.

NeurIPS Conference 2024 Conference Paper

Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models

  • Sheng Y. Peng
  • Pin-Yu Chen
  • Matthew Hull
  • Duen H. Chau

Safety alignment is crucial to ensure that large language models (LLMs) behave in ways that align with human preferences and prevent harmful actions during inference. However, recent studies show that the alignment can be easily compromised through finetuning with only a few adversarially designed training examples. We aim to measure the risks in finetuning LLMs by navigating the LLM safety landscape. We discover a new phenomenon observed universally in the model parameter space of popular open-source LLMs, termed the “safety basin”: random perturbations to model weights maintain the safety level of the original aligned model within its local neighborhood. However, outside this local region, safety is fully compromised, exhibiting a sharp, step-like drop. This safety basin contrasts sharply with the LLM capability landscape, where model performance peaks at the origin and gradually declines as random perturbation increases. Our discovery inspires us to propose the new VISAGE safety metric that measures the safety in LLM finetuning by probing its safety landscape. Visualizing the safety landscape of the aligned model enables us to understand how finetuning compromises safety by dragging the model away from the safety basin. The LLM safety landscape also highlights the system prompt’s critical role in protecting a model and shows that such protection transfers to its perturbed variants within the safety basin. These observations from our safety landscape research provide new insights for future work in the LLM safety community. Our code is publicly available at https://github.com/ShengYun-Peng/llm-landscape.

TMLR Journal 2024 Journal Article

Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration

  • Yung-Chen Tang
  • Pin-Yu Chen
  • Tsung-Yi Ho

Neural network calibration is an essential task in deep learning to ensure consistency between the confidence of a model's prediction and its true likelihood of correctness. In this paper, we propose a new post-processing calibration method called Neural Clamping, which applies a simple joint input-output transformation to a pre-trained classifier via a learnable universal input perturbation and an output temperature scaling parameter. Moreover, we provide theoretical explanations of why Neural Clamping is provably better than temperature scaling. Evaluated on the BloodMNIST, CIFAR-100, and ImageNet image recognition datasets and a variety of deep neural network models, our empirical results show that Neural Clamping significantly outperforms state-of-the-art post-processing calibration methods. The code is available at github.com/yungchentang/NCToolkit, and the demo is available at huggingface.co/spaces/TrustSafeAI/NCTV.
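The joint input-output transformation described above amounts to the forward pass below, where `model` is any callable returning logits, `delta` is the universal input perturbation, and `temperature` is the scaling parameter. Learning `delta` and `temperature` on a calibration set is not shown, and the function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clamped_probs(model, x, delta, temperature):
    """Calibrated probabilities softmax(model(x + delta) / T).

    delta: universal input perturbation shared by all inputs.
    temperature: T > 0; T > 1 softens over-confident predictions.
    """
    perturbed = [xi + di for xi, di in zip(x, delta)]
    return softmax([z / temperature for z in model(perturbed)])
```

Setting `delta` to zeros recovers plain temperature scaling, which is the baseline the abstract compares against.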

NeurIPS Conference 2024 Conference Paper

NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes

  • Hao-Lun Sun
  • Lei Hsiung
  • Nandhini Chandramoorthy
  • Pin-Yu Chen
  • Tsung-Yi Ho

Deep neural networks (DNNs) have become ubiquitous in machine learning, but their energy consumption remains problematically high. An effective strategy for reducing such consumption is supply-voltage reduction, but if done too aggressively, it can lead to accuracy degradation. This is due to random bit-flips in static random access memory (SRAM), where model parameters are stored. To address this challenge, we have developed NeuralFuse, a novel add-on module that handles the energy-accuracy tradeoff in low-voltage regimes by learning input transformations and using them to generate error-resistant data representations, thereby protecting DNN accuracy in both nominal and low-voltage scenarios. As well as being easy to implement, NeuralFuse can be readily applied to DNNs with limited access, such as cloud-based APIs that are accessed remotely or non-configurable hardware. Our experimental results demonstrate that, at a 1% bit-error rate, NeuralFuse can reduce SRAM access energy by up to 24% while recovering accuracy by up to 57%. To the best of our knowledge, this is the first approach to addressing low-voltage-induced bit errors that requires no model retraining.
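To illustrate the failure mode NeuralFuse targets, the sketch below injects independent random bit flips into weights stored as unsigned 8-bit integers at a given bit-error rate. The storage format and the independence assumption are illustrative, the paper's error model may differ, and NeuralFuse itself (a learned input transformation) is not shown.

```python
import random


def inject_bit_errors(weights, bit_error_rate, bits=8, rng=None):
    """Flip each bit of each unsigned `bits`-bit weight independently
    with probability bit_error_rate."""
    rng = rng or random.Random(0)
    corrupted = []
    for w in weights:
        mask = 0
        for b in range(bits):
            if rng.random() < bit_error_rate:
                mask |= 1 << b  # mark bit b for flipping
        corrupted.append(w ^ mask)
    return corrupted
```

At a 1% bit-error rate, `inject_bit_errors(weights, 0.01)` corrupts roughly one bit in a hundred, which is the kind of perturbation a protection module must tolerate.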

ICML Conference 2024 Conference Paper

Position: TrustLLM: Trustworthiness in Large Language Models

  • Yue Huang 0001
  • Lichao Sun 0001
  • Haoran Wang 0005
  • Siyuan Wu 0001
  • Qihui Zhang
  • Yuan Li
  • Chujie Gao
  • Yixin Huang

Large language models (LLMs) have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, an established benchmark, evaluation and analysis of trustworthiness for mainstream LLMs, and a discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings first show that, in general, trustworthiness and capability (i.e., functional effectiveness) are positively related. Second, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones, suggesting that open-source models can achieve high levels of trustworthiness without additional mechanisms like moderators, offering valuable insights for developers in this field. Third, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Besides these observations, we’ve uncovered key insights into the multifaceted trustworthiness in LLMs. We emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness.
We advocate establishing an AI alliance among industry, academia, and the open-source community to foster collaboration, which is imperative to advance the trustworthiness of LLMs.

ICML Conference 2024 Conference Paper

Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts

  • Zhi-Yi Chin
  • Chieh-Ming Jiang
  • Ching-Chun Huang
  • Pin-Yu Chen
  • Wei-Chen Chiu

Text-to-image diffusion models, e.g., Stable Diffusion (SD), have recently shown a remarkable ability in high-quality content generation and have become one of the representatives of the recent wave of transformative AI. Nevertheless, such advances come with an intensifying concern about the misuse of this generative technology, especially for producing copyrighted or NSFW (i.e., not safe for work) images. Although efforts have been made to filter inappropriate images/prompts or remove undesirable concepts/styles via model fine-tuning, the reliability of these safety mechanisms against diversified problematic prompts remains largely unexplored. In this work, we propose Prompting4Debugging (P4D) as a debugging and red-teaming tool that automatically finds problematic prompts for diffusion models to test the reliability of a deployed safety mechanism. We demonstrate the efficacy of our P4D tool in uncovering new vulnerabilities of SD models with safety mechanisms. In particular, our results show that around half of the prompts in existing safe prompting benchmarks, which were originally considered "safe", can actually be manipulated to bypass many deployed safety mechanisms, including concept removal, negative prompts, and safety guidance. Our findings suggest that, without comprehensive testing, evaluations on limited safe prompting benchmarks can lead to a false sense of safety for text-to-image models.

ICLR Conference 2024 Conference Paper

Rethinking Backdoor Attacks on Dataset Distillation: A Kernel Method Perspective

  • Ming-Yu Chung
  • Sheng-Yen Chou
  • Chia-Mu Yu
  • Pin-Yu Chen
  • Sy-Yen Kuo
  • Tsung-Yi Ho

Dataset distillation offers a potential means to enhance data efficiency in deep learning. Recent studies have shown its ability to counteract backdoor risks present in original training samples. In this study, we delve into the theoretical aspects of backdoor attacks and dataset distillation based on kernel methods. We introduce two new theory-driven trigger pattern generation methods specialized for dataset distillation. Following a comprehensive set of analyses and experiments, we show that our optimization-based trigger design framework informs effective backdoor attacks on dataset distillation. Notably, datasets poisoned by our designed trigger prove resilient against conventional backdoor attack detection and mitigation methods. Our empirical results validate that the triggers developed using our approaches are proficient at executing resilient backdoor attacks.

ICML Conference 2024 Conference Paper

Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark

  • Yihua Zhang
  • Pingzhi Li
  • Junyuan Hong
  • Jiaxiang Li
  • Yimeng Zhang
  • Wenqing Zheng
  • Pin-Yu Chen
  • Jason D. Lee

In the evolving landscape of natural language processing (NLP), fine-tuning pre-trained Large Language Models (LLMs) with first-order (FO) optimizers like SGD and Adam has become standard. Yet, as LLMs grow in size, the substantial memory overhead from back-propagation (BP) for FO gradient computation presents a significant challenge. Addressing this issue is crucial, especially for applications like on-device training where memory efficiency is paramount. This paper proposes a shift towards BP-free, zeroth-order (ZO) optimization as a solution for reducing memory costs during LLM fine-tuning, building on the initial concept introduced by Malladi et al. (2023). Unlike traditional ZO-SGD methods, our work expands the exploration to a wider array of ZO optimization techniques through a comprehensive, first-of-its-kind benchmarking study across five LLM families, three task complexities, and five fine-tuning schemes. Our study unveils previously overlooked optimization principles, highlighting the importance of task alignment, the role of the forward gradient method, and the balance between algorithm complexity and fine-tuning performance. We further introduce novel enhancements to ZO optimization, including block-wise descent, hybrid training, and gradient sparsity. Our study offers a promising direction for achieving further memory-efficient LLM fine-tuning. Code to reproduce all our experiments will be made public.
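As a rough illustration of the BP-free idea, the sketch below implements plain ZO-SGD with a two-point randomized gradient estimate: the loss is evaluated at θ + μu and θ - μu for a Gaussian direction u, and the scaled finite difference multiplies u. It is a toy on a Python list of parameters, not the benchmark's code; the names `zo_gradient` and `zo_sgd` and the values of `mu`, `lr`, and `steps` are illustrative assumptions.

```python
import random


def zo_gradient(loss, theta, mu, rng):
    """Two-point zeroth-order gradient estimate: two forward calls, no backprop."""
    u = [rng.gauss(0.0, 1.0) for _ in theta]  # random Gaussian search direction
    plus = [t + mu * ui for t, ui in zip(theta, u)]
    minus = [t - mu * ui for t, ui in zip(theta, u)]
    slope = (loss(plus) - loss(minus)) / (2.0 * mu)  # directional derivative along u
    return [slope * ui for ui in u]


def zo_sgd(loss, theta, lr=0.05, steps=2000, mu=1e-3, seed=0):
    """Plain ZO-SGD: the backprop gradient is replaced by the two-point estimate."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        g = zo_gradient(loss, theta, mu, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


# Usage: minimize a toy quadratic whose minimum is at (3, -1).
quadratic = lambda p: (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2
print(zo_sgd(quadratic, [0.0, 0.0]))  # close to [3.0, -1.0]
```

The estimate is unbiased for smooth losses as μ shrinks, but its variance grows with the number of parameters, which is one reason the enhancements listed in the abstract (block-wise descent, hybrid training, gradient sparsity) matter at LLM scale.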

ICLR Conference 2024 Conference Paper

Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?

  • Yu-Lin Tsai
  • Chia-Yi Hsu
  • Chulin Xie
  • Chih-Hsun Lin
  • Jia-You Chen 0001
  • Bo Li 0026
  • Pin-Yu Chen
  • Chia-Mu Yu

Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have recently demonstrated exceptional capabilities for generating high-quality content. However, this progress has raised concerns about potential misuse, particularly in creating copyrighted, prohibited, and restricted content, or NSFW (not safe for work) images. While efforts have been made to mitigate such problems, either by implementing a safety filter at the evaluation stage or by fine-tuning models to eliminate undesirable concepts or styles, the effectiveness of these safety measures in dealing with a wide range of prompts remains largely unexplored. In this work, we investigate these safety mechanisms by proposing a novel concept retrieval algorithm for evaluation. We introduce Ring-A-Bell, a model-agnostic red-teaming scheme for T2I diffusion models, where the whole evaluation can be prepared in advance without prior knowledge of the target model. Specifically, Ring-A-Bell first performs concept extraction to obtain holistic representations for sensitive and inappropriate concepts. Subsequently, by leveraging the extracted concept, Ring-A-Bell automatically identifies problematic prompts for diffusion models with the corresponding generation of inappropriate content, allowing the user to assess the reliability of deployed safety mechanisms. Finally, we empirically validate our method by testing online services such as Midjourney and various methods of concept removal. Our results show that Ring-A-Bell, by manipulating safe prompting benchmarks, can transform prompts that were originally regarded as safe to evade existing safety mechanisms, thus revealing defects in the so-called safety mechanisms that could, in practice, lead to the generation of harmful content. In essence, Ring-A-Bell could serve as a red-teaming tool to understand the limitations of deployed safety mechanisms and to explore the risk under plausible attacks.
Our code is available at https://github.com/chiayi-hsu/Ring-A-Bell.

NeurIPS Conference 2024 Conference Paper

Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models

  • Chia-Yi Hsu
  • Yu-Lin Tsai
  • Chih-Hsun Lin
  • Pin-Yu Chen
  • Chia-Mu Yu
  • Chun-Ying Huang

While large language models (LLMs) such as Llama-2 or GPT-4 have shown impressive zero-shot performance, fine-tuning is still necessary to enhance their performance for customized datasets, domain-specific tasks, or other private needs. However, fine-tuning all parameters of LLMs requires significant hardware resources, which can be impractical for typical users. Therefore, parameter-efficient fine-tuning methods such as LoRA have emerged, allowing users to fine-tune LLMs without considerable computing resources and with little performance degradation compared to fine-tuning all parameters. Unfortunately, recent studies indicate that fine-tuning can compromise the safety of LLMs, even when the data does not contain malicious content. To address this challenge, we propose $\textsf{Safe LoRA}$, a simple one-liner patch to the original LoRA implementation that projects LoRA weights from selected layers onto the safety-aligned subspace, effectively reducing the safety risks in LLM fine-tuning while maintaining utility. It is worth noting that $\textsf{Safe LoRA}$ is a training-free and data-free approach, as it only requires the weights of the base and aligned LLMs. Our extensive experiments demonstrate that when fine-tuning on purely malicious data, $\textsf{Safe LoRA}$ retains safety performance similar to that of the original aligned model. Moreover, when the fine-tuning dataset contains a mixture of both benign and malicious data, $\textsf{Safe LoRA}$ mitigates the negative effect of the malicious data while preserving performance on downstream tasks. Our code is available at https://github.com/IBM/SafeLoRA.
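A minimal sketch of the projection idea, assuming (as we read the paper) that a layer's alignment direction is V = W_aligned - W_base, that the projector is approximated as C = V Vᵀ / ‖V‖_F, and that a layer's LoRA update ΔW is replaced by C ΔW only when the cosine similarity between the two falls below a threshold. The helper names and the threshold value are hypothetical; the released code is the authority on the exact rule. Plain Python lists keep the example dependency-free.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def flatten(m):
    return [x for row in m for x in row]


def alignment_projector(w_aligned, w_base):
    """C = V V^T / ||V||_F with V = W_aligned - W_base (assumed approximation)."""
    v = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(w_aligned, w_base)]
    v_t = [list(col) for col in zip(*v)]
    fro = sum(x * x for x in flatten(v)) ** 0.5
    if fro == 0.0:
        raise ValueError("aligned and base weights are identical")
    return [[x / fro for x in row] for row in matmul(v, v_t)]


def cosine(a, b):
    fa, fb = flatten(a), flatten(b)
    dot = sum(x * y for x, y in zip(fa, fb))
    na = sum(x * x for x in fa) ** 0.5
    nb = sum(x * x for x in fb) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def safe_lora_layer(delta_w, w_aligned, w_base, threshold=0.5):
    """Project one layer's LoRA update (delta_w = B @ A) if it drifts from the aligned subspace."""
    projected = matmul(alignment_projector(w_aligned, w_base), delta_w)
    if cosine(delta_w, projected) < threshold:
        return projected  # low similarity: keep only the safety-aligned component
    return delta_w  # high similarity: leave the update unchanged


print(safe_lora_layer([[1, 2], [3, 4]], [[1, 0], [0, 0]], [[0, 0], [0, 0]]))
# [[1.0, 2.0], [0.0, 0.0]]
```

Because only the base and aligned weights enter `alignment_projector`, the sketch reflects the training-free, data-free property claimed in the abstract.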

NeurIPS Conference 2024 Conference Paper

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

  • Yuchen Hu
  • Chen Chen
  • Chao-Han H. Yang
  • Chengwei Qin
  • Pin-Yu Chen
  • Eng S. Chng
  • Chao Zhang

We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) systems in diverse target domains, such as noise and accents. STAR is developed for prevalent speech foundation models based on Transformer-related architectures with auto-regressive decoding (e.g., Whisper, Canary). Specifically, we propose a novel indicator that empirically integrates step-wise information during decoding to assess the token-level quality of pseudo labels without ground truth, thereby guiding model updates for effective unsupervised adaptation. Experimental results show that STAR achieves an average relative reduction of 13.5% in word error rate across 14 target domains, and it sometimes even approaches the upper-bound performance of supervised adaptation. Surprisingly, we also observe that STAR protects the adapted model from the common catastrophic forgetting problem without recalling source-domain data. Furthermore, STAR exhibits high data efficiency, requiring less than one hour of unlabeled data, and generalizes seamlessly to alternative large speech models and speech translation tasks. We intend to open-source our code for the research community.

ICML Conference 2024 Conference Paper

SF-DQN: Provable Knowledge Transfer using Successor Feature for Deep Reinforcement Learning

  • Shuai Zhang 0015
  • Heshan Devaka Fernando
  • Miao Liu 0001
  • Keerthiram Murugesan
  • Songtao Lu
  • Pin-Yu Chen
  • Tianyi Chen
  • Meng Wang 0003

This paper studies the transfer reinforcement learning (RL) problem where multiple RL problems have different reward functions but share the same underlying transition dynamics. In this setting, the Q-function of each RL problem (task) can be decomposed into a successor feature (SF) and a reward mapping: the former characterizes the transition dynamics, and the latter characterizes the task-specific reward function. This Q-function decomposition, coupled with a policy improvement operator known as generalized policy improvement (GPI), reduces the sample complexity of finding the optimal Q-function, and thus the SF & GPI framework exhibits promising empirical performance compared to traditional RL methods like Q-learning. However, its theoretical foundations remain largely unestablished, especially when learning the successor features using deep neural networks (SF-DQN). This paper studies provable knowledge transfer using SF-DQN in transfer RL problems. We establish the first convergence analysis with provable generalization guarantees for SF-DQN with GPI. The theory reveals that SF-DQN with GPI outperforms conventional RL approaches, such as deep Q-network, in terms of both faster convergence rate and better generalization. Numerical experiments on real and synthetic RL tasks support the superior performance of SF-DQN & GPI, aligning with our theoretical findings.
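To make the decomposition concrete: with successor features ψ^π(s, a) and a reward weight vector w, a task's Q-value is Q^π(s, a) = ψ^π(s, a)ᵀ w, and GPI picks the action that maximizes this value over all previously learned policies. The sketch below applies that rule at a single state. It assumes the ψ vectors are already given (in SF-DQN they would be produced by a neural network), and the function names and toy numbers are illustrative.

```python
def q_values(psi, w):
    """Q(s, a) = psi(s, a) . w for each action a at the current state."""
    return [sum(p * wi for p, wi in zip(psi_a, w)) for psi_a in psi]


def gpi_action(psi_per_policy, w):
    """Generalized policy improvement: argmax_a max_i psi_i(s, a) . w."""
    best_action, best_value = None, float("-inf")
    for psi in psi_per_policy:  # one successor-feature table per source policy
        for action, q in enumerate(q_values(psi, w)):
            if q > best_value:
                best_action, best_value = action, q
    return best_action, best_value


# Two source policies, two actions, 2-dimensional features (toy numbers).
psi_policies = [
    [[1.0, 0.0], [0.0, 1.0]],  # policy 1: psi(s, a0), psi(s, a1)
    [[0.5, 0.5], [2.0, 0.0]],  # policy 2
]
print(gpi_action(psi_policies, [1.0, 0.0]))  # (1, 2.0)
```

A new task only changes `w`; the ψ tables are reused unchanged, which is where the knowledge transfer comes from.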

ICLR Conference 2024 Conference Paper

The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Language Models

  • Yan Liu 0002
  • Yu Liu
  • Xiaokang Chen
  • Pin-Yu Chen
  • Daoguang Zan
  • Min-Yen Kan
  • Tsung-Yi Ho

Pre-trained language models (PLMs) have been acknowledged to contain harmful information, such as social biases, which may cause negative social impacts or even catastrophic results in applications. Previous works on this problem mainly focused on using black-box methods such as probing to detect and quantify social biases in PLMs by observing model outputs. As a result, previous debiasing methods mainly fine-tune or even pre-train PLMs on newly constructed anti-stereotypical datasets, which is costly. In this work, we try to unveil the mystery of social bias inside language models by introducing the concept of Social Bias Neurons. Specifically, we propose Integrated Gap Gradients (IG$^2$) to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias. By formalizing undesirable behavior as a distributional property of language, we employ sentiment-bearing prompts to elicit classes of sensitive words (demographics) correlated with such sentiments. Our IG$^2$ thus attributes the uneven distribution for different demographics to specific Social Bias Neurons, which track the trail of unwanted behavior inside PLM units to achieve interpretability. Moreover, derived from our interpretable technique, Bias Neuron Suppression (BNS) is further proposed to mitigate social biases. By studying BERT, RoBERTa, and their attributable differences from debiased FairBERTa, IG$^2$ allows us to locate and suppress identified neurons, and further mitigate undesired behaviors. As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost. (This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.)

ICLR Conference 2024 Conference Paper

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

  • Ming Jin 0005
  • Shiyu Wang 0001
  • Lintao Ma
  • Zhixuan Chu
  • James Y. Zhang
  • Xiaoming Shi 0001
  • Pin-Yu Chen
  • Yuxuan Liang 0002

Time series forecasting holds significant importance in many real-world dynamic systems and has been extensively studied. Unlike natural language processing (NLP) and computer vision (CV), where a single large model can tackle multiple tasks, models for time series forecasting are often specialized, necessitating distinct designs for different tasks and applications. While pre-trained foundation models have made impressive strides in NLP and CV, their development in time series domains has been constrained by data sparsity. Recent studies have revealed that large language models (LLMs) possess robust pattern recognition and reasoning abilities over complex sequences of tokens. However, the challenge remains in effectively aligning the modalities of time series data and natural language to leverage these capabilities. In this work, we present Time-LLM, a reprogramming framework to repurpose LLMs for general time series forecasting with the backbone language models kept intact. We begin by reprogramming the input time series with text prototypes before feeding it into the frozen LLM to align the two modalities. To augment the LLM's ability to reason with time series data, we propose Prompt-as-Prefix (PaP), which enriches the input context and directs the transformation of reprogrammed input patches. The transformed time series patches from the LLM are finally projected to obtain the forecasts. Our comprehensive evaluations demonstrate that Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models. Moreover, Time-LLM excels in both few-shot and zero-shot learning scenarios. The code is made available at https://github.com/KimMeen/Time-LLM.
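The abstract mentions reprogramming input patches before they reach the frozen LLM. A common way to form such patches, assumed here (the patch length, stride, and normalization that Time-LLM actually uses may differ), is to normalize the series and slide a fixed-length window over it. The hypothetical helpers below cover only that patching step, not the reprogramming layer or Prompt-as-Prefix.

```python
def normalize(series):
    """Zero-mean, unit-variance scaling of one input series (assumed preprocessing)."""
    mean = sum(series) / len(series)
    std = (sum((x - mean) ** 2 for x in series) / len(series)) ** 0.5 or 1.0
    return [(x - mean) / std for x in series]


def make_patches(series, patch_len, stride):
    """Split a series into overlapping windows of length patch_len, moved by stride."""
    if patch_len > len(series):
        raise ValueError("patch_len must not exceed the series length")
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]


print(make_patches(list(range(10)), patch_len=4, stride=2))
# [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each patch would then be embedded and mapped onto text prototypes; choosing a stride smaller than the patch length gives overlapping windows so that local patterns are not cut at patch boundaries.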

TMLR Journal 2024 Journal Article

To Transfer or Not to Transfer: Suppressing Concepts from Source Representations

  • Vijay Sadashivaiah
  • Keerthiram Murugesan
  • Ronny Luss
  • Pin-Yu Chen
  • Chris Sims
  • James Hendler
  • Amit Dhurandhar

With the proliferation of large pre-trained models in various domains, transfer learning has gained prominence: intermediate representations from these models can be leveraged to train better (target) task-specific models, possibly with limited labeled data. Although transfer learning can be beneficial in many applications, it can transfer undesirable information to target tasks that may severely curtail performance in the target domain or raise ethical concerns related to privacy and/or fairness. In this paper, we propose a novel approach for suppressing the transfer of user-determined semantic concepts (e.g., color, glasses) in intermediate source representations to target tasks without retraining the source model, which can otherwise be expensive or even infeasible. Notably, we tackle a bigger challenge than suppressing concepts in the input data, as a given intermediate source representation is biased towards the source task, thus possibly further entangling the desired concepts. We evaluate our approach qualitatively and quantitatively in the visual domain, showcasing its efficacy for classification and generative source models. Finally, we provide a concept selection approach that automatically suppresses the undesirable concepts.

ICML Conference 2024 Conference Paper

What Improves the Generalization of Graph Transformers? A Theoretical Dive into the Self-attention and Positional Encoding

  • Hongkang Li
  • Meng Wang 0003
  • Tengfei Ma 0001
  • Sijia Liu 0001
  • Zaixi Zhang
  • Pin-Yu Chen

Graph Transformers, which incorporate self-attention and positional encoding, have recently emerged as a powerful architecture for various graph learning tasks. Despite their impressive performance, the complex non-convex interactions across layers and the recursive graph structure have made it challenging to establish a theoretical foundation for learning and generalization. This study introduces the first theoretical investigation of a shallow Graph Transformer for semi-supervised node classification, comprising a self-attention layer with relative positional encoding and a two-layer perceptron. Focusing on a graph data model with discriminative nodes that determine node labels and non-discriminative nodes that are class-irrelevant, we characterize the sample complexity required to achieve a desirable generalization error by training with stochastic gradient descent (SGD). This paper provides a quantitative characterization of the sample complexity and number of iterations for convergence, which depend on the fraction of discriminative nodes, the dominant patterns, and the initial model errors. Furthermore, we demonstrate that self-attention and positional encoding enhance generalization by making the attention map sparse and promoting the core neighborhood during training, which explains the superior feature representation of Graph Transformers. Our theoretical results are supported by empirical experiments on synthetic and real-world benchmarks.

ICML Conference 2024 Conference Paper

What Would Gauss Say About Representations? Probing Pretrained Image Models using Synthetic Gaussian Benchmarks

  • Ching-Yun Ko
  • Pin-Yu Chen
  • Payel Das
  • Jeet Mohapatra
  • Luca Daniel

Recent years have witnessed a paradigm shift in deep learning from task-centric model design to task-agnostic representation learning and task-specific fine-tuning. Pretrained model representations are commonly evaluated extensively across various real-world tasks and used as a foundation for different downstream tasks. This paper proposes a solution for assessing the quality of representations in a task-agnostic way. To circumvent the need for real-world data in evaluation, we explore the use of synthetic binary classification tasks with Gaussian mixtures to probe pretrained models and compare the robustness-accuracy performance on pretrained representations with an idealized reference. Our approach offers a holistic evaluation, revealing intrinsic model capabilities and reducing the dependency on real-life data for model evaluation. Evaluated with various pretrained image models, the experimental results confirm that our task-agnostic evaluation correlates with actual linear probing performance on downstream tasks and can also guide parameter choice in robust linear probing to achieve a better robustness-accuracy trade-off.

ICLR Conference 2023 Conference Paper

A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity

  • Hongkang Li
  • Meng Wang 0003
  • Sijia Liu 0001
  • Pin-Yu Chen

Vision Transformers (ViTs) with self-attention modules have recently achieved great empirical success in many vision tasks. Due to non-convex interactions across layers, however, the theoretical learning and generalization analysis is mostly elusive. Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a three-layer ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task. We characterize the sample complexity to achieve a zero generalization error. Our sample complexity bound is positively correlated with the inverse of the fraction of label-relevant tokens, the token noise level, and the initial model error. We also prove that a training process using stochastic gradient descent (SGD) leads to a sparse attention map, which is a formal verification of the general intuition about the success of attention. Moreover, this paper indicates that a proper token sparsification can improve the test performance by removing label-irrelevant and/or noisy tokens, including spurious correlations. Empirical experiments on synthetic data and CIFAR-10 dataset justify our theoretical results and generalize to deeper ViTs.

ICML Conference 2023 Conference Paper

Compressed Decentralized Proximal Stochastic Gradient Method for Nonconvex Composite Problems with Heterogeneous Data

  • Yonggui Yan
  • Jie Chen
  • Pin-Yu Chen
  • Xiaodong Cui
  • Songtao Lu
  • Yangyang Xu

We first propose a decentralized proximal stochastic gradient tracking method (DProxSGT) for nonconvex stochastic composite problems, with data heterogeneously distributed on multiple workers in a decentralized connected network. To save communication cost, we then extend DProxSGT to a compressed method by compressing the communicated information. Both methods need only $\mathcal{O}(1)$ samples per worker for each proximal update, which is important to achieve good generalization performance on training deep neural networks. With a smoothness condition on the expected loss function (but not on each sample function), the proposed methods can achieve an optimal sample complexity result to produce a near-stationary point. Numerical experiments on training neural networks demonstrate the significantly better generalization performance of our methods over large-batch training methods and momentum variance-reduction methods, as well as their ability to handle heterogeneous data via the gradient tracking scheme.
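Two building blocks named in the abstract can be sketched in isolation, assuming the common case where the nonsmooth part of the composite objective is an ℓ1 penalty λ‖x‖₁: the proximal update x ← prox(x - η g), which for ℓ1 reduces to soft-thresholding with threshold ηλ, and a top-k compressor of the kind often used to shrink communicated vectors. Neither the ℓ1 choice nor top-k is taken from the paper, and the gradient-tracking and neighbor-averaging steps of DProxSGT are omitted.

```python
import math


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(v) - tau, 0.0), v) for v in x]


def prox_sgd_step(x, grad, lr, lam):
    """One proximal stochastic gradient step for f(x) + lam * ||x||_1."""
    return soft_threshold([xi - lr * gi for xi, gi in zip(x, grad)], lr * lam)


def top_k(vec, k):
    """Keep the k largest-magnitude entries and zero the rest (a simple compressor)."""
    keep = set(sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vec)]


print(prox_sgd_step([1.0, 0.25], [1.0, -1.0], lr=0.5, lam=1.0))  # [0.0, 0.25]
print(top_k([0.1, -3.0, 2.0, 0.5], k=2))  # [0.0, -3.0, 2.0, 0.0]
```

In a decentralized setting, each worker would apply `top_k` (or another compressor) to the message it sends to its neighbors, while the proximal step is applied locally.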

AAMAS Conference 2023 Conference Paper

Computational Complexity of Verifying the Group No-show Paradox

  • Farhad Mohsin
  • Qishen Han
  • Sikai Ruan
  • Pin-Yu Chen
  • Francesca Rossi
  • Lirong Xia

The (group) no-show paradox refers to the undesirable situation where a group of agents has an incentive to abstain from voting to get a more favorable winner. We examine the computational complexity of verifying whether the group no-show paradox exists, given the agents’ preferences and the voting rule. We prove that the verification problem is NP-hard for commonly studied voting rules such as Copeland, maximin, single transferable vote, and Black’s rule. We propose integer linear programming-based algorithms and a breadth-first search algorithm for the verification problem. Experimental results illustrate that the former work better for a small number of alternatives, and the latter works better for a small number of agents. Using these algorithms, we observe that the group no-show paradox rarely occurs in real-world data.
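For intuition about what is being verified, here is an exhaustive checker for tiny instances under single transferable vote, implemented as instant runoff that eliminates the candidate with the fewest first-place votes and breaks elimination ties alphabetically (both assumptions). It enumerates every possible abstaining group, so its cost is exponential in the number of agents; it is not the integer linear programming or breadth-first search algorithm proposed in the paper, and the function names are illustrative.

```python
from itertools import combinations


def stv_winner(profile, candidates):
    """Instant-runoff STV: repeatedly drop the candidate with the fewest first places.
    Elimination ties remove the alphabetically first candidate (an assumption)."""
    remaining = set(candidates)
    while len(remaining) > 1:
        counts = {c: 0 for c in remaining}
        for ranking in profile:
            counts[next(c for c in ranking if c in remaining)] += 1
        remaining.discard(min(remaining, key=lambda c: (counts[c], c)))
    return remaining.pop()


def prefers(ranking, x, y):
    return ranking.index(x) < ranking.index(y)


def find_group_no_show(profile, candidates, rule=stv_winner):
    """Return (abstaining voter indices, new winner) for a smallest group that obtains
    a winner every member strictly prefers by abstaining, or None if none exists."""
    original = rule(profile, candidates)
    n = len(profile)
    for size in range(1, n):  # exponential enumeration: tiny instances only
        for group in combinations(range(n), size):
            kept = [r for i, r in enumerate(profile) if i not in group]
            new = rule(kept, candidates)
            if new != original and all(prefers(profile[i], new, original) for i in group):
                return group, new
    return None


# Each string is one agent's ranking, from most to least preferred.
profile = ["abc"] * 5 + ["cab"] * 4 + ["bca"] * 3
print(stv_winner(profile, "abc"))          # c
print(find_group_no_show(profile, "abc"))  # ((0, 1), 'b')
```

In this toy profile, two "abc" voters who stay home cause "a" to be eliminated first, which transfers their support to "b" and beats "c", an outcome both abstainers prefer.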

ICLR Conference 2023 Conference Paper

FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning

  • Kaiyuan Zhang 0002
  • Guanhong Tao 0001
  • Qiuling Xu
  • Siyuan Cheng 0005
  • Shengwei An
  • Yingqi Liu
  • Shiwei Feng 0002
  • Guangyu Shen

Federated Learning (FL) is a distributed learning paradigm that enables different parties to train a model together for high quality and strong privacy protection. In this scenario, individual participants may get compromised and perform backdoor attacks by poisoning the data (or gradients). Existing work on robust aggregation and certified FL robustness does not study how hardening benign clients can affect the global model (and the malicious clients). In this work, we theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting. Moreover, we propose a trigger reverse engineering based defense and show that our method can achieve robustness improvement with guarantee (i.e., reducing the attack success rate) without affecting benign accuracy. We conduct comprehensive experiments across different datasets and attack settings. Our results on nine competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks. Code is available at https://github.com/KaiyuanZh/FLIP.

AAAI Conference 2023 Conference Paper

Holistic Adversarial Robustness of Deep Learning Models

  • Pin-Yu Chen
  • Sijia Liu

Adversarial robustness studies the worst-case performance of a machine learning model to ensure safety and reliability. With the proliferation of deep-learning-based technology, the potential risks associated with model development and deployment can be amplified and become dreadful vulnerabilities. This paper provides a comprehensive overview of research topics and foundational principles of research methods for adversarial robustness of deep learning models, including attacks, defenses, verification, and novel applications.

NeurIPS Conference 2023 Conference Paper

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

  • Chen Chen
  • Yuchen Hu
  • Chao-Han Huck Yang
  • Sabato Marco Siniscalchi
  • Pin-Yu Chen
  • Eng-Siong Chng

Advancements in deep neural networks have allowed automatic speech recognition (ASR) systems to attain human parity on several publicly available clean speech datasets. However, even state-of-the-art ASR systems experience performance degradation when confronted with adverse conditions, as a well-trained acoustic model is sensitive to variations in the speech domain, e.g., background noise. Intuitively, humans address this issue by relying on their linguistic knowledge: the meaning of ambiguous spoken terms is usually inferred from contextual cues, thereby reducing the dependency on the auditory system. Inspired by this observation, we introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction, where N-best decoding hypotheses provide informative elements for true transcription prediction. This approach is a paradigm shift from the traditional language model rescoring strategy, which can only select one candidate hypothesis as the output transcription. The proposed benchmark contains a novel dataset, "HyPoradise" (HP), encompassing more than 316,000 pairs of N-best hypotheses and corresponding accurate transcriptions across prevalent speech domains. Given this dataset, we examine three types of LLM-based error correction techniques with varying amounts of labeled hypotheses-transcription pairs, which yield significant word error rate (WER) reductions. Experimental evidence demonstrates that the proposed technique achieves a breakthrough by surpassing the upper bound of traditional re-ranking-based methods. More surprisingly, an LLM with reasonable prompt design can even correct tokens that are missing from the N-best list. We make our results publicly accessible for reproducible pipelines with released pre-trained models, thus providing a new paradigm for ASR error correction with LLMs.

ICML Conference 2023 Conference Paper

Identification of the Adversary from a Single Adversarial Example

  • Minhao Cheng
  • Rui Min
  • Haochen Sun 0001
  • Pin-Yu Chen

Deep neural networks have been shown to be vulnerable to adversarial examples. Even though many defense methods have been proposed to enhance robustness, there is still a long way to go toward an attack-free method for building a trustworthy machine learning system. In this paper, instead of enhancing robustness, we take the investigator’s perspective and propose a new framework to trace the first compromised model copy in a forensic investigation manner. Specifically, we focus on the following setting: a machine learning service provider provides model copies to a set of customers, and one of the customers conducts adversarial attacks to fool the system. The investigator’s objective is therefore to identify the first compromised copy by collecting and analyzing evidence from only the available adversarial examples. To make the tracing viable, we design a random mask watermarking mechanism to differentiate adversarial examples from different copies. First, we propose a tracing approach in the data-limited case where the original example is also available. Then, we design a data-free approach to identify the adversary without accessing the original example. Finally, the effectiveness of our proposed framework is evaluated by extensive experiments with different model architectures, adversarial attacks, and datasets.

ICLR Conference 2023 Conference Paper

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

  • Shuai Zhang 0015
  • Meng Wang 0003
  • Pin-Yu Chen
  • Sijia Liu 0001
  • Songtao Lu
  • Miao Liu 0001

Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include graph sparsification that samples a subgraph to reduce the amount of data aggregation and model sparsification that prunes the neural network to reduce the number of trainable weights. Despite the empirical successes in reducing the training cost while maintaining the test accuracy, the theoretical generalization analysis of sparse learning for GNNs remains elusive. To the best of our knowledge, this paper provides the first theoretical characterization of joint edge-model sparse learning from the perspective of sample complexity and convergence rate in achieving zero generalization error. It proves analytically that both sampling important nodes and pruning neurons with lowest-magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy. Although the analysis is centered on two-layer GNNs with structural constraints on data, the insights are applicable to more general setups and justified by both synthetic and practical citation datasets.
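The model-sparsification half of joint edge-model sparse learning can be illustrated with magnitude pruning: score each hidden neuron by the norm of its incoming weight row and zero out the lowest-scoring ones. The sketch assumes row-wise ℓ2 norms and a keep ratio as the knobs; the paper's exact pruning procedure and its graph (edge) sampling step are not reproduced, and `prune_neurons` is a hypothetical name.

```python
def prune_neurons(weights, keep_ratio):
    """Zero the rows (neurons) of a weight matrix with the smallest L2 norms."""
    norms = [sum(w * w for w in row) ** 0.5 for row in weights]
    k = max(1, round(keep_ratio * len(weights)))
    kept = sorted(sorted(range(len(weights)), key=lambda i: norms[i], reverse=True)[:k])
    pruned = [row if i in kept else [0.0] * len(row) for i, row in enumerate(weights)]
    return pruned, kept


layer = [[3.0, 4.0], [0.1, 0.1], [1.0, 0.0], [0.0, 2.0]]  # toy 4-neuron layer
print(prune_neurons(layer, keep_ratio=0.5))
# ([[3.0, 4.0], [0.0, 0.0], [0.0, 0.0], [0.0, 2.0]], [0, 3])
```

Zeroing whole rows removes entire neurons, which is the structured kind of pruning that reduces the number of trainable weights mentioned in the abstract.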

IJCAI Conference 2023 Conference Paper

Learning to Design Fair and Private Voting Rules (Extended Abstract)

  • Farhad Mohsin
  • Ao Liu
  • Pin-Yu Chen
  • Francesca Rossi
  • Lirong Xia

Voting is used widely to aggregate preferences to make a collective decision. In this paper, we focus on evaluating and designing voting rules that support both the privacy of the voting agents and a notion of fairness over such agents. First, we introduce a novel notion of group fairness and adopt the existing notion of local differential privacy. We then evaluate the level of group fairness in several existing voting rules, as well as the trade-offs between fairness and privacy, showing that it is not possible to always obtain maximal economic efficiency with high fairness. Then, we present both a machine learning and a constrained optimization approach to design new voting rules that are fair while maintaining a high level of economic efficiency. Finally, we empirically examine the effect of adding noise to create local differentially private voting rules and discuss the three-way trade-off between economic efficiency, fairness, and privacy.

IROS Conference 2023 Conference Paper

MENTOR: Multilingual Text Detection Toward Learning by Analogy

  • Hsin-Ju Lin
  • Tsu-Chun Chung
  • Ching-Chun Hsiao
  • Pin-Yu Chen
  • Wei-Chen Chiu
  • Ching-Chun Huang

Text detection is frequently used in vision-based mobile robots when they need to interpret texts in their surroundings to perform a given task. For instance, delivery robots in multilingual cities need to be capable of doing multilingual text detection so that the robots can read traffic signs and road markings. Moreover, the target languages change from region to region, implying the need to efficiently re-train the models to recognize novel languages. However, collecting and labeling training data for novel languages is cumbersome, and the effort to re-train an existing text detector is considerable. Even worse, such a routine would repeat whenever a novel language appears. This motivates us to propose a new problem setting for tackling the aforementioned challenges in a more efficient way: “We ask for a generalizable multilingual text detection framework to detect and identify both seen and unseen language regions inside scene images without the requirement of collecting supervised training data for unseen languages as well as model re-training”. To this end, we propose “MENTOR”, the first work to realize a learning strategy between zero-shot learning and few-shot learning for multilingual scene text detection. During the training phase, we leverage the “zero-cost” synthesized printed texts and the available training/seen languages to learn the meta-mapping from printed texts to language-specific kernel weights. Meanwhile, dynamic convolution networks guided by the language-specific kernel are trained to realize a detection-by-feature-matching scheme. In the inference phase, “zero-cost” printed texts are synthesized given a new target language. By utilizing the learned meta-mapping and the matching network, our “MENTOR” can freely identify the text regions of the new language. Experiments show our model can achieve comparable results with supervised methods for seen languages and outperform other methods in detecting unseen languages.

ICML Conference 2023 Conference Paper

MultiRobustBench: Benchmarking Robustness Against Multiple Attacks

  • Sihui Dai
  • Saeed Mahloujifar
  • Chong Xiang 0001
  • Vikash Sehwag
  • Pin-Yu Chen
  • Prateek Mittal

The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded $\ell_p$-norm) attack, but in practical settings, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework can model different levels of the learner’s knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench (https://multirobustbench.github.io), for benchmarking multiattack evaluation, which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including $\ell_p$-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack is still a big open problem, as all existing models perform worse than random guessing.
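The following minimal Python sketch makes the gap between average and worst-case robustness concrete. It assumes that per-example correctness flags for each attack have already been computed. The function name and the toy data are hypothetical and are not MultiRobustBench's API. Average robustness is the mean of the per-attack accuracies. Union (worst-case) robustness counts an example only if it survives every attack.

```python
from typing import Dict, List, Tuple


def average_and_worst_case_accuracy(correct: Dict[str, List[bool]]) -> Tuple[float, float]:
    """correct[attack][i] is True if example i is still classified correctly under `attack`."""
    attacks = list(correct)
    n = len(correct[attacks[0]])
    # Average robustness: mean of the per-attack accuracies.
    per_attack = [sum(correct[a]) / n for a in attacks]
    average = sum(per_attack) / len(per_attack)
    # Union (worst-case) robustness: an example counts only if it survives every attack.
    worst_case = sum(all(correct[a][i] for a in attacks) for i in range(n)) / n
    return average, worst_case


# Toy data: each attack breaks a different example.
flags = {
    "linf": [True, True, False, True],
    "l2": [True, False, True, True],
    "spatial": [False, True, True, True],
}
print(average_and_worst_case_accuracy(flags))  # (0.75, 0.25)
```

In the toy data, each attack alone fools only one example in four. Taken together, the attacks fool three of the four.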

AAAI Conference 2023 System Paper

NCTV: Neural Clamping Toolkit and Visualization for Neural Network Calibration

  • Lei Hsiung
  • Yung-Chen Tang
  • Pin-Yu Chen
  • Tsung-Yi Ho

With the advancement of deep learning technology, neural networks have demonstrated an excellent ability to provide accurate predictions in many tasks. However, a model that neglects calibration will not earn human trust, even if it is highly accurate. In this regard, the gap between the confidence of the model's predictions and the actual likelihood of correctness must be bridged to obtain a well-calibrated model. In this paper, we introduce the Neural Clamping Toolkit, the first open-source framework designed to help developers employ state-of-the-art model-agnostic calibrated models. Furthermore, we provide animations and interactive sections in the demonstration to familiarize researchers with calibration in neural networks. A Colab tutorial on using our toolkit is also provided.

NeurIPS Conference 2023 Conference Paper

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $\epsilon$-Greedy Exploration

  • Shuai Zhang
  • Hongkang Li
  • Meng Wang
  • Miao Liu
  • Pin-Yu Chen
  • Songtao Lu
  • Sijia Liu
  • Keerthiram Murugesan

This paper provides a theoretical understanding of deep Q-Networks (DQNs) with $\epsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical success of DQNs, their theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs a target network and experience replay to acquire an unbiased estimate of the mean-square Bellman error (MSBE) used in training the Q-network. However, the existing theoretical analysis of DQNs either lacks a convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of DQNs with an $\epsilon$-greedy policy in the practical setting. We prove that an iterative procedure with decaying $\epsilon$ converges geometrically to the optimal Q-value function. Moreover, larger values of $\epsilon$ enlarge the region of convergence but slow down convergence, while the opposite holds for smaller values of $\epsilon$. Experiments support our theoretical insights on DQNs.
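Here is a minimal sketch of generic $\epsilon$-greedy action selection with a decaying $\epsilon$, for readers unfamiliar with the exploration rule analyzed above. The geometric schedule and its parameters (`eps_start`, `eps_end`, `decay`) are hypothetical choices for illustration. They are not the schedule used in the paper's analysis.

```python
import random
from typing import Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action, otherwise the argmax-Q action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def decayed_epsilon(step: int, eps_start: float = 1.0, eps_end: float = 0.05, decay: float = 0.99) -> float:
    """Hypothetical geometric decay from eps_start toward a floor of eps_end."""
    return max(eps_end, eps_start * decay ** step)


rng = random.Random(0)
for step in (0, 100, 1000):
    eps = decayed_epsilon(step)
    action = epsilon_greedy([0.1, 0.5, 0.2], eps, rng)
    print(step, round(eps, 3), action)
```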

ICML Conference 2023 Conference Paper

Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks

  • Mohammed Nowaz Rabbani Chowdhury
  • Shuai Zhang 0015
  • Meng Wang 0003
  • Sijia Liu 0001
  • Pin-Yu Chen

In deep learning, mixture-of-experts (MoE) activates one or a few experts (sub-networks) on a per-sample or per-token basis, resulting in a significant reduction in computation. The recently proposed patch-level routing in MoE (pMoE) divides each input into $n$ patches (or tokens) and sends $l$ patches ($l\ll n$) to each expert through prioritized routing. pMoE has demonstrated great empirical success in reducing training and inference costs while maintaining test accuracy. However, a theoretical explanation of pMoE, and of MoE in general, remains elusive. Focusing on a supervised classification task using a mixture of two-layer convolutional neural networks (CNNs), we show for the first time that pMoE provably reduces the number of training samples required to achieve desirable generalization (referred to as the sample complexity) by a factor polynomial in $n/l$, and outperforms its single-expert counterpart of the same or even larger capacity. The advantage results from the discriminative routing property, which is justified both in theory and in practice: pMoE routers can filter out label-irrelevant patches and route similar class-discriminative patches to the same expert. Our experimental results on MNIST, CIFAR-10, and CelebA support our theoretical findings on pMoE’s generalization and show that pMoE can avoid learning spurious correlations.
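Here is a minimal sketch of prioritized patch-level routing as described above: each expert scores all $n$ patches and keeps its top $l$. The linear gating vectors and the function name `route_patches` are hypothetical assumptions for illustration. The paper's router may be parameterized differently.

```python
from typing import List


def route_patches(patches: List[List[float]], gates: List[List[float]], l: int) -> List[List[int]]:
    """Return, for each expert, the indices of the l patches with the highest gating score.

    patches: n patch embeddings; gates: one (hypothetical) linear gating vector per expert.
    """
    routed = []
    for w in gates:
        scores = [sum(p_k * w_k for p_k, w_k in zip(p, w)) for p in patches]
        # Prioritized routing: this expert keeps only its l top-scoring patches.
        top = sorted(range(len(patches)), key=lambda j: scores[j], reverse=True)[:l]
        routed.append(sorted(top))
    return routed


patches = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 3.0]]  # n = 4
gates = [[1.0, 0.0], [0.0, 1.0]]  # two experts
print(route_patches(patches, gates, l=2))  # [[0, 2], [1, 3]]
```

Each expert therefore processes only $l$ of the $n$ patches. This is the source of the computation savings the abstract mentions.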

UAI Conference 2023 Conference Paper

Pessimistic Model Selection for Offline Deep Reinforcement Learning

  • Chao-Han Huck Yang
  • Zhengling Qi
  • Yifan Cui 0001
  • Pin-Yu Chen

Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications. Despite its promising performance, practical gaps exist when deploying DRL in real-world scenarios. One main barrier is overfitting, which leads to poor generalizability of the policy learned by DRL. In particular, for offline DRL with observational data, model selection is challenging because no ground truth is available for evaluating performance, in contrast with the online setting with simulated environments. In this work, we propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee, which provides a provably effective framework for finding the best policy among a set of candidate models. Two refined approaches are also proposed to address the potential bias of DRL models in identifying the optimal policy. Numerical studies demonstrate the superior performance of our approach over existing methods.

NeurIPS Conference 2023 Conference Paper

RADAR: Robust AI-Text Detection via Adversarial Learning

  • Xiaomeng Hu
  • Pin-Yu Chen
  • Tsung-Yi Ho

Recent advances in large language models (LLMs) and the intensifying popularity of ChatGPT-like applications have blurred the boundary between human and machine generation of high-quality text. However, in addition to the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusations of innocent writers. While existing works show that current AI-text detectors are not robust to LLM-based paraphrasing, this paper aims to bridge this gap by proposing a new framework called RADAR, which jointly trains a $\underline{r}$obust $\underline{A}$I-text $\underline{d}$etector via $\underline{a}$dversarial lea$\underline{r}$ning. RADAR is based on adversarial training of a paraphraser and a detector. The paraphraser's goal is to generate realistic content that evades AI-text detection. RADAR uses feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5-Turbo.

ICML Conference 2023 Conference Paper

Reprogramming Pretrained Language Models for Antibody Sequence Infilling

  • Igor Melnyk
  • Vijil Chenthamarakshan
  • Pin-Yu Chen
  • Payel Das
  • Amit Dhurandhar
  • Inkit Padhi
  • Devleena Das

Antibodies comprise the most versatile class of binding molecules, with numerous applications in biomedicine. Computational design of antibodies involves generating novel and diverse sequences while maintaining structural consistency. Unique to antibodies, designing the complementarity-determining region (CDR), which determines antigen binding affinity and specificity, poses its own challenges. Recent deep learning models have shown impressive results; however, the limited number of known antibody sequence/structure pairs frequently leads to degraded performance, in particular a lack of diversity in the generated sequences. In our work, we address this challenge by leveraging Model Reprogramming (MR), which repurposes models pretrained on a source language for tasks that are in a different language and have scarce data, where it may be difficult to train a high-performing model from scratch or to effectively fine-tune an existing pretrained model on the specific task. Specifically, we introduce ReprogBert, in which a pretrained English language model is repurposed for protein sequence infilling, thus performing cross-language adaptation with less data. Results on antibody design benchmarks show that, on a low-resource antibody sequence dataset, our model generates highly diverse CDR sequences, with up to a more than two-fold increase in diversity over the baselines, without losing structural integrity and naturalness. The generated sequences also demonstrate enhanced antigen binding specificity and virus neutralization ability. Code is available at https://github.com/IBM/ReprogBERT

NeurIPS Conference 2023 Conference Paper

Uncovering and Quantifying Social Biases in Code Generation

  • Yan Liu
  • Xiaokang Chen
  • Yan Gao
  • Zhe Su
  • Fengji Zhang
  • Daoguang Zan
  • Jian-Guang Lou
  • Pin-Yu Chen

With the popularity of automatic code generation tools such as Copilot, studying the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm for constructing code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) of varying sizes reveal severe social biases. Moreover, we conduct analysis to provide useful insights for choosing code generation models with low social bias.

NeurIPS Conference 2023 Conference Paper

VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion Models

  • Sheng-Yen Chou
  • Pin-Yu Chen
  • Tsung-Yi Ho

Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising. They are the backbone of many generative AI applications, such as text-to-image conditional generation. However, recent studies have shown that basic unconditional DMs (e.g., DDPM and DDIM) are vulnerable to backdoor injection, a type of output manipulation attack triggered by a maliciously embedded pattern at model input. This paper presents a unified backdoor attack framework (VillanDiffusion) to expand the current scope of backdoor analysis for DMs. Our framework covers mainstream unconditional and conditional DMs (denoising-based and score-based) and various training-free samplers for holistic evaluations. Experiments show that our unified framework facilitates the backdoor analysis of different DM configurations and provides new insights into caption-based backdoor attacks on DMs.

AAAI Conference 2023 Conference Paper

When Neural Networks Fail to Generalize? A Model Sensitivity Perspective

  • Jiajin Zhang
  • Hanqing Chao
  • Amit Dhurandhar
  • Pin-Yu Chen
  • Ali Tajer
  • Yangyang Xu
  • Pingkun Yan

Domain generalization (DG) aims to train a model that performs well in unseen domains under different distributions. This paper considers a more realistic yet more challenging scenario, namely Single Domain Generalization (Single-DG), where only a single source domain is available for training. To tackle this challenge, we first investigate when neural networks fail to generalize. We empirically identify a model property that correlates strongly with generalization, which we term "model sensitivity". Based on our analysis, we propose a novel strategy of Spectral Adversarial Data Augmentation (SADA) to generate augmented images targeted at the highly sensitive frequencies. Models trained with these hard-to-learn samples can effectively suppress the sensitivity in the frequency space, which leads to improved generalization performance. Extensive experiments on multiple public datasets demonstrate the superiority of our approach, which surpasses the state-of-the-art single-DG methods by up to 2.55%. The source code is available at https://github.com/DIAL-RPI/Spectral-Adversarial-Data-Augmentation.

ICML Conference 2023 Conference Paper

Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression

  • Yihao Xue
  • Siddharth Joshi 0004
  • Eric Gan
  • Pin-Yu Chen
  • Baharan Mirzasoleiman

Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision. However, supervised CL is prone to collapsing representations of subclasses within a class by not capturing all their features, and unsupervised CL may suppress harder class-relevant features by focusing on learning easy class-irrelevant features; both significantly compromise representation quality. Yet, there is no theoretical understanding of class collapse or feature suppression at test time. We provide the first unified, theoretically rigorous framework to determine which features are learnt by CL. Our analysis indicates that, perhaps surprisingly, the bias of (stochastic) gradient descent towards simpler solutions is a key factor in collapsing subclass representations and suppressing harder class-relevant features. Moreover, we present increasing embedding dimensionality and improving the quality of data augmentations as two theoretically motivated solutions to feature suppression. We also provide the first theoretical explanation for why employing supervised and unsupervised CL together yields higher-quality representations, even when using commonly used stochastic gradient methods.

AAAI Conference 2022 Conference Paper

Adversarial Examples Can Be Effective Data Augmentation for Unsupervised Machine Learning

  • Chia-Yi Hsu
  • Pin-Yu Chen
  • Songtao Lu
  • Sijia Liu
  • Chia-Mu Yu

Adversarial examples causing evasive predictions are widely used to evaluate and improve the robustness of machine learning models. However, current studies focus on supervised learning tasks, relying on the ground-truth data label, a targeted objective, or supervision from a trained classifier. In this paper, we propose a framework for generating adversarial examples for unsupervised models and demonstrate novel applications to data augmentation. Our framework exploits a mutual information neural estimator as an information-theoretic similarity measure to generate adversarial examples without supervision. We propose a new MinMax algorithm with provable convergence guarantees for efficient generation of unsupervised adversarial examples. Our framework can also be extended to supervised adversarial examples. When unsupervised adversarial examples are used as a simple plug-in data augmentation tool for model retraining, significant improvements are consistently observed across different unsupervised tasks and datasets, including data reconstruction, representation learning, and contrastive learning. Our results present novel methods for, and show considerable advantages in, studying and improving unsupervised machine learning via adversarial examples.

ICLR Conference 2022 Conference Paper

Auto-Transfer: Learning to Route Transferable Representations

  • Keerthiram Murugesan
  • Vijay Sadashivaiah
  • Ronny Luss
  • Karthikeyan Shanmugam 0001
  • Pin-Yu Chen
  • Amit Dhurandhar

Knowledge transfer between heterogeneous source and target networks and tasks has received a lot of attention recently, as large amounts of high-quality labeled data can be difficult to obtain in many applications. Existing approaches typically constrain the target deep neural network (DNN) feature representations to be close to the source DNNs' feature representations, which can be limiting. In this paper, we propose a novel adversarial multi-armed bandit approach that automatically learns to route source representations to appropriate target representations, which are then combined in meaningful ways to produce accurate target models. We see accuracy improvements of upwards of 5% compared with state-of-the-art knowledge transfer methods on four benchmark (target) image datasets, CUB200, Stanford Dogs, MIT67, and Stanford40, where the source dataset is ImageNet. We qualitatively analyze the goodness of our transfer scheme by showing individual examples of the important features focused on by our target network at different layers compared with the (closest) competitors. We also observe that our improvement over other methods is higher for smaller target datasets, making it an effective tool for small-data applications that may benefit from transfer learning.

IJCAI Conference 2022 Conference Paper

CARBEN: Composite Adversarial Robustness Benchmark

  • Lei Hsiung
  • Yun-Yun Tsai
  • Pin-Yu Chen
  • Tsung-Yi Ho

Prior literature on adversarial attack methods has mainly focused on attacking with and defending against a single threat model, e.g., perturbations bounded in an $\ell_p$ ball. However, multiple threat models can be combined into composite perturbations. One such approach, composite adversarial attack (CAA), not only expands the perturbable space of the image but may also be overlooked by current modes of robustness evaluation. This paper demonstrates how CAA's attack order affects the resulting image, and provides real-time inference with different models, which facilitates users' configuration of the attack-level parameters and rapid evaluation of model predictions. A leaderboard to benchmark adversarial robustness against CAA is also introduced.

IJCAI Conference 2022 Conference Paper

CAT: Customized Adversarial Training for Improved Robustness

  • Minhao Cheng
  • Qi Lei
  • Pin-Yu Chen
  • Inderjit Dhillon
  • Cho-Jui Hsieh

Adversarial training has become one of the most effective methods for improving the robustness of neural networks. However, it often suffers from poor generalization on both clean and perturbed data. Current robust training methods use a uniform perturbation strength for every sample to generate adversarial examples during model training. However, we show that this leads to worse training and generalization error and forces the prediction to match the one-hot label. Therefore, in this paper we propose a new algorithm, named Customized Adversarial Training (CAT), which adaptively customizes the perturbation level and the corresponding label for each training sample in adversarial training. We first show theoretically that the CAT scheme improves generalization. Also, through extensive experiments, we show that the proposed algorithm achieves better clean and robust accuracy than previous adversarial training methods. The full version of this paper is available at https://arxiv.org/abs/2002.06789.

UAI Conference 2022 Conference Paper

Distributed adversarial training to robustify deep neural networks at scale

  • Gaoyuan Zhang
  • Songtao Lu
  • Yihua Zhang
  • Xiangyi Chen
  • Pin-Yu Chen
  • Quanfu Fan
  • Lee Martie
  • Lior Horesh

Current deep neural networks (DNNs) are vulnerable to adversarial attacks, where adversarial perturbations to the inputs can change or manipulate classification. To defend against such attacks, an effective and popular approach, known as adversarial training (AT), has been shown to mitigate the negative impact of adversarial attacks by virtue of a min-max robust training method. While effective, it remains unclear whether it can be successfully adapted to the distributed learning context. The power of distributed optimization over multiple machines enables us to scale up robust training over large models and datasets. Spurred by this, we propose distributed adversarial training (DAT), a large-batch adversarial training framework implemented over multiple machines. We show that DAT is general: it supports training over labeled and unlabeled data, multiple types of attack generation methods, and gradient compression operations favored for distributed optimization. Theoretically, we provide, under standard conditions in optimization theory, the convergence rate of DAT to first-order stationary points in general non-convex settings. Empirically, we demonstrate that DAT either matches or outperforms state-of-the-art robust accuracies and achieves a graceful training speedup (e.g., for ResNet-50 on ImageNet). Code is available at https://github.com/dat-2022/dat.

ICML Conference 2022 Conference Paper

Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling

  • Hongkang Li
  • Meng Wang 0003
  • Sijia Liu 0001
  • Pin-Yu Chen
  • Jinjun Xiong

Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data. To address their scalability issue, which stems from the recursive embedding of neighboring features, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs, and in many empirical studies it has achieved test performance comparable to training without topology sampling. To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. We formally characterize some sufficient conditions on graph topology sampling under which GCN training leads to diminishing generalization error. Moreover, our analysis tackles the non-convex interaction of weights across layers, which is under-explored in existing theoretical analyses of GCNs. This paper explicitly characterizes the impact of graph structures and topology sampling on generalization performance and sample complexity, and the theoretical findings are also supported by numerical experiments.

ICLR Conference 2022 Conference Paper

How unlabeled data improve generalization in self-training? A one-hidden-layer theoretical analysis

  • Shuai Zhang 0015
  • Meng Wang 0003
  • Sijia Liu 0001
  • Pin-Yu Chen
  • Jinjun Xiong

Self-training, a semi-supervised learning algorithm, leverages a large amount of unlabeled data to improve learning when the labeled data are limited. Despite empirical successes, its theoretical characterization remains elusive. To the best of our knowledge, this work establishes the first theoretical analysis for the known iterative self-training paradigm and formally proves the benefits of unlabeled data in both training convergence and generalization ability. To make our theoretical analysis feasible, we focus on the case of one-hidden-layer neural networks. However, theoretical understanding of iterative self-training is non-trivial even for a shallow neural network. One of the key challenges is that existing neural network landscape analysis built upon supervised learning no longer holds in the (semi-supervised) self-training paradigm. We address this challenge and prove that iterative self-training converges linearly with both convergence rate and generalization accuracy improved in the order of $1/\sqrt{M}$, where $M$ is the number of unlabeled samples. Extensive experiments from shallow neural networks to deep neural networks are also provided to justify the correctness of our established theoretical insights on self-training.

JAIR Journal 2022 Journal Article

Learning to Design Fair and Private Voting Rules

  • Farhad Mohsin
  • Ao Liu
  • Pin-Yu Chen
  • Francesca Rossi
  • Lirong Xia

Voting is used widely to identify a collective decision for a group of agents, based on their preferences. In this paper, we focus on evaluating and designing voting rules that support both the privacy of the voting agents and a notion of fairness over such agents. To do this, we introduce a novel notion of group fairness and adopt the existing notion of local differential privacy. We then evaluate the level of group fairness in several existing voting rules, as well as the trade-offs between fairness and privacy, showing that it is not possible to always obtain maximal economic efficiency with high fairness or high privacy levels. Then, we present both a machine learning and a constrained optimization approach to design new voting rules that are fair while maintaining a high level of economic efficiency. Finally, we empirically examine the effect of adding noise to create local differentially private voting rules and discuss the three-way trade-off between economic efficiency, fairness, and privacy. This paper appears in the special track on AI & Society.
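The abstract adopts local differential privacy (LDP) but does not name the noise mechanism. One standard mechanism is k-ary randomized response, which satisfies $\epsilon$-LDP. Each voter reports the true choice with probability $e^\epsilon/(e^\epsilon + k - 1)$ and otherwise reports a uniformly random other option. As a result, the ratio of output probabilities for any two true votes is at most $e^\epsilon$. This is a swapped-in illustrative mechanism, not necessarily the one studied in the paper, and the function names are hypothetical.

```python
import math
import random


def keep_probability(epsilon: float, k: int) -> float:
    """Probability of reporting the true choice under k-ary randomized response."""
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def randomized_response(vote: int, k: int, epsilon: float, rng: random.Random) -> int:
    """Report `vote` (in 0..k-1) truthfully w.p. keep_probability, else a uniformly random other option."""
    if rng.random() < keep_probability(epsilon, k):
        return vote
    return rng.choice([c for c in range(k) if c != vote])


rng = random.Random(0)
reports = [randomized_response(vote=0, k=3, epsilon=1.0, rng=rng) for _ in range(10)]
print(reports)
```

Smaller $\epsilon$ pushes the keep probability toward $1/k$. This gives stronger privacy, but the reported votes carry less information about the true preferences.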

ICML Conference 2022 Conference Paper

Linearity Grafting: Relaxed Neuron Pruning Helps Certifiable Robustness

  • Tianlong Chen 0001
  • Huan Zhang 0001
  • Zhenyu Zhang 0015
  • Shiyu Chang
  • Sijia Liu 0001
  • Pin-Yu Chen
  • Zhangyang Wang

Certifiable robustness is a highly desirable property for adopting deep neural networks (DNNs) in safety-critical scenarios, but it often demands tedious computations to establish. The main hurdle lies in the massive amount of non-linearity in large DNNs. To trade off DNN expressiveness (which calls for more non-linearity) against robustness certification scalability (which prefers more linearity), we propose a novel solution that strategically manipulates neurons by "grafting" appropriate levels of linearity. The core of our proposal is to first linearize insignificant ReLU neurons, eliminating the non-linear components that are both redundant for DNN performance and harmful to its certification. We then optimize the associated slopes and intercepts of the replaced linear activations to restore model performance while maintaining certifiability. Hence, typical neuron pruning can be viewed as a special case of grafting a linear function with fixed zero slope and intercept, which might overly restrict the network flexibility and sacrifice its performance. Extensive experiments on multiple datasets and network backbones show that our linearity grafting can (1) effectively tighten certified bounds; (2) achieve competitive certifiable robustness without certified robust training (i.e., over 30% improvement on CIFAR-10 models); and (3) scale up complete verification to large adversarially trained models with 17M parameters. Code is available at https://github.com/VITA-Group/Linearity-Grafting.
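Here is a minimal sketch of the grafting idea in a forward pass. Grafted neurons replace the ReLU with a linear map $a x + b$, and setting $a = b = 0$ recovers ordinary pruning. The dictionary-based interface is a hypothetical illustration. The sketch does not show how the paper selects insignificant neurons or optimizes the slopes and intercepts.

```python
from typing import Dict, List, Tuple


def grafted_activation(pre: List[float], grafted: Dict[int, Tuple[float, float]]) -> List[float]:
    """ReLU on ordinary neurons; grafted neurons use a linear map a * x + b instead.

    grafted maps a neuron index to its (slope a, intercept b); (0.0, 0.0) reproduces pruning.
    """
    out = []
    for i, x in enumerate(pre):
        if i in grafted:
            a, b = grafted[i]
            out.append(a * x + b)  # linear, so no ReLU to relax during verification
        else:
            out.append(max(0.0, x))
    return out


pre_activations = [-1.0, 2.0, -3.0]
print(grafted_activation(pre_activations, {0: (0.5, 0.25), 2: (0.0, 0.0)}))  # [-0.25, 2.0, 0.0]
```

Neuron 0 keeps a non-zero response to negative inputs, unlike ReLU or pruning. This illustrates why a tuned slope and intercept can retain more flexibility than zeroing the neuron.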

NeurIPS Conference 2022 Conference Paper

Make an Omelette with Breaking Eggs: Zero-Shot Learning for Novel Attribute Synthesis

  • Yu-Hsuan Li
  • Tzu-Yin Chao
  • Ching-Chun Huang
  • Pin-Yu Chen
  • Wei-Chen Chiu

Most existing algorithms for zero-shot classification rely on attribute-based semantic relations among categories to classify novel categories without observing any of their instances. However, training zero-shot classification models still requires attribute labels for each class (or even each instance) in the training dataset, which is also expensive. To this end, in this paper, we bring up a new problem scenario: “Can we derive zero-shot learning for novel attribute detectors/classifiers and use them to automatically annotate the dataset for labeling efficiency?” Basically, given only a small set of detectors that are learned to recognize some manually annotated attributes (i.e., the seen attributes), we aim to synthesize detectors for novel attributes in a zero-shot learning manner. Our proposed method, Zero-Shot Learning for Attributes (ZSLA), which is the first of its kind to the best of our knowledge, tackles this new research problem by applying set operations to first decompose the seen attributes into their basic attributes and then recombine these basic attributes into novel ones. Extensive experiments are conducted to verify the capacity of our synthesized detectors to accurately capture the semantics of the novel attributes and show their superior performance in terms of detection and localization compared to other baseline approaches. Moreover, we demonstrate the application of automatic annotation using our synthesized detectors on the Caltech-UCSD Birds-200-2011 dataset. Various generalized zero-shot classification algorithms trained on the dataset re-annotated by ZSLA show performance comparable to those trained with the manual ground-truth annotations.

ICLR Conference 2022 Conference Paper

MAML is a Noisy Contrastive Learner in Classification

  • Chia-Hsiang Kao
  • Wei-Chen Chiu
  • Pin-Yu Chen

Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms, achieving remarkable success in various learning problems. Yet, with the unique design of nested inner-loop and outer-loop updates, which govern the task-specific and meta-model-centric learning, respectively, the underlying learning objective of MAML remains implicit, impeding a more straightforward understanding of it. In this paper, we provide a new perspective on the working mechanism of MAML. We discover that MAML is analogous to a meta-learner using a supervised contrastive objective in classification: the query features are pulled towards the support features of the same class and pushed away from those of different classes. This contrastiveness is experimentally verified via an analysis based on cosine similarity. Moreover, we reveal that vanilla MAML has an undesirable interference term originating from the random initialization and the cross-task interaction. We thus propose a simple but effective technique, the zeroing trick, to alleviate the interference. Extensive experiments are conducted on the mini-ImageNet and Omniglot datasets to validate the consistent improvement brought by our proposed method.

TMLR Journal 2022 Journal Article

On the Adversarial Robustness of Vision Transformers

  • Rulin Shao
  • Zhouxing Shi
  • Jinfeng Yi
  • Pin-Yu Chen
  • Cho-Jui Hsieh

Following their success in advancing natural language processing and understanding, transformers are expected to bring revolutionary changes to computer vision. This work provides a comprehensive study on the robustness of vision transformers (ViTs) against adversarial perturbations. Tested in various white-box and transfer attack settings, ViTs possess better adversarial robustness than MLP-Mixer and convolutional neural networks (CNNs) including ConvNeXt, and this observation also holds for certified robustness. Through frequency analysis and feature visualization, we summarize the following main observations contributing to the improved robustness of ViTs: 1) Features learned by ViTs contain fewer high-frequency patterns with spurious correlations, which helps explain why ViTs are less sensitive to high-frequency perturbations than CNNs and MLP-Mixer; moreover, there is a high correlation between how much a model learns high-frequency features and its robustness against different frequency-based perturbations. 2) Introducing convolutional or tokens-to-token blocks for learning high-frequency features in ViTs can improve classification accuracy, but at the cost of adversarial robustness. 3) Modern CNN designs that borrow techniques from ViTs, including the activation function, layer norm, larger kernel sizes to imitate global attention, and patchifying images as inputs, could help bridge the gap between ViTs and CNNs not only in terms of performance, but also in certified and empirical adversarial robustness. Moreover, we show that adversarial training is also applicable to ViTs for training robust models, and that sharpness-aware minimization can also help improve robustness, while pre-training with clean images on larger datasets does not significantly improve adversarial robustness.

ICML Conference 2022 Conference Paper

Revisiting Contrastive Learning through the Lens of Neighborhood Component Analysis: an Integrated Framework

  • Ching-Yun Ko
  • Jeet Mohapatra
  • Sijia Liu 0001
  • Pin-Yu Chen
  • Luca Daniel
  • Lily Weng

As a seminal tool in self-supervised representation learning, contrastive learning has gained unprecedented attention in recent years. In essence, contrastive learning aims to leverage pairs of positive and negative samples for representation learning, which relates to exploiting neighborhood information in a feature space. By investigating the connection between contrastive learning and neighborhood component analysis (NCA), we provide a novel stochastic nearest neighbor viewpoint of contrastive learning and subsequently propose a series of contrastive losses that outperform the existing ones. Under our proposed framework, we show a new methodology to design integrated contrastive losses that could simultaneously achieve good accuracy and robustness on downstream tasks. With the integrated framework, we achieve up to 6% improvement on the standard accuracy and 17% improvement on the robust accuracy.
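
The stochastic nearest-neighbor viewpoint above builds on NCA, in which each sample picks another sample as its neighbor with a softmax probability over negative squared distances. The sketch below computes that classic NCA probability and the probability mass a sample places on same-label samples. It illustrates standard NCA only, under these assumptions; it is not the contrastive losses proposed in the paper, and the function names are hypothetical.

```python
import math

def nca_neighbor_probs(embeddings, i):
    """Probability that sample i selects each other sample j as its neighbor (classic NCA softmax)."""
    xi = embeddings[i]
    scores = {}
    for j, xj in enumerate(embeddings):
        if j == i:
            continue  # a sample never selects itself
        scores[j] = -sum((a - b) ** 2 for a, b in zip(xi, xj))
    # Numerically stable softmax over negative squared distances
    m = max(scores.values())
    exp_scores = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exp_scores.values())
    return {j: e / total for j, e in exp_scores.items()}

def positive_mass(embeddings, labels, i):
    """Probability that sample i selects a neighbor sharing its label (the NCA objective term)."""
    probs = nca_neighbor_probs(embeddings, i)
    return sum(p for j, p in probs.items() if labels[j] == labels[i])

points = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
labels = [0, 0, 1]
print(nca_neighbor_probs(points, 0), positive_mass(points, labels, 0))
```

In the contrastive reading, same-label samples play the role of positives and the remaining samples act as negatives.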

AAAI Conference 2022 System Paper

SenSE: A Toolkit for Semantic Change Exploration via Word Embedding Alignment

  • Maurício Gruppi
  • Sibel Adalı
  • Pin-Yu Chen

Lexical Semantic Change (LSC) detection, also known as Semantic Shift, is the process of identifying and characterizing variations in language usage across different scenarios such as time and domain. It allows us to track the evolution of word senses, as well as to understand the difference between the languages used by distinct communities. LSC detection is often done by applying a distance measure over vectors of two aligned word embedding matrices. In this demonstration, we present SenSE, an interactive semantic shift exploration toolkit that provides visualization and explanation of lexical semantic change for an input pair of text sources. Our system focuses on showing how the different alignment strategies may affect the output of an LSC model as well as on explaining semantic change based on the neighbors of a chosen target word, while also extracting examples of sentences where these semantic deviations appear. The system runs as a web application (available at http://sense.mgruppi.me), allowing the audience to interact by configuring the alignment strategies while visualizing the results in a web browser.

ICML Conference 2022 Conference Paper

Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning

  • Momin Abbas
  • Quan Xiao
  • Lisha Chen
  • Pin-Yu Chen
  • Tianyi Chen

Model-agnostic meta-learning (MAML) is currently one of the dominant approaches for few-shot meta-learning. Despite its effectiveness, the optimization of MAML can be challenging due to the innate bilevel problem structure. Specifically, the loss landscape of MAML is much more complex, with possibly more saddle points and local minimizers, than its empirical risk minimization counterpart. To address this challenge, we leverage the recently proposed sharpness-aware minimization and develop a sharpness-aware MAML approach that we term Sharp-MAML. We empirically demonstrate that Sharp-MAML and its computation-efficient variant can outperform the plain-vanilla MAML baseline (e.g., +3% accuracy on Mini-ImageNet). We complement the empirical study with a convergence rate analysis and a generalization bound for Sharp-MAML. To the best of our knowledge, this is the first empirical and theoretical study of sharpness-aware minimization in the context of bilevel learning.
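
Sharp-MAML builds on sharpness-aware minimization (SAM). The sketch below shows one generic, single-level SAM update: it perturbs the weights toward the first-order worst case inside an L2 ball of radius `rho`, then descends from the original weights using the gradient taken at the perturbed point. It is not the bilevel Sharp-MAML procedure. The toy quadratic loss, `lr`, and `rho` are assumptions chosen for illustration.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization (SAM) update on a flat parameter list w."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # Move to the first-order worst-case point inside an L2 ball of radius rho
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # Descend from the ORIGINAL weights using the gradient at the perturbed point
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

# Toy loss f(w) = 0.5 * (a * w0^2 + b * w1^2): one sharp (a=10) and one flat (b=1) direction
def grad(w, a=10.0, b=1.0):
    return [a * w[0], b * w[1]]

w = [1.0, 1.0]
for _ in range(200):
    w = sam_step(w, grad, lr=0.05)
print(w)
```

Because the perturbation has a fixed radius, SAM hovers in a small neighborhood of the minimum rather than converging exactly to it; the extra gradient evaluation per step is the cost that computation-efficient variants aim to reduce.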

IJCAI Conference 2022 Conference Paper

Towards Creativity Characterization of Generative Models via Group-Based Subset Scanning

  • Celia Cintas
  • Payel Das
  • Brian Quanz
  • Girmaw Abebe Tadesse
  • Skyler Speakman
  • Pin-Yu Chen

Deep generative models, such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs), have been employed widely in computational creativity research. However, such models discourage out-of-distribution generation to avoid spurious sample generation, thereby limiting their creativity. Thus, incorporating research on human creativity into generative deep learning techniques presents an opportunity to make their outputs more compelling and human-like. As generative models directed toward creativity research emerge, machine learning-based surrogate metrics that characterize the creative output of these models are urgently needed. We propose group-based subset scanning to identify, quantify, and characterize creative processes by detecting a subset of anomalous node activations in the hidden layers of the generative models. Our experiments on the standard image benchmarks and their "creatively generated" variants reveal that the distribution of the proposed subset scores is more useful for detecting novelty in creative processes in the activation space than in the pixel space. Further, we found that creative samples generate larger subsets of anomalies than normal or non-creative samples across datasets. The node activations highlighted during the creative decoding process are different from those responsible for normal sample generation. Lastly, we assess whether the images from the subsets selected by our method were also found creative by human evaluators, presenting a link between creativity perception in humans and node activations within deep neural nets.

AAAI Conference 2022 Conference Paper

Training a Resilient Q-network against Observational Interference

  • Chao-Han Huck Yang
  • I-Te Danny Hung
  • Yi Ouyang
  • Pin-Yu Chen

Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications. In practice, however, a DRL agent may receive faulty observations caused by abrupt interference such as black-outs, frozen screens, and adversarial perturbations. Designing a resilient DRL algorithm against these rare but mission-critical and safety-crucial scenarios is an essential yet challenging task. In this paper, we consider a deep Q-network (DQN) framework trained with an auxiliary task of observational interference such as artificial noise. Inspired by causal inference for observational interference, we propose a causal-inference-based DQN algorithm called causal inference Q-network (CIQ). We evaluate the performance of CIQ in several benchmark DQN environments with different types of interference as auxiliary labels. Our experimental results show that the proposed CIQ method can achieve higher performance and more resilience against observational interference.

AAAI Conference 2022 Conference Paper

Vision Transformers Are Robust Learners

  • Sayak Paul
  • Pin-Yu Chen

Transformers, composed of multiple self-attention layers, hold strong promise as a generic learning primitive applicable to different data modalities, including the recent breakthroughs in computer vision achieving state-of-the-art (SOTA) standard accuracy. What remains largely unexplored is their robustness evaluation and attribution. In this work, we study the robustness of the Vision Transformer (ViT) against common corruptions and perturbations, distribution shifts, and natural adversarial examples. We use six diverse ImageNet datasets concerning robust classification to conduct a comprehensive performance comparison between ViT models and Big Transfer (BiT), a family of SOTA convolutional neural networks (CNNs). Through a series of six systematically designed experiments, we then present analyses that provide both quantitative and qualitative indications to explain why ViTs are indeed more robust learners. For example, with fewer parameters and similar dataset and pre-training combinations, ViT gives a top-1 accuracy of 28.10% on ImageNet-A, which is 4.3x higher than a comparable variant of BiT. Our analyses on image masking, Fourier spectrum sensitivity, and spread on the discrete cosine energy spectrum reveal intriguing properties of ViT that contribute to its improved robustness. Code for reproducing our experiments is available at https://git.io/J3VO0.

AAAI Conference 2022 Conference Paper

Zeroth-Order Optimization for Composite Problems with Functional Constraints

  • Zichong Li
  • Pin-Yu Chen
  • Sijia Liu
  • Songtao Lu
  • Yangyang Xu

In many real-world problems, first-order (FO) derivative evaluations are too expensive or even inaccessible. For such problems, zeroth-order (ZO) methods that only need function evaluations are often more efficient than FO methods, and sometimes the only option. In this paper, we propose a novel zeroth-order inexact augmented Lagrangian method (ZO-iALM) to solve black-box optimization problems, which involve a composite (i.e., smooth + nonsmooth) objective and functional constraints. This appears to be the first work that develops an iALM-based ZO method for functional constrained optimization while achieving query complexity results that match the best-known FO complexity results up to a factor of the variable dimension. With an extensive experimental study, we show the effectiveness of our method. The applications of our method range from classical optimization problems to practical machine learning examples such as resource allocation in sensor networks and adversarial example generation.
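
ZO methods replace the gradient with estimates built from function values alone. One common choice is the random-direction finite-difference estimator, which averages `(f(x + mu*u) - f(x)) / mu * u` over Gaussian directions `u`. The sketch below implements that generic estimator. It is an assumption for illustration, not necessarily the estimator used inside ZO-iALM, and `mu`, `num_dirs`, and the seed are arbitrary.

```python
import random

def zo_gradient(f, x, mu=1e-4, num_dirs=20, rng=None):
    """Estimate grad f(x) from function evaluations only (random-direction finite differences)."""
    rng = rng or random.Random(0)
    d = len(x)
    fx = f(x)
    est = [0.0] * d
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]  # Gaussian search direction
        x_pert = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (f(x_pert) - fx) / mu  # finite-difference directional derivative along u
        for k in range(d):
            est[k] += slope * u[k] / num_dirs
    return est

# Example: f(x) = ||x||^2 has true gradient 2x = [2, -4] at x = [1, -2]
f = lambda v: sum(t * t for t in v)
print(zo_gradient(f, [1.0, -2.0], num_dirs=2000))
```

Each query direction costs one extra function evaluation, so the estimator's variance trades off directly against query count; that cost is what the query complexity results above measure.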

NeurIPS Conference 2021 Conference Paper

Adversarial Attack Generation Empowered by Min-Max Optimization

  • Jingkang Wang
  • Tianyun Zhang
  • Sijia Liu
  • Pin-Yu Chen
  • Jiacen Xu
  • Makan Fardad
  • Bo Li

The worst-case training principle that minimizes the maximal adversarial loss, also known as adversarial training (AT), has been shown to be a state-of-the-art approach for enhancing adversarial robustness. Nevertheless, min-max optimization beyond the purpose of AT has not been rigorously explored in the adversarial context. In this paper, we show how a general notion of min-max optimization over multiple domains can be leveraged for the design of different types of adversarial attacks. In particular, given a set of risk sources, minimizing the worst-case attack loss can be reformulated as a min-max problem by introducing domain weights that are maximized over the probability simplex of the domain set. We showcase this unified framework in three attack generation problems: attacking model ensembles, devising universal perturbations under multiple inputs, and crafting attacks resilient to data transformations. Extensive experiments demonstrate that our approach leads to substantial attack improvement over existing heuristic strategies as well as robustness improvement over state-of-the-art defense methods against multiple perturbation types. Furthermore, we find that the self-adjusted domain weights learned from min-max optimization can provide a holistic tool to explain the difficulty level of attacks across domains.
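
Maximizing domain weights over the probability simplex is often done by projected gradient ascent: take an ascent step on the weighted loss, then project back onto the simplex. The sketch below uses the standard sort-based Euclidean simplex projection. It illustrates the inner maximization under assumed fixed per-domain losses; the step size, loss values, and function names are hypothetical, and the paper's actual solver may differ.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # threshold from the largest index still satisfying the condition
    return [max(vi - theta, 0.0) for vi in v]

def update_domain_weights(weights, domain_losses, step=0.5):
    """One projected gradient-ascent step on sum_i w_i * loss_i over the simplex."""
    ascended = [w + step * loss for w, loss in zip(weights, domain_losses)]
    return project_to_simplex(ascended)

w = [1 / 3, 1 / 3, 1 / 3]
losses = [0.2, 1.0, 0.4]  # hypothetical per-domain attack losses, held fixed for illustration
for _ in range(5):
    w = update_domain_weights(w, losses)
print(w)
```

With fixed losses the weights concentrate on the hardest domain; in the full min-max loop the perturbation is updated between weight steps, so the losses change and the weights settle on a mixture that reflects how difficult each domain is to attack.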

NeurIPS Conference 2021 Conference Paper

CAFE: Catastrophic Data Leakage in Vertical Federated Learning

  • Xiao Jin
  • Pin-Yu Chen
  • Chia-Yi Hsu
  • Chia-Mu Yu
  • Tianyi Chen

Recent studies show that private training data can be leaked through the gradient-sharing mechanism deployed in distributed machine learning systems such as federated learning (FL). Increasing the batch size to complicate data recovery is often viewed as a promising defense strategy against data leakage. In this paper, we revisit this defense premise and propose an advanced data leakage attack with theoretical justification to efficiently recover batch data from the shared aggregated gradients. We name our proposed method catastrophic data leakage in vertical federated learning (CAFE). Compared to existing data leakage attacks, our extensive experimental results in vertical FL settings demonstrate the effectiveness of CAFE in performing large-batch data leakage attacks with improved data recovery quality. We also propose a practical countermeasure to mitigate CAFE. Our results suggest that private data participating in standard FL, especially the vertical case, are at high risk of being leaked from the training gradients. Our analysis implies unprecedented and practical data leakage risks in those learning settings. The code of our work is available at https://github.com/DeRafael/CAFE.

IJCAI Conference 2021 Conference Paper

Characteristic Examples: High-Robustness, Low-Transferability Fingerprinting of Neural Networks

  • Siyue Wang
  • Xiao Wang
  • Pin-Yu Chen
  • Pu Zhao
  • Xue Lin

This paper proposes Characteristic Examples for effectively fingerprinting deep neural networks, featuring high robustness for the base model against model pruning as well as low transferability to unassociated models. This is the first work to take both robustness and transferability into consideration for generating realistic fingerprints, whereas current methods lack practical assumptions and may incur large false positive rates. To achieve a better trade-off between robustness and transferability, we propose three kinds of characteristic examples, vanilla C-examples, RC-examples, and LTRC-examples, to derive fingerprints from the original base model. To fairly characterize the trade-off between robustness and transferability, we propose the Uniqueness Score, a comprehensive metric that measures the difference between robustness and transferability and also serves as an indicator of the false alarm problem. Extensive experiments demonstrate that the proposed characteristic examples achieve superior performance compared with existing fingerprinting methods. In particular, for VGG ImageNet models, using LTRC-examples gives a 4X higher uniqueness score than the baseline method and does not incur any false positives.

ICML Conference 2021 Conference Paper

CRFL: Certifiably Robust Federated Learning against Backdoor Attacks

  • Chulin Xie
  • Minghao Chen 0001
  • Pin-Yu Chen
  • Bo Li 0026

Federated Learning (FL), a distributed learning paradigm that aggregates information from diverse clients to train a shared global model, has demonstrated great success. However, malicious clients can perform poisoning attacks and model replacement to introduce backdoors into the trained global model. Although there have been intensive studies designing robust aggregation methods and empirically robust federated training protocols against backdoors, existing approaches lack robustness certification. This paper provides the first general framework, Certifiably Robust Federated Learning (CRFL), to train certifiably robust FL models against backdoors. Our method exploits clipping and smoothing on model parameters to control the global model smoothness, which yields a sample-wise robustness certification on backdoors with limited magnitude. Our certification also specifies the relation to federated learning parameters, such as the poisoning ratio at the instance level, the number of attackers, and training iterations. Practically, we conduct comprehensive experiments across a range of federated datasets and provide the first benchmark for certified robustness against backdoor attacks in federated learning. Our code is publicly available at https://github.com/AI-secure/CRFL.
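
Parameter clipping and smoothing of the kind described above can be sketched as two steps on a flat parameter vector: rescale it so its L2 norm does not exceed a bound, then add isotropic Gaussian noise. The sketch below shows only that generic pattern. `clip_norm`, `sigma`, the flat-list representation, and the function name are assumptions; it does not reproduce CRFL's exact schedule or its certification.

```python
import math
import random

def clip_and_smooth(params, clip_norm, sigma, rng=None):
    """Clip a flat parameter vector to L2 norm <= clip_norm, then add N(0, sigma^2) noise per entry."""
    rng = rng or random.Random(0)
    norm = math.sqrt(sum(p * p for p in params))
    # Scale down only when the norm exceeds the bound; small vectors pass through unchanged
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [p * scale for p in params]
    return [p + rng.gauss(0.0, sigma) for p in clipped]

print(clip_and_smooth([3.0, 4.0], clip_norm=1.0, sigma=0.01))
```

Clipping bounds how far any single update can move the parameters, and the added noise smooths the model's output with respect to such bounded changes; together these are what make a magnitude-limited certification possible.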

AAAI Conference 2021 Conference Paper

Curse or Redemption? How Data Heterogeneity Affects the Robustness of Federated Learning

  • Syed Zawad
  • Ahsan Ali
  • Pin-Yu Chen
  • Ali Anwar
  • Yi Zhou
  • Nathalie Baracaldo
  • Yuan Tian
  • Feng Yan

Data heterogeneity has been identified as one of the key features of federated learning but is often overlooked through the lens of robustness to adversarial attacks. This paper focuses on characterizing and understanding its impact on backdoor attacks in federated learning through comprehensive experiments using synthetic datasets and the LEAF benchmarks. The initial impression from our experimental results suggests that data heterogeneity is the dominant factor in the effectiveness of attacks and that it may be a redemption for defending against backdooring, as it makes the attack less efficient, makes effective attack strategies more challenging to design, and makes the attack result less predictable. However, with further investigation, we found that data heterogeneity is more of a curse than a redemption, as the attack effectiveness can be significantly boosted by simply adjusting the client-side backdooring timing. More importantly, data heterogeneity may result in overfitting in the local training of benign clients, which attackers can exploit to disguise themselves and fool skewed-feature-based defenses. In addition, effective attack strategies can be crafted by adjusting the attack data distribution. Finally, we discuss potential directions for defending against the curses brought by data heterogeneity. The results and lessons learned from our extensive experiments and analysis offer new insights for designing robust federated learning methods and systems.

AAAI Conference 2021 Conference Paper

Fake it Till You Make it: Self-Supervised Semantic Shifts for Monolingual Word Embedding Tasks

  • Maurício Gruppi
  • Pin-Yu Chen
  • Sibel Adali

The use of language is subject to variation over time as well as across social groups and knowledge domains, leading to differences even in the monolingual scenario. Such variation in word usage is often called lexical semantic change (LSC). The goal of LSC is to characterize and quantify language variations with respect to word meaning, to measure how distinct two language sources are (that is, people or language models). Because there is hardly any data available for such a task, most solutions involve unsupervised methods to align two embeddings and predict semantic change with respect to a distance measure. To that end, we propose a self-supervised approach to model lexical semantic change by generating training samples by introducing perturbations of word vectors in the input corpora. We show that our method can be used for the detection of semantic change with any alignment method. Furthermore, it can be used to choose the landmark words to use in alignment and can lead to substantial improvements over the existing techniques for alignment. We illustrate the utility of our techniques using experimental results on three different datasets, involving words with the same or different meanings. Our methods not only provide significant improvements but also can lead to novel findings for the LSC problem.

AAAI Conference 2021 Conference Paper

Fast Training of Provably Robust Neural Networks by SingleProp

  • Akhilan Boopathy
  • Lily Weng
  • Sijia Liu
  • Pin-Yu Chen
  • Gaoyuan Zhang
  • Luca Daniel

Recent works have developed several methods of defending neural networks against adversarial attacks with certified guarantees. However, these techniques can be computationally costly due to the use of certification during training. We develop a new regularizer that is more efficient than existing certified defenses, requiring only one additional forward propagation through a network, and that can train networks with similar certified accuracy. Through experiments on MNIST and CIFAR-10, we demonstrate faster training and certified accuracy comparable to state-of-the-art certified defenses.

ICML Conference 2021 Conference Paper

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

  • Yue Cao 0010
  • Payel Das
  • Vijil Chenthamarakshan
  • Pin-Yu Chen
  • Igor Melnyk
  • Yang Shen 0001

Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering. Challenges arise from the complex sequence–fold relationship, as well as the difficulty of capturing the diversity of the sequences (and therefore structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific target fold. To model the complex sequence–structure relationship, Fold2Seq jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. On test sets with single, high-resolution, and complete structure inputs for individual folds, our experiments demonstrate improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design, compared to existing state-of-the-art methods that include data-driven deep generative models and physics-based RosettaDesign. The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges originating from low-quality, incomplete, or ambiguous input structures. Source code and data are available at https://github.com/IBM/fold2seq.

NeurIPS Conference 2021 Conference Paper

Formalizing Generalization and Adversarial Robustness of Neural Networks to Weight Perturbations

  • Yu-Lin Tsai
  • Chia-Yi Hsu
  • Chia-Mu Yu
  • Pin-Yu Chen

Studying the sensitivity of neural networks to weight perturbations and its impact on model performance, including generalization and robustness, is an active research topic due to its implications for a wide range of machine learning tasks such as model compression, generalization gap assessment, and adversarial attacks. In this paper, we provide the first integrated study and analysis of feed-forward neural networks in terms of robustness in the pairwise class margin and generalization behavior under weight perturbation. We further design a new theory-driven loss function for training generalizable and robust neural networks against weight perturbations. Empirical experiments are conducted to validate our theoretical analysis. Our results offer fundamental insights for characterizing the generalization and robustness of neural networks against weight perturbations.

NeurIPS Conference 2021 Conference Paper

Mean-based Best Arm Identification in Stochastic Bandits under Reward Contamination

  • Arpan Mukherjee
  • Ali Tajer
  • Pin-Yu Chen
  • Payel Das

This paper investigates the problem of best arm identification in contaminated stochastic multi-armed bandits. In this setting, the rewards obtained from any arm are replaced by samples from an adversarial model with probability $\varepsilon$. A fixed-confidence (infinite-horizon) setting is considered, where the goal of the learner is to identify the arm with the largest mean. Owing to the adversarial contamination of the rewards, each arm's mean is only partially identifiable. This paper proposes two algorithms, a gap-based algorithm and one based on successive elimination, for best arm identification in sub-Gaussian bandits. These algorithms involve mean estimates that asymptotically achieve the optimal error guarantee on the deviation of the estimate from the true mean. Furthermore, these algorithms asymptotically achieve the optimal sample complexity. Specifically, for the gap-based algorithm, the sample complexity is asymptotically optimal up to constant factors, while for the successive elimination-based algorithm, it is optimal up to logarithmic factors. Finally, numerical experiments are provided to illustrate the gains of the algorithms compared to existing baselines.
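
To see why contamination calls for robust mean estimates, compare the plain sample mean with a trimmed mean that drops a fixed fraction from each tail. The trimmed mean below is a standard robust estimator used as an assumption for illustration; the paper's estimators may differ. The trimming level, noise model, and outlier value are hypothetical.

```python
import random

def trimmed_mean(samples, eps):
    """Mean after discarding the eps-fraction smallest and eps-fraction largest samples."""
    if not 0 <= eps < 0.5:
        raise ValueError("eps must be in [0, 0.5)")
    xs = sorted(samples)
    k = int(eps * len(xs))  # number of samples trimmed from each tail
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Hypothetical arm with true mean 1.0; each reward is replaced by an outlier with probability 0.1
rng = random.Random(0)
clean = [rng.gauss(1.0, 1.0) for _ in range(1000)]
contaminated = [100.0 if rng.random() < 0.1 else x for x in clean]

naive = sum(contaminated) / len(contaminated)
robust = trimmed_mean(contaminated, eps=0.15)  # trim slightly more than the contamination rate
print(naive, robust)
```

The outliers drag the naive mean far from 1.0, while the trimmed mean stays close to it. Even so, a small residual bias remains because trimming is asymmetric with respect to the clean samples, which is consistent with each arm's mean being only partially identifiable under contamination.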

ICLR Conference 2021 Conference Paper

On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning

  • Ren Wang 0008
  • Kaidi Xu
  • Sijia Liu 0001
  • Pin-Yu Chen
  • Tsui-Wei Weng
  • Chuang Gan 0001
  • Meng Wang 0003

Model-agnostic meta-learning (MAML) has emerged as one of the most successful meta-learning techniques in few-shot learning. It enables us to learn a meta-initialization of model parameters (which we call the meta-model) that rapidly adapts to new tasks using a small amount of labeled training data. Despite the generalization power of the meta-model, it remains unclear how adversarial robustness can be maintained by MAML in few-shot learning. In addition to generalization, robustness is also desired for a meta-model to defend against adversarial examples (attacks). Toward promoting adversarial robustness in MAML, we first study when a robustness-promoting regularization should be incorporated, given that MAML adopts a bi-level (fine-tuning vs. meta-update) learning procedure. We show that robustifying the meta-update stage is sufficient to make robustness adapted to the task-specific fine-tuning stage, even if the latter uses a standard training protocol. We further justify the acquired robustness adaptation by peering into the interpretability of neurons' activation maps. Furthermore, we investigate how robust regularization can be designed efficiently in MAML. We propose a general but easily optimized robustness-regularized meta-learning framework, which allows the use of unlabeled data augmentation, fast adversarial attack generation, and computationally light fine-tuning. In particular, we show for the first time that an auxiliary contrastive learning task can enhance the adversarial robustness of MAML. Finally, extensive experiments demonstrate the effectiveness of our proposed methods in robust few-shot learning.

NeurIPS Conference 2021 Conference Paper

Predicting Deep Neural Network Generalization with Perturbation Response Curves

  • Yair Schiff
  • Brian Quanz
  • Payel Das
  • Pin-Yu Chen

The field of deep learning is rich with empirical evidence of human-like performance on a variety of prediction tasks. However, despite these successes, the recent Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition suggests that there is a need for more robust and efficient measures of network generalization. In this work, we propose a new framework for evaluating the generalization capabilities of trained networks. We use perturbation response (PR) curves that capture the accuracy change of a given network as a function of varying levels of training sample perturbation. From these PR curves, we derive novel statistics that capture generalization capability. Specifically, we introduce two new measures, the Gi-score and Pal-score, which are inspired by the Gini coefficient and Palma ratio (measures of income inequality) and accurately predict generalization gaps. Applying our framework to intra- and inter-class sample mixup, we attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the PGDL competition. In addition, we show that our framework and the proposed statistics can be used to capture the extent to which a trained network is invariant to a given parametric input transformation, such as rotation or translation. Therefore, these generalization gap prediction statistics also provide a useful means for selecting optimal network architectures and hyperparameters that are invariant to a certain perturbation.
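
The Gi-score is described as inspired by the Gini coefficient. For reference, the sketch below computes the standard Gini coefficient of non-negative values via mean absolute pairwise difference: 0 means all values are equal, and values near 1 mean the mass is concentrated in a few entries. This is the classic inequality measure only, not the Gi-score formula, whose exact construction from PR curves is not given here.

```python
def gini(values):
    """Standard Gini coefficient of non-negative values (mean absolute pairwise difference form)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Sum of |x_i - x_j| over all ordered pairs, normalized by 2 * n^2 * mean = 2 * n * total
    abs_diffs = sum(abs(a - b) for a in values for b in values)
    return abs_diffs / (2 * n * total)

print(gini([1, 1, 1, 1]), gini([0, 0, 0, 4]))
```

The pairwise form is quadratic in the number of values, which is acceptable for the short sequences a PR curve produces; a sort-based formula gives the same result in O(n log n).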

AAAI Conference 2021 Conference Paper

Self-Progressing Robust Training

  • Minhao Cheng
  • Pin-Yu Chen
  • Sijia Liu
  • Shiyu Chang
  • Cho-Jui Hsieh
  • Payel Das

Enhancing model robustness under new and even adversarial environments is a crucial milestone toward building trustworthy machine learning systems. Current robust training methods such as adversarial training explicitly use an “attack” (e.g., an L-inf-norm bounded perturbation) to generate adversarial examples during model training for improving adversarial robustness. In this paper, we take a different perspective and propose a new framework called SPROUT (self-progressing robust training). During model training, SPROUT progressively adjusts the training label distribution via our proposed parametrized label smoothing technique, making training free of attack generation and more scalable. We also motivate SPROUT using a general formulation based on vicinity risk minimization, which includes many robust training methods as special cases. Compared with state-of-the-art adversarial training methods (PGD-L-inf and TRADES) under L-inf-norm bounded attacks and various invariance tests, SPROUT consistently attains superior performance and is more scalable to large neural networks. Our results shed new light on scalable, effective, and attack-independent robust training methods.

NeurIPS Conference 2021 Conference Paper

Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

  • Akshay Mehra
  • Bhavya Kailkhura
  • Pin-Yu Chen
  • Jihun Hamm

Unsupervised domain adaptation (UDA) enables cross-domain learning without target domain labels by transferring knowledge from a labeled source domain whose distribution differs from that of the target. However, UDA is not always successful, and several accounts of 'negative transfer' have been reported in the literature. In this work, we prove a simple lower bound on the target domain error that complements the existing upper bound. Our bound shows that minimizing the source domain error and the marginal distribution mismatch is insufficient to guarantee a reduction in the target domain error, due to a possible increase in the induced labeling function mismatch. This insufficiency is further illustrated through simple distributions for which the same UDA approach succeeds, fails, or may succeed or fail with equal chance. Motivated by this, we propose novel data poisoning attacks to fool UDA methods into learning representations that produce large target domain errors. We evaluate the effect of these attacks on popular UDA methods using benchmark datasets where they have previously been shown to be successful. Our results show that poisoning can significantly decrease the target domain accuracy, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data in the source domain. The failure of these UDA methods demonstrates their limitations at guaranteeing cross-domain generalization, consistent with our lower bound. Thus, evaluating UDA methods in adversarial settings such as data poisoning provides a better sense of their robustness to data distributions unfavorable for UDA.

ICML Conference 2021 Conference Paper

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

  • Chao-Han Huck Yang
  • Yun-Yun Tsai
  • Pin-Yu Chen

Learning to classify time series with limited data is a practical yet challenging problem. Current methods are primarily based on hand-designed feature extraction rules or domain-specific data augmentation. Motivated by advances in deep speech processing models and the fact that voice data are univariate temporal signals, in this paper we propose Voice2Series (V2S), a novel end-to-end approach that reprograms acoustic models for time series classification through input transformation learning and output label mapping. Leveraging the representation learning power of a large-scale pre-trained speech processing model, on 31 different time series tasks we show that V2S outperforms or is on par with state-of-the-art methods on 22 tasks and improves their average accuracy by 1.72%. We further provide theoretical justification of V2S by proving that its population risk is upper bounded by the source risk and a Wasserstein distance accounting for feature alignment via reprogramming. Our results offer a new and effective means for time series classification.
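
Output label mapping lets a frozen source classifier (here, an acoustic model) predict target classes without changing its output layer. A common scheme assigns several source classes to each target class and averages their scores. The sketch below implements that many-to-one averaging. The mapping dictionary, probabilities, and function name are hypothetical, and V2S's exact mapping may differ.

```python
def map_source_to_target(source_probs, label_map, num_target):
    """Aggregate source-class probabilities into target-class scores.

    label_map[s] = t assigns source class s to target class t (many-to-one);
    each target score is the average over its assigned source classes.
    """
    sums = [0.0] * num_target
    counts = [0] * num_target
    for s, p in enumerate(source_probs):
        t = label_map.get(s)
        if t is None:
            continue  # source classes without an assignment are ignored
        sums[t] += p
        counts[t] += 1
    return [sums[t] / counts[t] if counts[t] else 0.0 for t in range(num_target)]

# Hypothetical: 4 source (acoustic) classes mapped onto 2 time-series classes
source_probs = [0.1, 0.5, 0.2, 0.2]
label_map = {0: 0, 1: 1, 2: 0, 3: 1}
print(map_source_to_target(source_probs, label_map, num_target=2))
```

Because the mapping is fixed, only the input transformation needs to be trained; gradients flow through the averaged target scores back to the learnable input perturbation.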

NeurIPS Conference 2021 Conference Paper

When does Contrastive Learning Preserve Adversarial Robustness from Pretraining to Finetuning?

  • Lijie Fan
  • Sijia Liu
  • Pin-Yu Chen
  • Gaoyuan Zhang
  • Chuang Gan

Contrastive learning (CL) can learn generalizable feature representations and achieve state-of-the-art performance on downstream tasks by finetuning a linear classifier on top of them. However, as adversarial robustness becomes vital in image classification, it remains unclear whether CL is able to preserve robustness on downstream tasks. The main challenge is that in the self-supervised pretraining + supervised finetuning paradigm, adversarial robustness is easily forgotten due to a learning task mismatch from pretraining to finetuning. We call this challenge 'cross-task robustness transferability'. To address this problem, in this paper we revisit and advance CL principles through the lens of robustness enhancement. We show that (1) the design of contrastive views matters: high-frequency components of images are beneficial to improving model robustness; and (2) augmenting CL with a pseudo-supervision stimulus (e.g., resorting to feature clustering) helps preserve robustness without forgetting. Equipped with our new designs, we propose AdvCL, a novel adversarial contrastive pretraining framework. We show that AdvCL is able to enhance cross-task robustness transferability without loss of model accuracy or finetuning efficiency. With a thorough experimental study, we demonstrate that AdvCL outperforms state-of-the-art self-supervised robust learning methods across multiple datasets (CIFAR-10, CIFAR-100, and STL-10) and finetuning schemes (linear evaluation and full model finetuning).

NeurIPS Conference 2021 Conference Paper

Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Sparse Neural Networks

  • Shuai Zhang
  • Meng Wang
  • Sijia Liu
  • Pin-Yu Chen
  • Jinjun Xiong

The lottery ticket hypothesis (LTH) states that learning on a properly pruned network (the winning ticket) has improved test accuracy over the original unpruned network. Although LTH has been justified empirically in a broad range of deep neural network (DNN) applications, such as computer vision and natural language processing, the theoretical validation of the improved generalization of a winning ticket remains elusive. To the best of our knowledge, our work, for the first time, characterizes the performance of training a pruned neural network by analyzing the geometric structure of the objective function and the sample complexity to achieve zero generalization error. We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned, indicating the structural importance of a winning ticket. Moreover, when the algorithm for training a pruned neural network is specified as (accelerated) stochastic gradient descent, we theoretically show that the number of samples required for achieving zero generalization error is proportional to the number of non-pruned weights in the hidden layer. With a fixed number of samples, training a pruned neural network enjoys a faster convergence rate to the desired model than training the original unpruned one, providing a formal justification of the improved generalization of the winning ticket. Our theoretical results are derived for learning a pruned neural network with one hidden layer, while experimental results are further provided to justify the implications for pruning multi-layer neural networks.

JMLR Journal 2020 Journal Article

AI Explainability 360: An Extensible Toolkit for Understanding Data and Machine Learning Models

  • Vijay Arya
  • Rachel K. E. Bellamy
  • Pin-Yu Chen
  • Amit Dhurandhar
  • Michael Hind
  • Samuel C. Hoffman
  • Stephanie Houde
  • Q. Vera Liao

As artificial intelligence algorithms make further inroads in high-stakes societal applications, there are increasing calls from multiple stakeholders for these algorithms to explain their outputs. To make matters more challenging, different personas of consumers of explanations have different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360, an open-source Python toolkit featuring ten diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of interpretation and explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. The toolkit is not only the software, but also guidance material, tutorials, and an interactive web demo to introduce AI explainability to different audiences. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.

ICLR Conference 2020 Conference Paper

Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness

  • Pu Zhao 0001
  • Pin-Yu Chen
  • Payel Das
  • Karthikeyan Natesan Ramamurthy
  • Xue Lin 0001

Mode connectivity provides novel geometric insights for analyzing loss landscapes and enables building high-accuracy pathways between well-trained neural networks. In this work, we propose to employ mode connectivity in loss landscapes to study the adversarial robustness of deep neural networks, and provide novel methods for improving this robustness. Our experiments cover various types of adversarial attacks applied to different network architectures and datasets. When network models are tampered with by backdoor or error-injection attacks, our results demonstrate that the path connection learned using a limited amount of bona fide data can effectively mitigate adversarial effects while maintaining the original accuracy on clean data. Therefore, mode connectivity provides users with the power to repair backdoored or error-injected models. We also use mode connectivity to investigate the loss landscapes of regular and robust models against evasion attacks. Experiments show that there exists a barrier in adversarial robustness loss on the path connecting regular and adversarially trained models. A high correlation is observed between the adversarial robustness loss and the largest eigenvalue of the input Hessian matrix, for which theoretical justifications are provided. Our results suggest that mode connectivity offers a holistic tool and practical means for evaluating and improving adversarial robustness.
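A path between two trained models is often parametrized as a quadratic Bézier curve in weight space. Its control point theta is the only trainable quantity. The curve is fit by minimizing the expected loss at points sampled along it, which in the repair setting above means using the limited bona fide data. The sketch below is minimal and assumes this parametrization on flattened weight vectors. The training loop is omitted, and the toy weights are made up.

```python
from typing import List

Vector = List[float]


def bezier_point(w_a: Vector, theta: Vector, w_b: Vector, t: float) -> Vector:
    """Point on a quadratic Bezier curve between endpoint weights w_a and w_b.

    phi(t) = (1 - t)^2 * w_a + 2 t (1 - t) * theta + t^2 * w_b,
    where theta is the trainable control point (assumed parametrization).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    a, b, c = (1 - t) ** 2, 2 * t * (1 - t), t ** 2
    return [a * x + b * y + c * z for x, y, z in zip(w_a, theta, w_b)]


# Usage: flattened weights of two toy "models" and an untrained control point.
model_a = [1.0, -2.0, 0.5]
model_b = [0.8, -1.5, 0.9]
control = [0.0, 0.0, 0.0]
for t in (0.0, 0.5, 1.0):
    print(t, bezier_point(model_a, control, model_b, t))
```

The endpoints are reproduced exactly at t = 0 and t = 1. Only interior points depend on the learned control point, which is where a repaired model would be picked.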

ICLR Conference 2020 Conference Paper

DBA: Distributed Backdoor Attacks against Federated Learning

  • Chulin Xie
  • Keli Huang
  • Pin-Yu Chen
  • Bo Li 0026

Backdoor attacks aim to manipulate a subset of training data by injecting adversarial triggers such that machine learning models trained on the tampered dataset will make arbitrary (targeted) incorrect predictions on test inputs with the same trigger embedded. While federated learning (FL) is capable of aggregating information provided by different parties for training a better model, its distributed learning methodology and the inherently heterogeneous data distribution across parties may bring new vulnerabilities. In addition to recent centralized backdoor attacks on FL, where each party embeds the same global trigger during training, we propose the distributed backdoor attack (DBA), a novel threat assessment framework developed by fully exploiting the distributed nature of FL. DBA decomposes a global trigger pattern into separate local patterns and embeds them into the training sets of different adversarial parties, respectively. Compared to standard centralized backdoors, we show that DBA is substantially more persistent and stealthy against FL on diverse datasets such as finance and image data. We conduct extensive experiments to show that the attack success rate of DBA is significantly higher than that of centralized backdoors under different settings. Moreover, we find that distributed attacks are indeed more insidious, as DBA can evade two state-of-the-art robust FL algorithms designed against centralized backdoors. We also provide explanations for the effectiveness of DBA via feature visual interpretation and feature importance ranking. To further explore the properties of DBA, we test the attack performance by varying different trigger factors, including local trigger variations (size, gap, and location), the scaling factor in FL, data distribution, and poison ratio and interval. Our proposed DBA and thorough evaluation results shed light on characterizing the robustness of FL.
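The trigger decomposition can be sketched on pixel coordinates. A bar-shaped global trigger is split into equal local bars, and each adversarial party receives one. At test time, the union of all local bars (the global trigger) is stamped on inputs. The bar layout, the `gap` parameter and the round-robin assignment are illustrative assumptions, not the paper's exact trigger design.

```python
from typing import Dict, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def local_triggers(num_parts: int, length: int, gap: int, row: int = 0, col: int = 0) -> List[Set[Pixel]]:
    """Split a bar-shaped global trigger into num_parts local bars.

    Local pattern i covers `length` pixels on `row`, shifted right by
    i * (length + gap); `gap` is the blank spacing between local patterns.
    """
    return [
        {(row, col + i * (length + gap) + j) for j in range(length)}
        for i in range(num_parts)
    ]


def assign_to_parties(parts: List[Set[Pixel]], parties: Sequence[str]) -> Dict[str, Set[Pixel]]:
    """Give each adversarial party one local pattern (round-robin, hypothetical rule)."""
    return {p: parts[i % len(parts)] for i, p in enumerate(parties)}


def stamp(image: List[List[float]], pattern: Set[Pixel], value: float = 1.0) -> List[List[float]]:
    """Return a copy of a 2-D image with the pattern's pixels set to `value`."""
    out = [r[:] for r in image]
    for r, c in pattern:
        out[r][c] = value
    return out


# Usage: 4 local bars of 3 pixels with a 1-pixel gap; the global trigger is their union.
parts = local_triggers(4, 3, 1)
global_trigger = set().union(*parts)
assignment = assign_to_parties(parts, ["A", "B", "C", "D"])
poisoned_test_image = stamp([[0.0] * 16 for _ in range(2)], global_trigger)
print(sorted(assignment["B"]), len(global_trigger))
```

Each party trains with only its own local bar. The full pattern appears only when all bars are combined at inference.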

ICML Conference 2020 Conference Paper

Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case

  • Shuai Zhang 0015
  • Meng Wang 0003
  • Sijia Liu 0001
  • Pin-Yu Chen
  • Jinjun Xiong

Although graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice, their theoretical guarantee on generalizability remains elusive in the literature. In this paper, we provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems. Under the assumption that there exists a ground-truth GNN model (with zero generalization error), the objective of GNN learning is to estimate the ground-truth GNN parameters from the training data. To achieve this objective, we propose a learning algorithm that is built on tensor initialization and accelerated gradient descent. We then show that the proposed learning algorithm converges to the ground-truth GNN model for the regression problem, and to a model sufficiently close to the ground-truth for the binary classification problem. Moreover, for both cases, the convergence rate of the proposed learning algorithm is proven to be linear and faster than the vanilla gradient descent algorithm. We further explore the relationship between the sample complexity of GNNs and their underlying graph properties. Lastly, we provide numerical experiments to demonstrate the validity of our analysis and the effectiveness of the proposed learning algorithm for GNNs.

YNIMG Journal 2020 Journal Article

Fluid intelligence is associated with cortical volume and white matter tract integrity within multiple-demand system across adult lifespan

  • Pin-Yu Chen
  • Chang-Le Chen
  • Yung-Chin Hsu
  • Wen-Yih I. Tseng

Background: Fluid intelligence (Gf) is the innate ability of an individual to respond to complex and unexpected situations. Although some studies have considered the multiple-demand (MD) system of the brain to be the biological foundation of Gf, further characterization of their relationship in the context of aging is limited. The present study hypothesized that the structural metrics of the MD system, including cortical thickness, cortical volume, and white matter (WM) tract integrity, were the brain correlates of Gf across the adult lifespan. Partial correlation analysis was performed to investigate whether the MD system could still explain Gf independently of the age effect. Moreover, the partial correlations between Gf and left/right structural metrics within the MD regions were compared to test whether the correlations displayed distinct lateralization.
Methods: The participants were recruited from the Cambridge Centre for Ageing and Neuroscience (Cam-CAN) databank, comprising the images of 603 healthy participants aged 18–88 years acquired on a 3-T system. The MRI data included high-resolution T1-weighted and diffusion-weighted images, from which gray matter and WM structural metrics of the MD system were analyzed, respectively. The structural metrics of gray matter were quantified in terms of cortical volume/thickness of five pairs of cortical regions, and those of WM were quantified in terms of the mean axial diffusivity (DA), radial diffusivity (DR), mean diffusivity (DM), and generalized fractional anisotropy (GFA) of five pairs of tracts. Partial correlation, controlling for age and sex effects, was performed to investigate the associations of Gf scores with the mean DA, DR, DM and GFA of all tracts in the MD system, of the left and right hemispheric tracts, and of each tract. Fisher’s exact test was used to compare the partial correlations between left and right MD regions.
Results: The linear relationship between cortical volumes and Gf was evident across all levels of the MD system, even after controlling for age and sex. For WM integrity, diffusion indices including DA, DR, DM and GFA displayed linear relationships with Gf scores at various levels of the MD system. Among the 10 WM tracts connecting the MD regions, the bilateral superior longitudinal fasciculus I and the bilateral frontal aslant tracts exhibited the strongest significant associations. Our results did not show significant inter-hemispheric differences in the associations between structural metrics of the MD system and Gf.
Conclusion: Our results demonstrate significant associations between Gf and both cortical volumes and tract integrity of the MD system across the adult lifespan in a population-based cohort. The association remained significant across the entire adult lifespan despite the simultaneous decline of Gf and the MD system. Our results suggest that the MD system might be a structural underpinning of Gf and support the fronto-parietal model of cognitive aging. However, we did not find hemispheric differences in the Gf-MD correlations, which does not support the hemi-aging hypothesis.

NeurIPS Conference 2020 Conference Paper

Higher-Order Certification For Randomized Smoothing

  • Jeet Mohapatra
  • Ching-Yun Ko
  • Tsui-Wei Weng
  • Pin-Yu Chen
  • Sijia Liu
  • Luca Daniel

Randomized smoothing is a recently proposed defense against adversarial attacks that has achieved state-of-the-art provable robustness against $\ell_2$ perturbations. A number of works have extended the guarantees to other metrics, such as $\ell_1$ or $\ell_\infty$, by using different smoothing measures. Although the current framework has been shown to yield near-optimal $\ell_p$ radii, the total safety region certified by the current framework can be arbitrarily small compared to the optimal one. In this work, we propose a framework to improve the certified safety region for these smoothed classifiers without changing the underlying smoothing scheme. The theoretical contributions are as follows: 1) We generalize the certification for randomized smoothing by reformulating certified radius calculation as a nested optimization problem over a class of functions. 2) We provide a method to calculate the certified safety region using zeroth-order and first-order information for Gaussian-smoothed classifiers. We also provide a framework that generalizes the calculation for certification using higher-order information. 3) We design efficient, high-confidence estimators for the relevant statistics of the first-order information. Combining theoretical contributions 2) and 3) allows us to certify safety regions that are significantly larger than those provided by current methods. On CIFAR and ImageNet, the new regions achieve significant improvements on general $\ell_1$ certified radii and on $\ell_2$ certified radii for color-space attacks ($\ell_2$ perturbations restricted to only one color channel), while also achieving smaller improvements on general $\ell_2$ certified radii. As discussed in the future works section, our framework can also provide a way to circumvent the current impossibility results on achieving higher magnitudes of certified radii without requiring the use of data-dependent smoothing techniques.
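For context, consider the zeroth-order certificate that this framework generalizes. For Gaussian smoothing with noise level sigma and a lower bound p_A on the top-class probability, it gives an l2 radius of sigma * Phi^{-1}(p_A), where Phi^{-1} is the standard normal quantile function. This assumes the runner-up class probability is at most 1 - p_A. The sketch below computes only this baseline. The paper's first-order safety region, which additionally uses gradient statistics, is not reproduced.

```python
from statistics import NormalDist


def certified_l2_radius(p_a_lower: float, sigma: float) -> float:
    """Zeroth-order l2 certificate for a Gaussian-smoothed classifier.

    Returns sigma * Phi^{-1}(p_a_lower), assuming the runner-up class has
    probability at most 1 - p_a_lower; returns 0.0 (abstain) if p_a_lower <= 1/2.
    """
    if not 0.0 < p_a_lower < 1.0:
        raise ValueError("p_a_lower must lie strictly between 0 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if p_a_lower <= 0.5:
        return 0.0
    return sigma * NormalDist().inv_cdf(p_a_lower)


# Usage: a lower confidence bound of 0.9 on the top class with noise level 0.5.
print(certified_l2_radius(0.9, 0.5))
```

This radius depends only on a single probability estimate. That dependence is why the abstract describes the certified region as potentially far smaller than the optimal one.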

ICML Conference 2020 Conference Paper

Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing

  • Sanghamitra Dutta
  • Dennis Wei
  • Hazar Yueksel
  • Pin-Yu Chen
  • Sijia Liu 0001
  • Kush R. Varshney

A trade-off between accuracy and fairness is almost taken as a given in the existing literature on fairness in machine learning. Yet, it is not preordained that accuracy should decrease with increased fairness. Novel to this work, we examine fair classification through the lens of mismatched hypothesis testing: trying to find a classifier that distinguishes between two ideal distributions when given two mismatched distributions that are biased. Using Chernoff information, a tool in information theory, we theoretically demonstrate that, contrary to popular belief, there always exist ideal distributions such that optimal fairness and accuracy (with respect to the ideal distributions) are achieved simultaneously: there is no trade-off. Moreover, the same classifier yields the lack of a trade-off with respect to ideal distributions while yielding a trade-off when accuracy is measured with respect to the given (possibly biased) dataset. To complement our main result, we formulate an optimization to find ideal distributions and derive fundamental limits to explain why a trade-off exists on the given biased dataset. We also derive conditions under which active data collection can alleviate the fairness-accuracy trade-off in the real world. Our results lead us to contend that it is problematic to measure accuracy with respect to data that reflects bias, and instead, we should be considering accuracy with respect to ideal, unbiased data.
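Chernoff information, the tool this analysis relies on, is defined as C(P, Q) = -min over lambda in [0, 1] of log sum_x P(x)^lambda Q(x)^(1 - lambda). It is the best achievable error exponent in Bayesian binary hypothesis testing. The sketch below is a small one for discrete distributions with strictly positive entries. The objective is a log-sum-exp of functions affine in lambda, so it is convex and a ternary search suffices.

```python
import math
from typing import Sequence


def chernoff_information(p: Sequence[float], q: Sequence[float], tol: float = 1e-10) -> float:
    """C(P, Q) = -min_{0 <= lam <= 1} log sum_x p(x)^lam * q(x)^(1 - lam).

    Assumes p and q are probability vectors with strictly positive entries.
    """
    def log_coeff(lam: float) -> float:
        return math.log(sum(pi ** lam * qi ** (1 - lam) for pi, qi in zip(p, q)))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:  # ternary search on a convex function of lam
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_coeff(m1) < log_coeff(m2):
            hi = m2
        else:
            lo = m1
    return -log_coeff((lo + hi) / 2)


# Usage: two mirrored Bernoulli distributions; the minimizer is lam = 1/2 by symmetry.
print(chernoff_information([0.9, 0.1], [0.1, 0.9]))
print(chernoff_information([0.3, 0.7], [0.3, 0.7]))  # identical distributions
```

For the mirrored pair the value equals -log(2 * sqrt(0.9 * 0.1)) = -log(0.6). Identical distributions give zero because they cannot be distinguished.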

NeurIPS Conference 2020 Conference Paper

Optimizing Mode Connectivity via Neuron Alignment

  • Norman Tatro
  • Pin-Yu Chen
  • Payel Das
  • Igor Melnyk
  • Prasanna Sattigeri
  • Rongjie Lai

The loss landscapes of deep neural networks are not well understood due to their high nonconvexity. Empirically, the local minima of these loss functions can be connected by a learned curve in model space, along which the loss remains nearly constant; a feature known as mode connectivity. Yet, current curve-finding algorithms do not consider the influence of symmetry in the loss surface created by model weight permutations. We propose a more general framework to investigate the effect of symmetry on landscape connectivity by accounting for the weight permutations of the networks being connected. To approximate the optimal permutation, we introduce an inexpensive heuristic referred to as neuron alignment. Neuron alignment promotes similarity between the distribution of intermediate activations of a model along the curve with that of the endpoint models. We provide theoretical analysis establishing the benefit of alignment to mode connectivity based on this simple heuristic. We empirically verify that the permutation given by alignment is locally optimal via a proximal alternating minimization scheme. Empirically, optimizing the weight permutation is critical for efficiently learning a simple, planar, low-loss curve between networks that successfully generalizes. Our alignment method can significantly alleviate the recently identified robust loss barrier on the path connecting two adversarially robust models and find more robust and accurate models on the path.

ICML Conference 2020 Conference Paper

Proper Network Interpretability Helps Adversarial Robustness in Classification

  • Akhilan Boopathy
  • Sijia Liu 0001
  • Gaoyuan Zhang
  • Cynthia Liu
  • Pin-Yu Chen
  • Shiyu Chang
  • Luca Daniel

Recent works have empirically shown that there exist adversarial examples that can be hidden from neural network interpretability (namely, making network interpretation maps visually similar), or interpretability is itself susceptible to adversarial attacks. In this paper, we theoretically show that with a proper measurement of interpretation, it is actually difficult to prevent prediction-evasion adversarial attacks from causing interpretation discrepancy, as confirmed by experiments on MNIST, CIFAR-10 and Restricted ImageNet. Spurred by that, we develop an interpretability-aware defensive scheme built only on promoting robust interpretation (without the need for resorting to adversarial loss minimization). We show that our defense achieves both robust classification and robust interpretation, outperforming state-of-the-art adversarial training methods against attacks of large perturbation in particular.

AAAI Conference 2020 Conference Paper

Reinforcement-Learning Based Portfolio Management with Augmented Asset Movement Prediction States

  • Yunan Ye
  • Hengzhi Pei
  • Boxin Wang
  • Pin-Yu Chen
  • Yada Zhu
  • Ju Xiao
  • Bo Li

Portfolio management (PM) is a fundamental financial planning task that aims to achieve investment goals such as maximal profits or minimal risks. Its decision process involves continuous derivation of valuable information from various data sources and sequential decision optimization, which makes it a prospective research direction for reinforcement learning (RL). In this paper, we propose SARL, a novel State-Augmented RL framework for PM. Our framework aims to address two unique challenges in financial PM: (1) data heterogeneity, as the collected information for each asset is usually diverse, noisy and imbalanced (e.g., news articles); and (2) environment uncertainty, as the financial market is versatile and non-stationary. To incorporate heterogeneous data and enhance robustness against environment uncertainty, our SARL augments the asset information with their price movement prediction as additional states, where the prediction can be solely based on financial data (e.g., asset prices) or derived from alternative sources such as news. Experiments on two real-world datasets, (i) the Bitcoin market and (ii) the HighTech stock market with 7-year Reuters news articles, validate the effectiveness of SARL over existing PM approaches, both in terms of accumulated profits and risk-adjusted profits. Moreover, extensive simulations are conducted to demonstrate the importance of our proposed state augmentation, providing new insights and boosting performance significantly over the standard RL-based PM method and other baselines.

NeurIPS Conference 2020 Conference Paper

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training

  • Chia-Yu Chen
  • Jiamin Ni
  • Songtao Lu
  • Xiaodong Cui
  • Pin-Yu Chen
  • Xiao Sun
  • Naigang Wang
  • Swagath Venkataramani

Large-scale distributed training of Deep Neural Networks (DNNs) on state-of-the-art platforms is expected to be severely communication-constrained. To overcome this limitation, numerous gradient compression techniques have been proposed and have demonstrated high compression ratios. However, most existing compression methods do not scale well to large-scale distributed systems (due to gradient build-up) and/or lack evaluations on large datasets. To mitigate these issues, we propose a new compression technique, Scalable Sparsified Gradient Compression (ScaleCom), that (i) leverages similarity in the gradient distribution amongst learners to provide a commutative compressor and keep the communication cost constant with respect to the number of workers, and (ii) includes a low-pass filter in local gradient accumulations to mitigate the impact of large-batch training and significantly improve scalability. Using theoretical analysis, we show that ScaleCom provides favorable convergence guarantees and is compatible with gradient all-reduce techniques. Furthermore, we experimentally demonstrate that ScaleCom has small overheads, directly reduces gradient traffic, and provides high compression rates (70-150X) and excellent scalability (up to 64-80 learners and 10X larger batch sizes than normal training) across a wide range of applications (image, language, and speech) without significant accuracy loss.
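The "commutative compressor" property can be illustrated with a generic sketch. Suppose every worker sparsifies with the same index set. Then sparsifying each gradient and summing gives the same result as summing and then sparsifying, which is what lets compressed gradients go through all-reduce. In this sketch the index set comes from a single hypothetical `leader` worker's error-compensated gradient, and dropped entries are carried to the next step as residuals (error feedback). Both choices are assumptions for illustration. ScaleCom's actual index-selection rule and its low-pass filter are not reproduced.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def top_k_indices(v: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest-magnitude entries (ties broken by index)."""
    return sorted(range(len(v)), key=lambda i: (-abs(v[i]), i))[:k]


def sparsify(v: Sequence[float], idx: Sequence[int]) -> Vector:
    """Keep entries at idx and zero out the rest."""
    keep = set(idx)
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def shared_index_step(
    grads: List[Vector], residuals: List[Vector], k: int, leader: int
) -> Tuple[Vector, List[Vector]]:
    """One round of shared-index top-k compression with error feedback.

    The leader's error-compensated gradient picks the shared indices
    (a hypothetical rule); every worker sends its values at those indices.
    Returns (aggregated sparse gradient, new per-worker residuals).
    """
    compensated = [[g + r for g, r in zip(gw, rw)] for gw, rw in zip(grads, residuals)]
    idx = top_k_indices(compensated[leader], k)
    sent = [sparsify(c, idx) for c in compensated]
    new_residuals = [[c - s for c, s in zip(cw, sw)] for cw, sw in zip(compensated, sent)]
    aggregate = [sum(col) for col in zip(*sent)]
    return aggregate, new_residuals


# Usage: two workers, k = 2; compressing then summing equals summing then compressing.
grads = [[0.1, -2.0, 0.3, 1.5], [0.2, -1.0, 0.4, 0.5]]
agg, res = shared_index_step(grads, [[0.0] * 4, [0.0] * 4], k=2, leader=0)
summed = [sum(col) for col in zip(*grads)]
print(agg, agg == sparsify(summed, top_k_indices(grads[0], 2)), res)
```

Independent per-worker top-k would pick different indices on each worker. The sum of sparse vectors would then grow denser as workers are added, which is the gradient build-up the abstract mentions.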

AAAI Conference 2020 Conference Paper

Seq2Sick: Evaluating the Robustness of Sequence-to-Sequence Models with Adversarial Examples

  • Minhao Cheng
  • Jinfeng Yi
  • Pin-Yu Chen
  • Huan Zhang
  • Cho-Jui Hsieh

Crafting adversarial examples has become an important technique to evaluate the robustness of deep neural networks (DNNs). However, most existing works focus on attacking the image classification problem, since its input space is continuous and its output space is finite. In this paper, we study the much more challenging problem of crafting adversarial examples for sequence-to-sequence (seq2seq) models, whose inputs are discrete text strings and whose outputs have an almost infinite number of possibilities. To address the challenges caused by the discrete input space, we propose a projected gradient method combined with group lasso and gradient regularization. To handle the almost infinite output space, we design novel loss functions to conduct non-overlapping attacks and targeted keyword attacks. We apply our algorithm to machine translation and text summarization tasks and verify its effectiveness: by changing fewer than 3 words, we can make seq2seq models produce desired outputs with high success rates. We also use an external sentiment classifier to verify that our generated adversarial examples preserve semantic meaning. On the other hand, we recognize that, compared with the well-evaluated CNN-based classifiers, seq2seq models are intrinsically more robust to adversarial attacks.
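The group lasso term encourages perturbations that touch only a few words. The sketch below is minimal and assumes the perturbation is stored as one embedding-space vector per token. The penalty sums the per-token l2 norms. Its proximal operator, block soft-thresholding, is a standard way to optimize such penalties and zeroes out whole tokens whose perturbation is small. It is not necessarily the paper's exact update rule.

```python
import math
from typing import List, Sequence


def group_lasso(delta: Sequence[Sequence[float]]) -> float:
    """Sum over token positions of the l2 norm of that token's perturbation."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in delta)


def group_soft_threshold(delta: Sequence[Sequence[float]], tau: float) -> List[List[float]]:
    """Proximal step for tau * group_lasso (block soft-thresholding).

    Each token's perturbation is shrunk toward zero and set exactly to zero
    when its norm is at most tau, so whole words stay unchanged.
    """
    out = []
    for row in delta:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out


# Usage: 3 tokens with 2-D perturbations; only the first survives thresholding.
delta = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
print(group_lasso(delta))
shrunk = group_soft_threshold(delta, tau=1.0)
print(shrunk, sum(any(x != 0.0 for x in row) for row in shrunk), "token(s) perturbed")
```

Grouping by token makes the sparsity act on whole words rather than on individual embedding coordinates. That matches the goal of changing only a few words.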

ICLR Conference 2020 Conference Paper

Sign-OPT: A Query-Efficient Hard-label Adversarial Attack

  • Minhao Cheng
  • Simranjit Singh 0003
  • Patrick H. Chen
  • Pin-Yu Chen
  • Sijia Liu 0001
  • Cho-Jui Hsieh

We study the most practical problem setup for evaluating the adversarial robustness of a machine learning system with limited access: the hard-label black-box attack setting for generating adversarial examples, where only a limited number of model queries are allowed and only the decision is returned for a queried data input. Several algorithms have been proposed for this problem, but they typically require a huge number (>20,000) of queries to attack one example. Among them, one of the state-of-the-art approaches (Cheng et al., 2019) showed that the hard-label attack can be modeled as an optimization problem whose objective function can be evaluated by binary search with additional model queries, so that a zeroth-order optimization algorithm can be applied. In this paper, we adopt the same optimization formulation but propose to directly estimate the sign of the gradient in any direction instead of the gradient itself, which requires only a single query. Using this single-query oracle for retrieving the sign of the directional derivative, we develop a novel query-efficient Sign-OPT approach for the hard-label black-box attack. We provide a convergence analysis of the new algorithm and conduct experiments on several models on MNIST, CIFAR-10 and ImageNet. We find that the Sign-OPT attack consistently requires 5X to 10X fewer queries than current state-of-the-art approaches, and usually converges to an adversarial example with a smaller perturbation.
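The sign-based gradient estimate can be sketched as follows. For random Gaussian directions u, take the sign of g(theta + eps * u) - g(theta) and average sign * u. Only a comparison oracle "is g(a) < g(b)?" is needed, and in the hard-label attack each such comparison costs a single model query. This is a minimal sketch under that assumption. The toy oracle below evaluates a quadratic g directly and stands in for the attack's query, and the helper names are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def sign_opt_gradient(
    is_smaller: Callable[[Vector, Vector], bool],
    theta: Vector,
    eps: float = 1e-3,
    num_dirs: int = 200,
    seed: int = 0,
) -> Vector:
    """Average sign(g(theta + eps*u) - g(theta)) * u over Gaussian directions u.

    is_smaller(a, b) answers "is g(a) < g(b)?" and is the only access to g.
    """
    rng = random.Random(seed)
    est = [0.0] * len(theta)
    for _ in range(num_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        shifted = [t + eps * ui for t, ui in zip(theta, u)]
        sign = -1.0 if is_smaller(shifted, theta) else 1.0
        est = [e + sign * ui / num_dirs for e, ui in zip(est, u)]
    return est


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Usage on a toy objective g(theta) = ||theta||^2, whose true gradient is 2 * theta.
g = lambda v: sum(x * x for x in v)
theta = [1.0, -2.0, 0.5]
est = sign_opt_gradient(lambda a, b: g(a) < g(b), theta)
print(cosine(est, [2 * t for t in theta]))  # similarity to the true gradient direction
```

The estimate recovers the gradient direction but not its magnitude. For a descent step, the direction is what matters.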

AAAI Conference 2020 Conference Paper

TemPEST: Soft Template-Based Personalized EDM Subject Generation through Collaborative Summarization

  • Yu-Hsiu Chen
  • Pin-Yu Chen
  • Hong-Han Shuai
  • Wen-Chih Peng

We address personalized Electronic Direct Mail (EDM) subject generation, which generates an attractive subject line for a product description according to a user's preferences for different contents or writing styles. Generating personalized EDM subjects has a few notable differences from generating text summaries. The subject has to be not only faithful to the description itself but also attractive, to increase the click-through rate. Moreover, different users may have different preferences over the styles of topics. We propose a novel personalized EDM subject generation model named Soft Template-based Personalized EDM Subject Generator (TemPEST) that considers the aforementioned user characteristics when generating subjects; it contains a soft template-based selective encoder network, a user rating encoder network, a summary decoder network, and a rating decoder. Experimental results indicate that TemPEST is able to generate personalized topics and also effectively performs rating reconstruction for recommendation.

IJCAI Conference 2020 Conference Paper

Toward a neuro-inspired creative decoder

  • Payel Das
  • Brian Quanz
  • Pin-Yu Chen
  • Jae-wook Ahn
  • Dhruv Shah

Creativity, a process that generates novel and meaningful ideas, involves increased association between task-positive (control) and task-negative (default) networks in the human brain. Inspired by this seminal finding, in this study we propose a creative decoder within a deep generative framework, which involves direct modulation of the neuronal activation pattern after sampling from the learned latent space. The proposed approach is fully unsupervised and can be used off-the-shelf. Several novelty metrics and human evaluation were used to evaluate the creative capacity of the deep decoder. Our experiments on different image datasets (MNIST, FMNIST, MNIST+FMNIST, WikiArt and CelebA) reveal that atypical co-activation of highly activated and weakly activated neurons in a deep decoder promotes generation of novel and meaningful artifacts.

AAAI Conference 2020 Conference Paper

Towards Certificated Model Robustness Against Weight Perturbations

  • Tsui-Wei Weng
  • Pu Zhao
  • Sijia Liu
  • Pin-Yu Chen
  • Xue Lin
  • Luca Daniel

This work studies the sensitivity of neural networks to weight perturbations, first in the context of a newly developed threat model that perturbs the neural network parameters. We propose an efficient approach to compute a certified robustness bound on weight perturbations, within which neural networks will not make the erroneous outputs desired by the adversary. In addition, we identify a useful connection between our certification method and the problem of weight quantization, a popular model compression technique in deep neural networks (DNNs) and a ‘must-try’ step in the design of DNN inference engines on resource-constrained computing platforms, such as mobile devices, FPGAs, and ASICs. Specifically, we study the problem of weight quantization (weight perturbations in the non-adversarial setting) through the lens of certified robustness, and we demonstrate significant improvements in the generalization ability of quantized networks through our robustness-aware quantization scheme.
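The link to quantization can be made concrete. Uniform quantization moves each weight by at most half a quantization step. If that worst-case change lies inside a certified weight-perturbation bound, the prediction certified for a given input is preserved after quantization. The sketch below assumes symmetric uniform quantization and an l_inf bound `eps`. Here `eps` stands in for whatever bound the certification method returns. Both are assumptions for illustration.

```python
from typing import List, Sequence


def uniform_quantize(weights: Sequence[float], num_bits: int) -> List[float]:
    """Symmetric uniform quantization with 2**(num_bits - 1) - 1 positive levels.

    Step size = max|w| / levels; each weight is rounded to the nearest
    multiple of the step, so |w - q(w)| <= step / 2.
    """
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2")
    levels = 2 ** (num_bits - 1) - 1
    step = max(abs(w) for w in weights) / levels
    if step == 0.0:
        return [0.0 for _ in weights]
    return [round(w / step) * step for w in weights]


def within_certified_bound(weights: Sequence[float], quantized: Sequence[float], eps: float) -> bool:
    """True if the quantization error lies inside an l_inf ball of radius eps
    (eps is a placeholder for a certified weight-perturbation bound)."""
    return max(abs(w - q) for w, q in zip(weights, quantized)) <= eps


# Usage: 3-bit quantization of a toy weight vector (step = 1/3).
w = [0.5, -1.0, 0.26, 0.0]
q = uniform_quantize(w, 3)
print(q, within_certified_bound(w, q, eps=0.2), within_certified_bound(w, q, eps=0.1))
```

With more bits the step shrinks. The check then passes for tighter bounds, which is one way a certified bound can guide the choice of bit width.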

AAAI Conference 2020 Conference Paper

Towards Query-Efficient Black-Box Adversary with Zeroth-Order Natural Gradient Descent

  • Pu Zhao
  • Pin-Yu Chen
  • Siyue Wang
  • Xue Lin

Despite the great achievements of modern deep neural networks (DNNs), the vulnerability of state-of-the-art DNNs raises security concerns in many application domains requiring high reliability. Various adversarial attacks have been proposed to sabotage the learning performance of DNN models. Among those, black-box adversarial attack methods have received special attention owing to their practicality and simplicity. Black-box attacks usually prefer fewer queries in order to remain stealthy and keep costs low. However, most current black-box attack methods adopt the first-order gradient descent method, which may come with certain deficiencies such as relatively slow convergence and high sensitivity to hyper-parameter settings. In this paper, we propose a zeroth-order natural gradient descent (ZO-NGD) method to design adversarial attacks, which incorporates the zeroth-order gradient estimation technique catering to the black-box attack scenario and second-order natural gradient descent to achieve higher query efficiency. Empirical evaluations on image classification datasets demonstrate that ZO-NGD can obtain significantly lower model query complexities compared with state-of-the-art attack methods.

ICML Conference 2020 Conference Paper

Transfer Learning without Knowing: Reprogramming Black-box Machine Learning Models with Scarce Data and Limited Resources

  • Yun-Yun Tsai
  • Pin-Yu Chen
  • Tsung-Yi Ho

Current transfer learning methods are mainly based on finetuning a pretrained model with target-domain data. Motivated by techniques from adversarial machine learning (ML) that are capable of manipulating the model prediction via data perturbations, in this paper we propose a novel approach, black-box adversarial reprogramming (BAR), that repurposes a well-trained black-box ML model (e.g., a prediction API or proprietary software) for solving different ML tasks, especially in the scenario with scarce data and constrained resources. The rationale lies in exploiting high-performance but unknown ML models to gain learning capability for transfer learning. Using zeroth-order optimization and multi-label mapping techniques, BAR can reprogram a black-box ML model solely based on its input-output responses, without knowing the model architecture or changing any parameter. More importantly, in the limited medical data setting, on autism spectrum disorder classification, diabetic retinopathy detection, and melanoma detection tasks, BAR outperforms state-of-the-art methods and yields comparable performance to the vanilla adversarial reprogramming method, which requires complete knowledge of the target ML model. BAR also outperforms baseline transfer learning approaches by a significant margin, demonstrating a cost-effective means and new insights for transfer learning.

AAAI Conference 2019 Conference Paper

AutoZOOM: Autoencoder-Based Zeroth Order Optimization Method for Attacking Black-Box Neural Networks

  • Chun-Chen Tu
  • Paishun Ting
  • Pin-Yu Chen
  • Sijia Liu
  • Huan Zhang
  • Jinfeng Yi
  • Cho-Jui Hsieh
  • Shin-Ming Cheng

Recent studies have shown that adversarial examples for state-of-the-art image classifiers trained by deep neural networks (DNNs) can be easily generated when the target model is transparent to an attacker, known as the white-box setting. However, when attacking a deployed machine learning service, one can only acquire the input-output correspondences of the target model; this is the so-called black-box attack setting. The major drawback of existing black-box attacks is the need for excessive model queries, which may give a false sense of model robustness due to inefficient query designs. To bridge this gap, we propose a generic framework for query-efficient black-box attacks. Our framework, AutoZOOM, short for Autoencoder-based Zeroth Order Optimization Method, has two novel building blocks for efficient black-box attacks: (i) an adaptive random gradient estimation strategy to balance query counts and distortion, and (ii) an autoencoder, either trained offline with unlabeled data or replaced by a bilinear resizing operation, for attack acceleration. Experimental results suggest that, by applying AutoZOOM to a state-of-the-art black-box attack (ZOO), a significant reduction in model queries can be achieved without sacrificing the attack success rate or the visual quality of the resulting adversarial examples. In particular, compared to the standard ZOO method, AutoZOOM consistently reduces the mean query count needed to find successful adversarial examples (or to reach the same distortion level) by at least 93% on the MNIST, CIFAR-10 and ImageNet datasets, leading to novel insights on adversarial robustness.
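An averaged random gradient estimator applied in a reduced dimension can be sketched as follows. The estimate $\hat g = \frac{d}{q\beta}\sum_{j=1}^{q}[f(z + \beta u_j) - f(z)]\,u_j$, with $u_j$ drawn uniformly from the unit sphere, is taken with respect to a low-dimensional perturbation $z$ of dimension $d$ that is decoded before each query. As assumptions of this sketch, the decoder is plain nearest-neighbour upsampling (the paper uses a trained autoencoder or bilinear resizing), the scaling factor is the latent dimension $d$, the adaptive choice of $q$ is omitted, and all names are hypothetical.

```python
import math
import random


def upsample(z, factor):
    """Stand-in decoder: nearest-neighbour upsampling of a low-dimensional perturbation."""
    return [v for v in z for _ in range(factor)]


def random_unit_vector(dim, rng):
    """Direction drawn uniformly from the unit sphere in R^dim."""
    u = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u]


def averaged_random_grad(loss, z, factor, q=20, beta=1e-4, seed=0):
    """Averaged random gradient estimate w.r.t. the latent perturbation z, using q + 1 queries."""
    rng = random.Random(seed)
    d = len(z)
    base = loss(upsample(z, factor))
    grad = [0.0] * d
    for _ in range(q):
        u = random_unit_vector(d, rng)
        shifted = loss(upsample([zi + beta * ui for zi, ui in zip(z, u)], factor))
        # Scaling by d makes the estimate (approximately) unbiased for unit-sphere directions.
        scale = d * (shifted - base) / (beta * q)
        grad = [g + scale * ui for g, ui in zip(grad, u)]
    return grad
```

Each estimate costs $q + 1$ queries regardless of the image size, while the estimator's variance grows with the dimension of the space being searched, which is why searching in the smaller latent space helps.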

AAAI Conference 2019 Conference Paper

CNN-Cert: An Efficient Framework for Certifying Robustness of Convolutional Neural Networks

  • Akhilan Boopathy
  • Tsui-Wei Weng
  • Pin-Yu Chen
  • Sijia Liu
  • Luca Daniel

Verifying the robustness of neural network classifiers has attracted great interest and attention due to the success of deep neural networks and their unexpected vulnerability to adversarial perturbations. Although finding the minimum adversarial distortion of neural networks (with ReLU activations) has been shown to be an NP-complete problem, obtaining a non-trivial lower bound on the minimum distortion as a provable robustness guarantee is possible. However, most previous works focused only on simple fully-connected layers (multilayer perceptrons) and were limited to ReLU activations. This motivates us to propose a general and efficient framework, CNN-Cert, that is capable of certifying robustness on general convolutional neural networks. Our framework is general: it can handle various architectures, including convolutional layers, max-pooling layers, batch normalization layers and residual blocks, as well as general activation functions. Our approach is also efficient: by exploiting the special structure of convolutional layers, we achieve speed-ups of up to 17 and 11 times compared to the state-of-the-art certification algorithms (e.g., Fast-Lin and CROWN) and a speed-up of 366 times compared to the dual-LP approach, while our algorithm obtains similar or even better verification bounds. In addition, CNN-Cert generalizes state-of-the-art algorithms such as Fast-Lin and CROWN. We demonstrate through extensive experiments that our method outperforms state-of-the-art lower-bound-based certification algorithms in terms of both bound quality and speed.
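For intuition about what a certified bound is, the sketch below propagates bounds through a toy 1-D convolution, a ReLU and a linear readout. It deliberately uses the simplest sound method, interval arithmetic, rather than CNN-Cert's linear-bound propagation, which yields tighter bounds; the network, its weights and all function names are hypothetical. If the lower bound of the readout is positive, no perturbation with `max |delta_i| <= eps` can flip its sign.

```python
def conv1d(x, kernel, bias=0.0):
    """Valid 1-D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [bias + sum(w * v for w, v in zip(kernel, x[i:i + k])) for i in range(len(x) - k + 1)]


def conv1d_bounds(lo, hi, kernel, bias=0.0):
    """Interval bounds of conv1d: positive weights take the same-side bound, negative weights the opposite."""
    k = len(kernel)
    out_lo, out_hi = [], []
    for i in range(len(lo) - k + 1):
        l = h = bias
        for w, a, b in zip(kernel, lo[i:i + k], hi[i:i + k]):
            l += w * (a if w >= 0 else b)
            h += w * (b if w >= 0 else a)
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi


def network(x, kernel, readout):
    """Toy CNN: conv1d -> ReLU -> linear readout producing one margin score."""
    return sum(w * max(0.0, v) for w, v in zip(readout, conv1d(x, kernel)))


def certified_bounds(x, eps, kernel, readout):
    """Bounds on network(x + delta) that hold for every perturbation with max |delta_i| <= eps."""
    lo, hi = conv1d_bounds([v - eps for v in x], [v + eps for v in x], kernel)
    lo, hi = [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]  # ReLU is monotone
    out_lo = sum(w * (a if w >= 0 else b) for w, a, b in zip(readout, lo, hi))
    out_hi = sum(w * (b if w >= 0 else a) for w, a, b in zip(readout, lo, hi))
    return out_lo, out_hi
```

Interval bounds ignore correlations between neurons, so they loosen quickly with depth; linear bounds that keep the dependence on the input, as in CNN-Cert, Fast-Lin and CROWN, are what make certification of deeper convolutional networks practical.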

ICML Conference 2019 Conference Paper

Fast Incremental von Neumann Graph Entropy Computation: Theory, Algorithm, and Applications

  • Pin-Yu Chen
  • Lingfei Wu 0001
  • Sijia Liu 0001
  • Indika Rajapakse

The von Neumann graph entropy (VNGE) facilitates the measurement of information divergence and distance between graphs in a graph sequence. It has been successfully applied to various learning tasks driven by network-based data. While effective, VNGE is computationally demanding, as it requires the full eigenspectrum of the graph Laplacian matrix. In this paper, we propose a new computational framework, Fast Incremental von Neumann Graph EntRopy (FINGER), which approximates VNGE with a performance guarantee. FINGER reduces the cubic complexity of VNGE to linear complexity in the number of nodes and edges, and thus enables online computation based on incremental graph changes. We also show the asymptotic equivalence of FINGER to the exact VNGE and derive its approximation error bounds. Based on FINGER, we propose efficient algorithms for computing the Jensen-Shannon distance between graphs. Our experimental results on different random graph models demonstrate the computational efficiency and the asymptotic equivalence of FINGER. In addition, we apply FINGER to two real-world applications and one synthesized anomaly detection dataset, and corroborate its superior performance over seven baseline graph similarity methods.
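The reason a linear-time approximation is possible can be shown with a short sketch. With $\rho = L / S$, $S = \operatorname{tr}(L)$, and eigenvalues $\lambda_i$ of $\rho$ summing to 1, the quadratic term $Q = \sum_i \lambda_i (1 - \lambda_i) = 1 - \operatorname{tr}(\rho^2)$ needs no eigenvalues, because $\operatorname{tr}(L^2) = \sum_i d_i^2 + 2 \sum_{(i,j) \in E} w_{ij}^2$. The code computes $Q$ from degrees and edge weights and, as an illustrative assumption rather than FINGER's exact estimator, the cheap quantity $-Q \ln(2 d_{\max} / S)$. Since $\lambda_{\max} \le 2 d_{\max} / S$ and $H = -\sum_i \lambda_i \ln \lambda_i \ge -Q \ln \lambda_{\max}$, this quantity never exceeds the exact VNGE. The dense function is only a reference check, and all names are hypothetical.

```python
import math


def degrees_and_weights(n, edges):
    """Weighted degrees and the sum of squared edge weights, in one pass over the edges."""
    degree = [0.0] * n
    sq_weights = 0.0
    for i, j, w in edges:
        degree[i] += w
        degree[j] += w
        sq_weights += w * w
    return degree, sq_weights


def quadratic_term(n, edges):
    """Q = 1 - trace(rho^2) for rho = L / trace(L), computed in O(n + m) without eigenvalues."""
    degree, sq_weights = degrees_and_weights(n, edges)
    s = sum(degree)  # trace(L) equals the sum of weighted degrees
    return 1.0 - (sum(d * d for d in degree) + 2.0 * sq_weights) / (s * s)


def entropy_lower_estimate(n, edges):
    """-Q * ln(2 d_max / S): never exceeds the exact VNGE, since lambda_max(rho) <= 2 d_max / S."""
    degree, _ = degrees_and_weights(n, edges)
    return -quadratic_term(n, edges) * math.log(2.0 * max(degree) / sum(degree))


def dense_quadratic_term(n, edges):
    """Reference check: build L explicitly and compute 1 - trace(L L) / trace(L)^2."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    s = sum(L[i][i] for i in range(n))
    return 1.0 - sum(L[i][k] * L[k][i] for i in range(n) for k in range(n)) / (s * s)
```

For a triangle with one pendant edge (degrees 2, 2, 3, 1 and unit weights), $Q = 1 - 26/64 = 0.59375$. When an edge is added or removed, only two degrees and one squared weight change, so $Q$ can be updated in constant time, which is what makes incremental computation attractive.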

IJCAI Conference 2019 Conference Paper

Protecting Neural Networks with Hierarchical Random Switching: Towards Better Robustness-Accuracy Trade-off for Stochastic Defenses

  • Xiao Wang
  • Siyue Wang
  • Pin-Yu Chen
  • Yanzhi Wang
  • Brian Kulis
  • Xue Lin
  • Sang Chin

Despite achieving remarkable success in various domains, deep neural networks have been shown by recent studies to be vulnerable to adversarial perturbations, raising concerns about model generalizability and creating new threats such as prediction-evasive misclassification or stealthy reprogramming. Among different defense proposals, stochastic network defenses such as random neuron activation pruning or random perturbation of layer inputs have shown promise for attack mitigation. However, one critical drawback of current defenses is that the robustness enhancement comes at the cost of noticeable performance degradation on legitimate data, e.g., a large drop in test accuracy. This paper is motivated by the pursuit of a better trade-off between adversarial robustness and test accuracy for stochastic network defenses. We propose the Defense Efficiency Score (DES), a comprehensive metric that measures the gain in unsuccessful attack attempts weighed against the drop in test accuracy for any defense. To achieve a better DES, we propose hierarchical random switching (HRS), which protects neural networks through a novel randomization scheme. An HRS-protected model contains several blocks of randomly switching channels that prevent adversaries from exploiting fixed model structures and parameters for malicious purposes. Extensive experiments show that HRS is superior in defending against state-of-the-art white-box and adaptive adversarial misclassification attacks. We also demonstrate the effectiveness of HRS in defending against adversarial reprogramming, making it the first defense against adversarial programs. Moreover, in most settings the average DES of HRS is at least 5X higher than that of current stochastic network defenses, validating its significantly improved robustness-accuracy trade-off.
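The randomization scheme can be pictured with a toy sketch: each block holds several parallel channels, and every forward pass draws one channel per block uniformly at random, so consecutive queries may follow different paths through the model. The channels below are hypothetical scalar functions standing in for trained network sub-blocks; the paper's procedure for training the channels is not shown, and all names are illustrative.

```python
import random


class RandomSwitchingBlock:
    """One block with several parallel channels; a channel is drawn uniformly at random on every call."""

    def __init__(self, channels, rng):
        self.channels = channels
        self.rng = rng

    def __call__(self, x):
        return self.rng.choice(self.channels)(x)


class HRSModel:
    """Sequential stack of switching blocks, so each forward pass follows a random path."""

    def __init__(self, blocks):
        self.blocks = blocks

    def __call__(self, x):
        for block in self.blocks:
            x = block(x)
        return x


rng = random.Random(0)
model = HRSModel([
    RandomSwitchingBlock([lambda x: x + 1, lambda x: x + 2], rng),  # hypothetical channels
    RandomSwitchingBlock([lambda x: 2 * x, lambda x: 3 * x], rng),
])
```

With two blocks of two channels each there are four possible paths, so repeated calls `model(1)` return 4, 6 or 9. An attacker who estimates gradients from one path is, with high probability, attacking a different path on the next query.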

ICML Conference 2019 Conference Paper

PROVEN: Verifying Robustness of Neural Networks with a Probabilistic Approach

  • Lily Weng
  • Pin-Yu Chen
  • Lam M. Nguyen
  • Mark S. Squillante
  • Akhilan Boopathy
  • Ivan V. Oseledets
  • Luca Daniel

We propose a novel framework, PROVEN, to PRObabilistically VErify Neural network robustness with statistical guarantees. PROVEN provides probability certificates of neural network robustness when the input perturbation follows a distributional characterization. Notably, PROVEN is derived from current state-of-the-art worst-case neural network robustness verification frameworks, and therefore it can provide probability certificates with little computational overhead on top of existing methods such as Fast-Lin, CROWN and CNN-Cert. Experiments on small and large MNIST and CIFAR neural network models demonstrate that our probabilistic approach can tighten robustness certificates by around $1.8\times$ and $3.5\times$, with at least $99.99\%$ confidence, compared with the worst-case robustness certificates from CROWN and CNN-Cert.
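The idea of trading a worst-case certificate for a probabilistic one can be illustrated under two assumptions: the classification margin has already been lower-bounded by a linear function $w^\top \delta + c$ of the input perturbation (as worst-case verifiers such as Fast-Lin or CROWN provide), and each $\delta_i$ is independent, zero-mean and bounded in $[-\epsilon, \epsilon]$. Hoeffding's inequality then gives $P(w^\top\delta + c < 0) \le \exp\bigl(-2c^2 / \sum_i (2\epsilon |w_i|)^2\bigr)$. This is a hedged sketch in the spirit of PROVEN, not its exact derivation, and the function names are hypothetical.

```python
import math
import random


def hoeffding_violation_bound(weights, offset, eps):
    """Upper bound on P(w . delta + offset < 0) for independent, zero-mean delta_i in [-eps, eps]."""
    if offset <= 0:
        return 1.0  # this bound gives no non-trivial guarantee
    spread = sum((2.0 * eps * abs(w)) ** 2 for w in weights)
    if spread == 0.0:
        return 0.0  # the margin is the constant offset > 0
    return math.exp(-2.0 * offset ** 2 / spread)


def empirical_violation_rate(weights, offset, eps, trials=20000, seed=0):
    """Monte Carlo check with each delta_i drawn uniformly from [-eps, eps]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        margin = offset + sum(w * rng.uniform(-eps, eps) for w in weights)
        hits += margin < 0
    return hits / trials
```

With `w = [0.5, -1.0, 0.8, 0.3]`, `c = 1.2` and `eps = 0.5`, a worst-case certificate fails because $\epsilon \sum_i |w_i| = 1.3 > 1.2$, yet the bound still caps the violation probability at about 0.23, and the Monte Carlo rate is far lower.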

IJCAI Conference 2019 Conference Paper

Topology Attack and Defense for Graph Neural Networks: An Optimization Perspective

  • Kaidi Xu
  • Hongge Chen
  • Sijia Liu
  • Pin-Yu Chen
  • Tsui-Wei Weng
  • Mingyi Hong
  • Xue Lin

Graph neural networks (GNNs), which apply deep neural networks to graph data, have achieved significant performance on the task of semi-supervised node classification. However, only a few works have addressed the adversarial robustness of GNNs. In this paper, we first present a novel gradient-based attack method that overcomes the difficulty of tackling discrete graph data. Compared with current adversarial attacks on GNNs, the results show that by perturbing only a small number of edges, through both additions and deletions, our optimization-based attack can lead to a noticeable decrease in classification performance. Moreover, leveraging our gradient-based attack, we propose the first optimization-based adversarial training for GNNs. Our method yields higher robustness against both gradient-based and greedy attack methods without sacrificing classification accuracy on the original graph.
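The continuous relaxation that makes gradient-based attacks possible on discrete edges can be sketched as follows. One standard way to write it uses a symmetric perturbation matrix $S$ with entries in $[0, 1]$: the perturbed adjacency is $A' = A + (1 - 2A) \circ S$ off the diagonal, so $S_{ij} = 1$ flips edge $(i, j)$. After each gradient step, the vector $s$ of free entries is projected back onto $\{s \in [0,1]^n : \mathbf{1}^\top s \le B\}$ for an edge budget $B$; that projection has the form $\mathrm{clip}_{[0,1]}(a - \mu)$ with $\mu \ge 0$, found here by bisection. Function names are hypothetical, and the attack loss on the GNN and the final discretization of $s$ into edge flips are omitted.

```python
def perturbed_adjacency(A, S):
    """Relaxed edge flips: entry (i, j) moves from A_ij toward 1 - A_ij as S_ij goes from 0 to 1."""
    return [[a + (1 - 2 * a) * s for a, s in zip(row_a, row_s)] for row_a, row_s in zip(A, S)]


def project_budget(a, budget, iters=100):
    """Euclidean projection of a onto {s in [0, 1]^n : sum(s) <= budget}, by bisection on a multiplier."""
    def clip(v):
        return min(1.0, max(0.0, v))

    s = [clip(v) for v in a]
    if sum(s) <= budget:
        return s
    lo, hi = 0.0, max(a)  # the clipped sum exceeds the budget at mu = lo and is 0 at mu = hi
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(clip(v - mu) for v in a) > budget:
            lo = mu
        else:
            hi = mu
    return [clip(v - hi) for v in a]


def projected_gradient_step(s, grad, lr, budget):
    """One projected gradient ascent step on the attack loss over the relaxed perturbation vector s."""
    return project_budget([si + lr * gi for si, gi in zip(s, grad)], budget)
```

For example, projecting `[0.9, 0.8, 0.1, -0.3, 1.5]` onto a budget of 2 yields `[0.55, 0.45, 0.0, 0.0, 1.0]` ($\mu = 0.35$), spending the budget on the entries the gradient favours most.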

AAAI Conference 2018 Conference Paper

EAD: Elastic-Net Attacks to Deep Neural Networks via Adversarial Examples

  • Pin-Yu Chen
  • Yash Sharma
  • Huan Zhang
  • Jinfeng Yi
  • Cho-Jui Hsieh

Recent studies have highlighted the vulnerability of deep neural networks (DNNs) to adversarial examples: a visually indistinguishable adversarial image can easily be crafted to cause a well-trained model to misclassify. Existing methods for crafting adversarial examples are based on L2 and L∞ distortion metrics. However, even though L1 distortion accounts for the total variation and encourages sparsity in the perturbation, little has been developed for crafting L1-based adversarial examples. In this paper, we formulate the process of attacking DNNs via adversarial examples as an elastic-net regularized optimization problem. Our elastic-net attacks to DNNs (EAD) feature L1-oriented adversarial examples and include the state-of-the-art L2 attack as a special case. Experimental results on MNIST, CIFAR10 and ImageNet show that EAD can yield a distinct set of adversarial examples with small L1 distortion and attains attack performance similar to the state-of-the-art methods in different attack scenarios. More importantly, EAD leads to improved attack transferability and complements adversarial training for DNNs, suggesting novel insights on leveraging L1 distortion in adversarial machine learning and security implications of DNNs.
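Elastic-net objectives of this kind are typically minimized with an iterative shrinkage-thresholding algorithm (ISTA), whose key operation is a projected soft-thresholding of the image around the original input $x_0$. The sketch below implements that operator for pixel values in $[0, 1]$ and one plain ISTA step with threshold `lr * beta`. The gradient of the smooth part of the loss (the attack loss plus the L2 term) is assumed to be supplied by the caller, the paper describes an accelerated (FISTA-style) variant, and the names are illustrative.

```python
def projected_shrinkage(z, x0, beta, lo=0.0, hi=1.0):
    """Soft-threshold each coordinate of z - x0 by beta, then clip the result to the valid pixel range."""
    out = []
    for zi, xi in zip(z, x0):
        diff = zi - xi
        if diff > beta:
            out.append(min(zi - beta, hi))
        elif diff < -beta:
            out.append(max(zi + beta, lo))
        else:
            out.append(xi)  # small changes are zeroed out, which makes the perturbation sparse
    return out


def ista_step(x, x0, grad, lr, beta):
    """One projected ISTA iteration: gradient step on the smooth loss, then shrinkage with threshold lr * beta."""
    return projected_shrinkage([xi - lr * gi for xi, gi in zip(x, grad)], x0, lr * beta)
```

For instance, with $x_0 = 0.5$ everywhere and $\beta = 0.1$, the candidate `[0.9, 0.52, 0.1, 0.3]` becomes `[0.8, 0.5, 0.2, 0.4]`: the tiny change at the second pixel is removed entirely, which is how the L1 term produces sparse perturbations.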

NeurIPS Conference 2018 Conference Paper

Efficient Neural Network Robustness Certification with General Activation Functions

  • Huan Zhang
  • Tsui-Wei Weng
  • Pin-Yu Chen
  • Cho-Jui Hsieh
  • Luca Daniel

Finding the minimum distortion of adversarial examples, and thus certifying the robustness of neural network classifiers, is known to be a challenging problem. Nevertheless, it has recently been shown that a non-trivial certified lower bound on the minimum distortion can be given, and some progress has been made in this direction by exploiting the piecewise linear nature of ReLU activations. However, generic robustness certification for general activation functions remains largely unexplored. To address this issue, in this paper we introduce CROWN, a general framework to certify robustness of neural networks with general activation functions. The novelty of our algorithm lies in bounding a given activation function with linear and quadratic functions, which allows it to tackle general activation functions including, but not limited to, the four popular choices: ReLU, tanh, sigmoid and arctan. In addition, we facilitate the search for a tighter certified lower bound by adaptively selecting appropriate surrogates for each neuron activation. Experimental results show that CROWN on ReLU networks can notably improve the certified lower bounds compared to the current state-of-the-art algorithm Fast-Lin, while having comparable computational efficiency. Furthermore, CROWN also demonstrates its effectiveness and flexibility on networks with general activation functions, including tanh, sigmoid and arctan.
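For ReLU neurons, the adaptive choice of linear surrogates can be sketched directly. For an unstable neuron with pre-activation bounds $l < 0 < u$, the upper bound is the chord $\frac{u}{u-l}(x - l)$, and the lower bound is $x$ or $0$, whichever leaves the smaller gap area ($x$ when $u \ge |l|$). This covers only the ReLU case; the surrogates for tanh, sigmoid and arctan and the layer-by-layer back-substitution are not shown, and the function name is hypothetical.

```python
def relu_linear_bounds(l, u):
    """Return ((a_low, b_low), (a_up, b_up)) with a_low*x + b_low <= relu(x) <= a_up*x + b_up on [l, u]."""
    if u <= 0:  # neuron is always inactive
        return (0.0, 0.0), (0.0, 0.0)
    if l >= 0:  # neuron is always active
        return (1.0, 0.0), (1.0, 0.0)
    slope = u / (u - l)
    upper = (slope, -slope * l)  # chord through (l, 0) and (u, u)
    # Adaptive lower bound: slope 1 if u >= |l|, else slope 0, whichever leaves the smaller gap area.
    lower = (1.0, 0.0) if u >= -l else (0.0, 0.0)
    return lower, upper
```

The gap between ReLU and the lower line has area $l^2/2$ for slope 1 and $u^2/2$ for slope 0, so comparing $u$ with $|l|$ picks the tighter surrogate neuron by neuron.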

NeurIPS Conference 2018 Conference Paper

Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives

  • Amit Dhurandhar
  • Pin-Yu Chen
  • Ronny Luss
  • Chun-Chen Tu
  • Paishun Ting
  • Karthikeyan Shanmugam
  • Payel Das

In this paper we propose a novel method that provides contrastive explanations justifying the classification of an input by a black-box classifier such as a deep neural network. Given an input, we find what should be minimally and sufficiently present (e.g., important object pixels in an image) to justify its classification, and analogously what should be minimally and necessarily absent (e.g., certain background pixels). We argue that such explanations are natural for humans and are used commonly in domains such as health care and criminology. What is minimally but critically absent is an important part of an explanation, which, to the best of our knowledge, has not been explicitly identified by current methods that explain predictions of neural networks. We validate our approach on three real datasets obtained from diverse domains, namely the MNIST handwritten digits dataset, a large procurement fraud dataset and a brain activity strength dataset. In all three cases, we witness the power of our approach in generating precise explanations that are also easy for human experts to understand and evaluate.
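The search for what is "critically absent" can be written as an elastic-net regularized objective over an additive perturbation $\delta$: a hinge term that becomes satisfied once some other class outscores the original class by a margin $\kappa$, plus $\beta\|\delta\|_1 + \|\delta\|_2^2$ to keep the added features sparse and small. The sketch assumes a score function `predict` and hypothetical default weights; the paper's autoencoder reconstruction term and the constraint that $\delta$ may only add features absent from $x$ are omitted.

```python
def pn_objective(predict, x, delta, orig_class, c=1.0, beta=0.1, kappa=0.1):
    """Elastic-net pertinent-negative objective (autoencoder term omitted).

    predict returns class scores; delta is the candidate set of features added to x.
    """
    scores = predict([xi + di for xi, di in zip(x, delta)])
    best_other = max(s for k, s in enumerate(scores) if k != orig_class)
    # Hinge: bottoms out at -kappa once another class beats orig_class by at least kappa.
    hinge = max(scores[orig_class] - best_other, -kappa)
    l1 = sum(abs(d) for d in delta)
    l2 = sum(d * d for d in delta)
    return c * hinge + beta * l1 + l2
```

With the toy scorer `predict = lambda v: [v[0], v[1]]` and `x = [1.0, 0.0]`, the empty perturbation scores 1.0, while adding 1.5 to the second feature flips the prediction and scores $-0.1 + 0.15 + 2.25 = 2.3$; minimizing the objective balances flipping the class against adding as little as possible.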

NeurIPS Conference 2018 Conference Paper

Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization

  • Sijia Liu
  • Bhavya Kailkhura
  • Pin-Yu Chen
  • Paishun Ting
  • Shiyu Chang
  • Lisa Amini

As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance-reduced and faster-converging approaches is also intensifying. This paper addresses these challenges by presenting a) a comprehensive theoretical analysis of variance-reduced zeroth-order (ZO) optimization, b) a novel variance-reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the assumption that gradient estimates are unbiased no longer holds. We prove that, compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance-reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations). Our extensive experimental results show that our approaches outperform other state-of-the-art ZO algorithms and strike a balance between the convergence rate and the function query complexity.
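The ZO-SVRG update replaces every gradient in SVRG with a two-point random estimate: each epoch estimates the full gradient $\hat g$ at a snapshot $\tilde x$, and each inner step uses $v = \hat\nabla f_{\mathcal I}(x) - \hat\nabla f_{\mathcal I}(\tilde x) + \hat g$ on a mini-batch $\mathcal I$. Because the estimates are biased with respect to the true gradient, the variance-reduction argument of SVRG does not carry over directly. The sketch below is a minimal illustration under assumptions: unit-sphere directions scaled by the dimension, the same random direction reused at $x$ and $\tilde x$, and hypothetical step sizes and names; it is not the paper's reference implementation or its accelerated variants.

```python
import random


def unit_direction(d, rng):
    """Direction drawn uniformly from the unit sphere in R^d."""
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(v * v for v in u) ** 0.5
    return [v / norm for v in u]


def zo_grad(f, x, u, mu):
    """Two-point random gradient estimate along unit direction u: d * (f(x + mu*u) - f(x)) / mu * u."""
    d = len(x)
    diff = (f([xi + mu * ui for xi, ui in zip(x, u)]) - f(x)) / mu
    return [d * diff * ui for ui in u]


def zo_svrg(losses, x0, epochs=30, inner=20, batch=2, lr=0.02, mu=1e-4, seed=0):
    """Minimise (1/n) * sum_i losses[i](x) with ZO-SVRG-style variance-reduced updates."""
    rng = random.Random(seed)
    x = list(x0)
    d, n = len(x), len(losses)
    for _ in range(epochs):
        snapshot = list(x)
        # ZO estimate of the full gradient at the snapshot (one random direction per component).
        full = [0.0] * d
        for f in losses:
            g = zo_grad(f, snapshot, unit_direction(d, rng), mu)
            full = [a + b / n for a, b in zip(full, g)]
        for _ in range(inner):
            v = list(full)
            for i in rng.sample(range(n), batch):
                u = unit_direction(d, rng)  # same direction at x and at the snapshot
                gx = zo_grad(losses[i], x, u, mu)
                gs = zo_grad(losses[i], snapshot, u, mu)
                v = [vi + (a - b) / batch for vi, a, b in zip(v, gx, gs)]
            x = [xi - lr * vi for xi, vi in zip(x, v)]
    return x
```

On a consistent least-squares problem, every component gradient vanishes at the optimum, so the estimation noise shrinks as the iterates converge; on problems where component gradients stay nonzero at the optimum, the snapshot estimate keeps injecting noise, which is consistent with the extra mini-batch-dependent error term in the analysis.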