Author name cluster

Ninghao Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

14 papers

1 author row

NeurIPS Conference 2025 Conference Paper

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification

Lin Zhang
Wenshuo Dong
Zhuoran Zhang
Shu Yang
Lijie Hu
Ninghao Liu
Pan Zhou
Di Wang

Understanding the internal mechanisms of transformer-based language models remains challenging. Mechanistic interpretability based on circuit discovery aims to reverse engineer neural networks by analyzing their internal processes at the level of computational subgraphs. In this paper, we revisit existing gradient-based circuit identification methods and find that their performance is either affected by the zero-gradient problem or saturation effects, where edge attribution scores become insensitive to input changes, resulting in noisy and unreliable attribution evaluations for circuit components. To address the saturation effect, we propose Edge Attribution Patching with GradPath (EAP-GP). EAP-GP introduces an integration path, starting from the input and adaptively following the direction of the difference between the gradients of corrupted and clean inputs to avoid the saturated region. This approach enhances attribution reliability and improves the faithfulness of circuit identification. We evaluate EAP-GP on 6 datasets using GPT-2 Small, GPT-2 Medium, and GPT-2 XL. Experimental results demonstrate that EAP-GP outperforms existing methods in circuit faithfulness, achieving improvements up to 17.7%. Comparisons with manually annotated ground-truth circuits demonstrate that EAP-GP achieves precision and recall comparable to or better than previous approaches, highlighting its effectiveness in identifying accurate circuits.

PDF Details

TMLR Journal 2025 Journal Article

Global Graph Counterfactual Explanation: A Subgraph Mapping Approach

Yinhan He
Wendy Zheng
Yaochen Zhu
Jing Ma
Saumitra Mishra
Natraj Raman
Ninghao Liu
Jundong Li

Graph Neural Networks (GNNs) have been widely deployed in various real-world applications. However, most GNNs are black-box models that lack explanations. One strategy to explain GNNs is through counterfactual explanation, which aims to find minimum perturbations on input graphs that change the GNN predictions. Existing works on GNN counterfactual explanations primarily concentrate on the local-level perspective (i.e., generating counterfactuals for each individual graph), which suffers from information overload and lacks insights into the broader cross-graph relationships. To address such issues, we propose GlobalGCE, a novel global-level graph counterfactual explanation method. GlobalGCE aims to identify a collection of subgraph mapping rules as counterfactual explanations for the target GNN. According to these rules, substituting certain significant subgraphs with their counterfactual subgraphs will change the GNN prediction to the desired class for most graphs (i.e., maximum coverage). Methodologically, we design a significant subgraph generator and a counterfactual subgraph autoencoder in our GlobalGCE, where the subgraphs and the rules can be effectively generated. Extensive experiments demonstrate the superiority of our GlobalGCE compared to existing baselines. Our code can be found at https://github.com/YinhanHe123/GlobalGCE.

PDF Details

AAAI Conference 2025 Conference Paper

Language Ranker: A Metric for Quantifying LLM Performance Across High and Low-Resource Languages

Zihao Li
Yucheng Shi
Zirui Liu
Fan Yang
Ali Payani
Ninghao Liu
Mengnan Du

The development of Large Language Models (LLMs) relies on extensive text corpora, which are often unevenly distributed across languages. This imbalance results in LLMs performing significantly better on high-resource languages like English, German, and French, while their capabilities in low-resource languages remain inadequate. Currently, there is a lack of quantitative methods to evaluate the performance of LLMs in these low-resource languages. To address this gap, we propose the Language Ranker, an intrinsic metric designed to benchmark and rank languages based on LLM performance using internal representations. By comparing the LLM's internal representation of various languages against a baseline derived from English, we can assess the model's multilingual capabilities in a robust and language-agnostic manner. Our analysis reveals that high-resource languages exhibit higher similarity scores with English, demonstrating superior performance, while low-resource languages show lower similarity scores, underscoring the effectiveness of our metric in assessing language-specific capabilities. Besides, the experiments show that there is a strong correlation between the LLM’s performance in different languages and the proportion of those languages in its pre-training corpus. These insights underscore the efficacy of the Language Ranker as a tool for evaluating LLM performance across different languages, particularly those with limited resources.

PDF Details DOI

AAAI Conference 2024 Short Paper

Automated Natural Language Explanation of Deep Visual Neurons with Large Models (Student Abstract)

Chenxu Zhao
Wei Qian
Yucheng Shi
Mengdi Huai
Ninghao Liu

Interpreting deep neural networks through examining neurons offers distinct advantages when it comes to exploring the inner workings of Deep Neural Networks. Previous research has indicated that specific neurons within deep vision networks possess semantic meaning and play pivotal roles in model performance. Nonetheless, the current methods for generating neuron semantics heavily rely on human intervention, which hampers their scalability and applicability. To address this limitation, this paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models, without requiring human intervention or prior knowledge. Experiments are conducted with both qualitative and quantitative analysis to verify the effectiveness of our proposed approach.

PDF Details DOI

AAAI Conference 2024 Short Paper

BadSAM: Exploring Security Vulnerabilities of SAM via Backdoor Attacks (Student Abstract)

Zihan Guan
Mengxuan Hu
Zhongliang Zhou
Jielu Zhang
Sheng Li
Ninghao Liu

Image segmentation is foundational to computer vision applications, and the Segment Anything Model (SAM) has become a leading base model for these tasks. However, SAM falters in specialized downstream challenges, leading to various customized SAM models. We introduce BadSAM, a backdoor attack tailored for SAM, revealing that customized models can harbor malicious behaviors. Using the CAMO dataset, we confirm BadSAM's efficacy and identify SAM vulnerabilities. This study paves the way for the development of more secure and customizable vision foundation models.

PDF Details DOI

TIST Journal 2024 Journal Article

DIRECT: Dual Interpretable Recommendation with Multi-aspect Word Attribution

Xuansheng Wu
Hanqin Wan
Qiaoyu Tan
Wenlin Yao
Ninghao Liu

Recommending products to users with intuitive explanations helps improve the system in transparency, persuasiveness, and satisfaction. Existing interpretation techniques include post hoc methods and interpretable modeling. The former category could quantitatively analyze input contribution to model prediction but has limited interpretation faithfulness, while the latter could explain model internal mechanisms but may not directly attribute model predictions to input features. In this study, we propose a novel Dual Interpretable Recommendation model called DIRECT, which integrates ideas of the two interpretation categories to inherit their advantages and avoid limitations. Specifically, DIRECT makes use of item descriptions as explainable evidence for recommendation. First, similar to the post hoc interpretation, DIRECT could attribute the prediction of a user preference score to textual words of the item descriptions. The attribution of each word is related to its sentiment polarity and word importance, where a word is important if it corresponds to an item aspect that the user is interested in. Second, to improve the interpretability of embedding space, we propose to extract high-level concepts from embeddings, where each concept corresponds to an item aspect. To learn discriminative concepts, we employ a concept bottleneck layer and maximize the coding rate reduction on word-aspect embeddings by leveraging a word–word affinity graph extracted from a pre-trained language model. In this way, DIRECT simultaneously achieves faithful attribution and usable interpretation of embedding space. We also show that DIRECT achieves linear inference time complexity regarding the length of item reviews. We conduct experiments including ablation studies on five real-world datasets. Quantitative analysis, visualizations, and case studies verify the interpretability of DIRECT. Our code is available at: https://github.com/JacksonWuxs/DIRECT.

Details DOI

TIST Journal 2024 Journal Article

Explainability for Large Language Models: A Survey

Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their internal mechanisms are still unclear and this lack of transparency poses unwanted risks for downstream applications. Therefore, understanding and explaining these models is crucial for elucidating their behaviors, limitations, and social impacts. In this article, we introduce a taxonomy of explainability techniques and provide a structured overview of methods for explaining Transformer-based language models. We categorize techniques based on the training paradigms of LLMs: traditional fine-tuning-based paradigm and prompting-based paradigm. For each paradigm, we summarize the goals and dominant approaches for generating local explanations of individual predictions and global explanations of overall model knowledge. We also discuss metrics for evaluating generated explanations and discuss how explanations can be leveraged to debug models and improve performance. Lastly, we examine key challenges and emerging opportunities for explanation techniques in the era of LLMs in comparison to conventional deep learning models.

Details DOI

NeurIPS Conference 2023 Conference Paper

Black-box Backdoor Defense via Zero-shot Image Purification

Yucheng Shi
Mengnan Du
Xuansheng Wu
Zihan Guan
Jin Sun
Ninghao Liu

Backdoor attacks inject poisoned samples into the training data, resulting in the misclassification of the poisoned input during a model's deployment. Defending against such attacks is challenging, especially for real-world black-box models where only query access is permitted. In this paper, we propose a novel defense framework against backdoor attacks through Zero-shot Image Purification (ZIP). Our framework can be applied to poisoned models without requiring internal information about the model or any prior knowledge of the clean/poisoned samples. Our defense framework involves two steps. First, we apply a linear transformation (e.g., blurring) on the poisoned image to destroy the backdoor pattern. Then, we use a pre-trained diffusion model to recover the missing semantic information removed by the transformation. In particular, we design a new reverse process by using the transformed image to guide the generation of high-fidelity purified images, which works in zero-shot settings. We evaluate our ZIP framework on multiple datasets with different types of attacks. Experimental results demonstrate the superiority of our ZIP framework compared to state-of-the-art backdoor defense baselines. We believe that our results will provide valuable insights for future defense methods for black-box models. Our code is available at https://github.com/sycny/ZIP.

PDF Details

AAAI Conference 2023 Conference Paper

Interpreting Unfairness in Graph Neural Networks via Training Node Attribution

Yushun Dong
Song Wang
Jing Ma
Ninghao Liu
Jundong Li

Graph Neural Networks (GNNs) have emerged as the leading paradigm for solving graph analytical problems in various real-world applications. Nevertheless, GNNs could potentially render biased predictions towards certain demographic subgroups. Understanding how the bias in predictions arises is critical, as it guides the design of GNN debiasing mechanisms. However, most existing works overwhelmingly focus on GNN debiasing, but fall short on explaining how such bias is induced. In this paper, we study a novel problem of interpreting GNN unfairness through attributing it to the influence of training nodes. Specifically, we propose a novel strategy named Probabilistic Distribution Disparity (PDD) to measure the bias exhibited in GNNs, and develop an algorithm to efficiently estimate the influence of each training node on such bias. We verify the validity of PDD and the effectiveness of influence estimation through experiments on real-world datasets. Finally, we also demonstrate how the proposed framework could be used for debiasing GNNs. Open-source code can be found at https://github.com/yushundong/BIND.

PDF Details DOI

AAAI Conference 2023 Conference Paper

SEAT: Stable and Explainable Attention

Lijie Hu
Yixin Liu
Ninghao Liu
Mengdi Huai
Lichao Sun
Di Wang

Attention mechanism has become a standard fixture in many state-of-the-art natural language processing (NLP) models, not only due to its outstanding performance, but also because it provides plausible innate explanations for neural architectures. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbation of embeddings, which impedes it from being a faithful explanation tool. Thus, a natural question is whether we can find an alternative to vanilla attention, which is more stable and could keep the key characteristics of the explanation. In this paper, we provide a rigorous definition of such an attention method named SEAT (Stable and Explainable ATtention). Specifically, SEAT has the following three properties: (1) Its prediction distribution is close to the prediction of the vanilla attention; (2) Its top-k indices largely overlap with those of the vanilla attention; (3) It is robust w.r.t. perturbations, i.e., any slight perturbation on SEAT will not change the attention and prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Furthermore, we propose an optimization method for obtaining SEAT, which could be considered as revising the vanilla attention. Finally, through intensive experiments on various datasets, we compare our SEAT with other baseline methods using RNN, BiLSTM and BERT architectures, with different evaluation metrics on model interpretation, stability and accuracy. Results show that, besides preserving the original explainability and model performance, SEAT is more stable against input perturbations and training randomness, which indicates it is a more faithful explanation.

PDF Details DOI

AAAI Conference 2021 Conference Paper

Dynamic Memory based Attention Network for Sequential Recommendation

Qiaoyu Tan
Jianwei Zhang
Ninghao Liu
Xiao Huang
Hongxia Yang
Jingren Zhou
Xia Hu

Sequential recommendation has become increasingly essential in various online services. It aims to model the dynamic preferences of users from their historical interactions and predict their next items. The accumulated user behavior records on real systems could be very long. This rich data brings opportunities to track actual interests of users. Prior efforts mainly focus on making recommendations based on relatively recent behaviors. However, the overall sequential data may not be effectively utilized, as early interactions might affect users’ current choices. Also, it has become intolerable to scan the entire behavior sequence when performing inference for each user, since real-world systems require short response time. To bridge the gap, we propose a novel long sequential recommendation model, called Dynamic Memory-based Attention Network (DMAN). It segments the overall long behavior sequence into a series of sub-sequences, then trains the model and maintains a set of memory blocks to preserve long-term interests of users. To improve memory fidelity, DMAN dynamically abstracts each user’s long-term interest into its own memory blocks by minimizing an auxiliary reconstruction loss. Based on the dynamic memory, the user’s short-term and long-term interests can be explicitly extracted and combined for efficient joint recommendation. Empirical results over four benchmark datasets demonstrate the superiority of our model in capturing long-term dependency over various state-of-the-art sequential models.

PDF Details

NeurIPS Conference 2020 Conference Paper

Learning sparse codes from compressed representations with biologically plausible local wiring constraints

Kion Fallah
Adam Willats
Ninghao Liu
Christopher Rozell

Sparse coding is an important method for unsupervised learning of task-independent features in theoretical neuroscience models of neural coding. While a number of algorithms exist to learn these representations from the statistics of a dataset, they largely ignore the information bottlenecks present in fiber pathways connecting cortical areas. For example, the visual pathway has many fewer neurons transmitting visual information to cortex than the number of photoreceptors. Both empirical and analytic results have recently shown that sparse representations can be learned effectively after performing dimensionality reduction with randomized linear operators, producing latent coefficients that preserve information. Unfortunately, current proposals for sparse coding in the compressed space require a centralized compression process (i.e., dense random matrix) that is biologically unrealistic due to local wiring constraints observed in neural circuits. The main contribution of this paper is to leverage recent results on structured random matrices to propose a theoretical neuroscience model of randomized projections for communication between cortical areas that is consistent with the local wiring constraints observed in neuroanatomy. We show analytically and empirically that unsupervised learning of sparse representations can be performed in the compressed space despite significant local wiring constraints in compression matrices of varying forms (corresponding to different local wiring patterns). Our analysis verifies that even with significant local wiring constraints, the learned representations remain qualitatively similar, have similar quantitative performance in both training and generalization error, and are consistent across many measures with measured macaque V1 receptive fields.

PDF Details

IJCAI Conference 2018 Conference Paper

Contextual Outlier Interpretation

Ninghao Liu
Donghwa Shin
Xia Hu

While outlier detection has been intensively studied in many applications, interpretation is becoming increasingly important to help people trust and evaluate the developed detection models through providing intrinsic reasons why the given outliers are identified. It is a nontrivial task for interpreting the abnormality of outliers due to the distinct characteristics of different detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, contexts where outliers locate, as well as the relation between outliers and the contexts, are usually overlooked in existing interpretation frameworks. To tackle the issues, in this paper, we propose a Contextual Outlier INterpretation (COIN) framework to explain the abnormality of outliers spotted by detectors. The interpretability of an outlier is achieved through three aspects, i.e., outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework.

PDF Details

IJCAI Conference 2017 Conference Paper

Accelerated Local Anomaly Detection via Resolving Attributed Networks

Ninghao Liu
Xiao Huang
Xia Hu

Attributed networks, in which network connectivity and node attributes are available, have been increasingly used to model real-world information systems, such as social media and e-commerce platforms. While outlier detection has been extensively studied to identify anomalies that deviate from certain chosen background, existing algorithms cannot be directly applied on attributed networks due to the heterogeneous types of information and the scale of real-world data. Meanwhile, it has been observed that local anomalies, which may align with global condition, are hard to be detected by existing algorithms with interpretability. Motivated by the observations, in this paper, we propose to study the problem of effective and efficient local anomaly detection in attributed networks. In particular, we design a collective way for modeling heterogeneous network and attribute information, and develop a novel and efficient distributed optimization algorithm to handle large-scale data. In the experiments, we compare the proposed framework with the state-of-the-art methods on both real and synthetic datasets, and demonstrate its effectiveness and efficiency through quantitative evaluation and case studies.

PDF Details