Arrow Research search

Author name cluster

Kay Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers

ICLR Conference 2025 Conference Paper

BANGS: Game-theoretic Node Selection for Graph Self-Training

  • Fangxin Wang
  • Kay Liu
  • Sourav Medya
  • Philip S. Yu

Graph self-training is a semi-supervised learning method that iteratively selects a set of unlabeled data to retrain the underlying graph neural network (GNN) model and improve its prediction performance. While selecting highly confident nodes has proven effective for self-training, this pseudo-labeling strategy ignores the combinatorial dependencies between nodes and suffers from a local view of the distribution. To overcome these issues, we propose BANGS, a novel framework that unifies the labeling strategy with conditional mutual information as the objective of node selection. Our approach, grounded in game theory, selects nodes in a combinatorial fashion and provides theoretical guarantees for robustness under a noisy objective. More specifically, unlike traditional methods that rank and select nodes independently, BANGS considers nodes as a collective set in the self-training process. Our method demonstrates superior performance and robustness across various datasets, base models, and hyperparameter settings, outperforming existing techniques. The codebase is available at https://github.com/fangxin-wang/BANGS.

TMLR Journal 2025 Journal Article

Enhancing Fairness in Unsupervised Graph Anomaly Detection through Disentanglement

  • Wenjing Chang
  • Kay Liu
  • Philip S. Yu
  • Jianjun Yu

Graph anomaly detection (GAD) is becoming increasingly crucial in various applications, ranging from financial fraud detection to fake news detection. However, current GAD methods largely overlook the fairness problem, which might result in discriminatory decisions skewed toward certain demographic groups defined on sensitive attributes (e.g., gender). This greatly limits the applicability of these methods in real-world scenarios in light of societal and ethical restrictions. To address this critical gap, we make the first attempt to integrate fairness with utility in GAD decision-making. Specifically, we devise a novel DisEntangle-based FairnEss-aware aNomaly Detection framework on the attributed graph, named DEFEND. DEFEND first introduces disentanglement in GNNs to capture informative yet sensitive-irrelevant node representations, effectively reducing bias inherent in graph representation learning. Besides, to alleviate discriminatory bias in evaluating anomalies, DEFEND adopts a reconstruction-based method, which concentrates solely on node attributes and avoids incorporating biased graph topology. Additionally, given the inherent association between sensitive-relevant and -irrelevant attributes, DEFEND further constrains the correlation between the reconstruction error and predicted sensitive attributes. Empirical evaluations on real-world datasets reveal that DEFEND performs effectively in GAD and significantly enhances fairness compared to state-of-the-art baselines. Our code is available at https://github.com/AhaChang/DEFEND.

TMLR Journal 2025 Journal Article

LEGO-Learn: Label-Efficient Graph Open-Set Learning

  • Haoyan Xu
  • Kay Liu
  • Zhengtao Yao
  • Philip S. Yu
  • Mengyuan Li
  • Kaize Ding
  • Yue Zhao

How can we train graph-based models to recognize unseen classes while keeping labeling costs low? Graph open-set learning (GOL) and out-of-distribution (OOD) detection aim to address this challenge by training models that can accurately classify known, in-distribution (ID) classes while identifying and handling previously unseen classes during inference. It is critical for high-stakes, real-world applications where models frequently encounter unexpected data, including finance, security, and healthcare. However, current GOL methods assume access to a large number of labeled ID samples, which is unrealistic for large-scale graphs due to high annotation costs. In this paper, we propose LEGO-Learn (Label-Efficient Graph Open-set Learning), a novel framework that addresses open-set node classification on graphs within a given label budget by selecting the most informative ID nodes. LEGO-Learn employs a GNN-based filter to identify and exclude potential OOD nodes and then selects highly informative ID nodes for labeling using the K-Medoids algorithm. To prevent the filter from discarding valuable ID examples, we introduce a classifier that differentiates between the $C$ known ID classes and an additional class representing OOD nodes (hence, a $C+1$ classifier). This classifier utilizes a weighted cross-entropy loss to balance the removal of OOD nodes while retaining informative ID nodes. Experimental results on four real-world datasets demonstrate that LEGO-Learn significantly outperforms leading methods, achieving up to a $6.62\%$ improvement in ID classification accuracy and a $7.49\%$ increase in AUROC for OOD detection.
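The weighted cross-entropy loss for a $C+1$ classifier mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the weight values, and the choice to normalize by the summed weights are all assumptions made for illustration, and the abstract does not state how the weights are set.

```python
import math


def weighted_cross_entropy(logits, labels, class_weights):
    """Weighted mean cross-entropy over a batch of nodes.

    logits: one list of C+1 raw scores per node (last index = OOD class).
    labels: integer class id per node, in 0..C.
    class_weights: one non-negative weight per class (hypothetical values).
    """
    total_loss = 0.0
    total_weight = 0.0
    for scores, label in zip(logits, labels):
        # Numerically stable log-softmax: subtract the max before exponentiating.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        log_prob = scores[label] - log_norm
        weight = class_weights[label]
        total_loss += -weight * log_prob
        total_weight += weight
    # Normalizing by the summed weights is an assumed convention.
    return total_loss / total_weight


# C = 2 known ID classes plus one OOD class at index 2.
logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
# Hypothetical weights: errors on OOD-labeled nodes count half as much.
loss = weighted_cross_entropy(logits, labels, [1.0, 1.0, 0.5])
```

Changing the per-class weights shifts how strongly mistakes on OOD-labeled nodes are penalized relative to mistakes on ID-labeled nodes, which is the balancing knob the abstract alludes to.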

JMLR Journal 2024 Journal Article

PyGOD: A Python Library for Graph Outlier Detection

  • Kay Liu
  • Yingtong Dou
  • Xueying Ding
  • Xiyang Hu
  • Ruitong Zhang
  • Hao Peng
  • Lichao Sun
  • Philip S. Yu

PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the different detectors implemented so that users can easily customize each detector for their purposes. To ease the construction of detection workflows, PyGOD offers numerous commonly used utility functions. To scale computation to large graphs, PyGOD supports functionalities for deep models such as sampling and mini-batch processing. PyGOD uses best practices in fostering code reliability and maintainability, including unit testing, continuous integration, and code coverage. To facilitate accessibility, PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI).

TMLR Journal 2024 Journal Article

Uncertainty in Graph Neural Networks: A Survey

  • Fangxin Wang
  • Yuqing Liu
  • Kay Liu
  • Yibo Wang
  • Sourav Medya
  • Philip S. Yu

Graph Neural Networks (GNNs) have been extensively used in various real-world applications. However, the predictive uncertainty of GNNs stemming from diverse sources such as inherent randomness in data and model training errors can lead to unstable and erroneous predictions. Therefore, identifying, quantifying, and utilizing uncertainty are essential to enhance the performance of the model for the downstream tasks as well as the reliability of the GNN predictions. This survey aims to provide a comprehensive overview of GNNs from the perspective of uncertainty with an emphasis on its integration in graph learning. We compare and summarize existing graph uncertainty theory and methods, alongside the corresponding downstream tasks. We thereby bridge the gap between theory and practice while connecting different GNN communities. Moreover, our work provides valuable insights into promising directions in this field.

NeurIPS Conference 2023 Conference Paper

Equal Opportunity of Coverage in Fair Regression

  • Fangxin Wang
  • Lu Cheng
  • Ruocheng Guo
  • Kay Liu
  • Philip S. Yu

We study fair machine learning (ML) under predictive uncertainty to enable reliable and trustworthy decision-making. The seminal work of 'equalized coverage' proposed an uncertainty-aware fairness notion. However, it does not guarantee equal coverage rates across more fine-grained groups (e.g., low-income females) conditioning on the true label and is biased in the assessment of uncertainty. To tackle these limitations, we propose a new uncertainty-aware fairness notion, Equal Opportunity of Coverage (EOC), that aims to achieve two properties: (1) coverage rates for different groups with similar outcomes are close, and (2) the coverage rate for the entire population remains at a predetermined level. Further, the prediction intervals should be narrow to be informative. We propose Binned Fair Quantile Regression (BFQR), a distribution-free post-processing method to improve EOC with reasonable width for any trained ML model. It first calibrates a hold-out set to bound deviation from EOC, then leverages conformal prediction to maintain EOC on a test set, meanwhile optimizing prediction interval width. Experimental results demonstrate the effectiveness of our method in improving EOC.
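The two EOC properties listed in this abstract can be checked empirically with a short sketch. This only measures coverage for given prediction intervals; it is not BFQR. The function names, the outcome-binning scheme via `bin_edges`, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def overall_coverage(y_true, lower, upper):
    """Fraction of true values that fall inside their prediction interval."""
    covered = [lo <= y <= hi for y, lo, hi in zip(y_true, lower, upper)]
    return sum(covered) / len(covered)


def coverage_by_group_and_bin(y_true, lower, upper, groups, bin_edges):
    """Coverage rate per (group, outcome bin); bins approximate 'similar outcomes'."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for y, lo, hi, g in zip(y_true, lower, upper, groups):
        # Bin index = number of edges the true value meets or exceeds.
        bin_index = sum(y >= edge for edge in bin_edges)
        key = (g, bin_index)
        counts[key] += 1
        hits[key] += int(lo <= y <= hi)
    return {key: hits[key] / counts[key] for key in counts}


def max_coverage_gap(coverage):
    """Largest coverage difference between groups within any single outcome bin."""
    by_bin = defaultdict(list)
    for (_, bin_index), rate in coverage.items():
        by_bin[bin_index].append(rate)
    return max(max(rates) - min(rates) for rates in by_bin.values())


# Toy data (hypothetical): two groups, one bin edge at 3.0.
y_true = [1.0, 1.2, 5.0, 5.5]
lower = [0.5, 1.5, 4.0, 4.0]
upper = [1.5, 2.0, 6.0, 5.2]
groups = ["a", "b", "a", "b"]

rates = coverage_by_group_and_bin(y_true, lower, upper, groups, bin_edges=[3.0])
gap = max_coverage_gap(rates)
total = overall_coverage(y_true, lower, upper)
```

In this toy data the intervals cover every node in group "a" and none in group "b", so the population-level coverage looks moderate while the per-bin gap between groups is as large as it can be. That mismatch is the kind of disparity property (1) is meant to rule out.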

NeurIPS Conference 2022 Conference Paper

BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

  • Kay Liu
  • Yingtong Dou
  • Yue Zhao
  • Xueying Ding
  • Xiyang Hu
  • Ruitong Zhang
  • Kaize Ding
  • Canyu Chen

Detecting which nodes in graphs are outliers is a relatively new machine learning task with numerous applications. Despite the proliferation of algorithms developed in recent years for this task, there has been no standard comprehensive setting for performance evaluation. Consequently, it has been difficult to understand which methods work well and when under a broad range of settings. To bridge this gap, we present, to the best of our knowledge, the first comprehensive benchmark for unsupervised outlier node detection on static attributed graphs called BOND, with the following highlights. (1) We benchmark the outlier detection performance of 14 methods ranging from classical matrix factorization to the latest graph neural networks. (2) Using nine real datasets, our benchmark assesses how the different detection methods respond to two major types of synthetic outliers and separately to “organic” (real non-synthetic) outliers. (3) Using an existing random graph generation technique, we produce a family of synthetically generated datasets of different graph sizes that enable us to compare the running time and memory usage of the different outlier detection algorithms. Based on our experimental results, we discuss the pros and cons of existing graph outlier detection algorithms, and we highlight opportunities for future research. Importantly, our code is freely available and meant to be easily extendable: https://github.com/pygod-team/pygod/tree/main/benchmark