Arrow Research search

Author name cluster

Bryan Hooi

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

56 papers
2 author rows

Possible papers

56

AAAI Conference 2026 Conference Paper

DGP: A Dual-Granularity Prompting Framework for Fraud Detection with Graph-Enhanced LLMs

  • Yuan Li
  • Jun Hu
  • Bryan Hooi
  • Bingsheng He
  • Cheng Chen

Real-world fraud detection applications benefit from graph learning techniques that jointly exploit node features—often rich in textual data—and graph structural information. Recently, Graph-Enhanced LLMs have emerged as a promising graph learning approach that converts graph information into prompts, exploiting LLMs' ability to reason over both textual and structural information. Among them, text-only prompting, which converts graph information into prompts consisting solely of text tokens, offers a solution that relies only on LLM tuning without requiring additional graph-specific encoders. However, text-only prompting struggles on heterogeneous fraud-detection graphs: multi-hop relations expand exponentially with each additional hop, leading to rapidly growing neighborhoods associated with dense textual information. These neighborhoods may overwhelm the model with long, irrelevant content in the prompt and suppress key signals from the target node, thereby degrading performance. To address this challenge, we propose Dual Granularity Prompting (DGP), which mitigates information overload by preserving fine-grained textual details for the target node while summarizing neighbor information into coarse-grained text prompts. DGP introduces tailored summarization strategies for different data modalities—bi-level semantic abstraction for textual fields and statistical aggregation for numerical features—enabling effective compression of verbose neighbor content into concise, informative prompts. Experiments across public and industry datasets demonstrate that DGP operates within a manageable token budget while improving fraud detection performance by up to 6.8% (AUPRC) over state-of-the-art methods, showing the potential of Graph-Enhanced LLMs for fraud detection.

AAAI Conference 2026 Conference Paper

Echoless Label-Based Pre-computation for Memory-Efficient Heterogeneous Graph Learning

  • Jun Hu
  • Shangheng Chen
  • Yufei He
  • Yuan Li
  • Bryan Hooi
  • Bingsheng He

Heterogeneous Graph Neural Networks (HGNNs) are widely used for deep learning on heterogeneous graphs. Typical end-to-end HGNNs require repetitive message passing during training, limiting efficiency for large-scale real-world graphs. Pre-computation-based HGNNs address this by performing message passing only once during preprocessing, collecting neighbor information into regular-shaped tensors, which enables efficient mini-batch training. Label-based pre-computation methods collect neighbors' label information but suffer from training label leakage, where a node's own label information propagates back to itself during multi-hop message passing—the echo effect. Existing mitigation strategies are memory-inefficient on large graphs or suffer from compatibility issues with advanced message passing methods. We propose Echoless Label-based Pre-computation (Echoless-LP), which eliminates training label leakage with Partition-Focused Echoless Propagation (PFEP). PFEP partitions target nodes and performs echoless propagation, where nodes in each partition collect label information only from neighbors in other partitions, avoiding echo while remaining memory-efficient and compatible with any message passing method. We also introduce an Asymmetric Partitioning Scheme (APS) and a PostAdjust mechanism to address information loss from partitioning and distributional shifts across partitions. Experiments on public datasets demonstrate that Echoless-LP achieves superior performance and maintains memory efficiency compared to baselines.

AAAI Conference 2026 Conference Paper

NTSFormer: A Self-Teaching Graph Transformer for Multimodal Isolated Cold-Start Node Classification

  • Jun Hu
  • Yufei He
  • Yuan Li
  • Bryan Hooi
  • Bingsheng He

Isolated cold-start node classification on multimodal graphs is challenging because such nodes have no edges and often have missing modalities (e.g., absent text or image features). Existing methods address structural isolation by degrading graph learning models to multilayer perceptrons (MLPs) for isolated cold-start inference, using a teacher model (with graph access) to guide the MLP. However, this results in limited model capacity in the student, which is further challenged when modalities are missing. In this paper, we propose Neighbor-to-Self Graph Transformer (NTSFormer), a unified Graph Transformer framework that jointly tackles the isolation and missing-modality issues via a self-teaching paradigm. Specifically, NTSFormer uses a cold-start attention mask to simultaneously make two predictions for each node: a "student" prediction based only on self information (i.e., the node's own features), and a "teacher" prediction incorporating both self and neighbor information. This enables the model to supervise itself without degrading to an MLP, thereby fully leveraging the Transformer's capacity to handle missing modalities. To handle diverse graph information and missing modalities, NTSFormer performs a one-time multimodal graph pre-computation that converts structural and feature data into token sequences, which are then processed by Mixture-of-Experts (MoE) Input Projection and Transformer layers for effective fusion. Experiments on public datasets show that NTSFormer achieves superior performance for multimodal isolated cold-start node classification.

ICML Conference 2025 Conference Paper

Adapting Precomputed Features for Efficient Graph Condensation

  • Yuan Li 0032
  • Jun Hu 0016
  • Zemin Liu
  • Bryan Hooi
  • Jia Chen 0011
  • Bingsheng He

Graph Neural Networks (GNNs) face significant computational challenges when handling large-scale graphs. To address this, Graph Condensation (GC) methods aim to compress large graphs into smaller, synthetic ones that are more manageable for GNN training. Recently, trajectory matching methods have shown state-of-the-art (SOTA) performance for GC, aligning the model’s training behavior on a condensed graph with that on the original graph by guiding the trajectory of model parameters. However, these approaches require repetitive GNN retraining during condensation, making them computationally expensive. To address the efficiency issue, we completely bypass trajectory matching and propose a novel two-stage framework. The first stage, a precomputation stage, performs one-time message passing to extract structural and semantic information from the original graph. The second stage, a diversity-aware adaptation stage, performs class-wise alignment while maximizing the diversity of synthetic features. Remarkably, even with just the precomputation stage, which takes only seconds, our method either matches or surpasses 5 out of 9 baseline results. Extensive experiments show that our approach achieves comparable or better performance while being 96$\times$ to 2, 455$\times$ faster than SOTA methods, making it more practical for large-scale GNN applications. Our code and data are available at https: //github. com/Xtra-Computing/GCPA.

ICLR Conference 2025 Conference Paper

CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs

  • Jinlan Fu
  • Shenzhen Huangfu
  • Hao Fei 0001
  • Xiaoyu Shen 0001
  • Bryan Hooi
  • Xipeng Qiu
  • See-Kiong Ng

Multimodal Large Language Models (MLLMs) still struggle with hallucinations despite their impressive capabilities. Recent studies have attempted to mitigate this by applying Direct Preference Optimization (DPO) to multimodal scenarios using preference pairs from text-based responses. However, our analysis of representation distributions reveals that multimodal DPO struggles to align image and text representations and to distinguish between hallucinated and non-hallucinated descriptions. To address these challenges, In this work, we propose a Cross-modal Hierarchical Direct Preference Optimization (CHiP) to address these limitations. We introduce a visual preference optimization module within the DPO framework, enabling MLLMs to learn from both textual and visual preferences simultaneously. Furthermore, we propose a hierarchical textual preference optimization module that allows the model to capture preferences at multiple granular levels, including response, segment, and token levels. We evaluate CHiP through both quantitative and qualitative analyses, with results across multiple benchmarks demonstrating its effectiveness in reducing hallucinations. On the Object HalBench dataset, CHiP outperforms DPO in hallucination reduction, achieving improvements of 52.7% and 55.5% relative points based on the base model Muffin and LLaVA models, respectively. We make all our datasets and code publicly available.

NeurIPS Conference 2025 Conference Paper

ConfTuner: Training Large Language Models to Express Their Confidence Verbally

  • Yibo Li
  • Miao Xiong
  • Jiaying Wu
  • Bryan Hooi

Large Language Models (LLMs) are increasingly deployed in high-stakes domains such as science, law, and healthcare, where accurate expressions of uncertainty are essential for reliability and trust. However, current LLMs are often observed to generate incorrect answers with high confidence—a phenomenon known as "overconfidence". Recent efforts have focused on calibrating LLMs' verbalized confidence: i. e. , their expressions of confidence in text form, such as "I am 80% confident that. .. ". Existing approaches either rely on prompt engineering or fine-tuning with heuristically generated uncertainty estimates, both of which have limited effectiveness and generalizability. Motivated by the notion of proper scoring rules for calibration in classical machine learning models, we introduce ConfTuner, a simple and efficient fine-tuning method that introduces minimal overhead and does not require ground-truth confidence scores or proxy confidence estimates. ConfTuner relies on a new loss function, tokenized Brier score, which we theoretically prove to be a proper scoring rule, intuitively meaning that it "correctly incentivizes the model to report its true probability of being correct". ConfTuner improves calibration across diverse reasoning tasks and generalizes to black-box models such as GPT-4o. Our results further show that better-calibrated confidence enables downstream gains in self-correction and model cascade, advancing the development of trustworthy LLM systems. The code is available at https: //github. com/liushiliushi/ConfTuner.

ICML Conference 2025 Conference Paper

FlipAttack: Jailbreak LLMs via Flipping

  • Yue Liu 0008
  • Xiaoxin He
  • Miao Xiong
  • Jinlan Fu
  • Shumin Deng
  • Yingwei Ma
  • Jiaheng Zhang
  • Bryan Hooi

This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-box LLMs. First, from the autoregressive nature, we reveal that LLMs tend to understand the text from left to right and find that they struggle to comprehend the text when the perturbation is added to the left side. Motivated by these insights, we propose to disguise the harmful prompt by constructing a left-side perturbation merely based on the prompt itself, then generalize this idea to 4 flipping modes. Second, we verify the strong ability of LLMs to perform the text-flipping task and then develop 4 variants to guide LLMs to understand and execute harmful behaviors accurately. These designs keep FlipAttack universal, stealthy, and simple, allowing it to jailbreak black-box LLMs within only 1 query. Experiments on 8 LLMs demonstrate the superiority of FlipAttack. Remarkably, it achieves $\sim$78. 97% attack success rate across 8 LLMs on average and $\sim$98% bypass rate against 5 guard models on average.

NeurIPS Conference 2025 Conference Paper

GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning

  • Haonan Yuan
  • Qingyun Sun
  • Junhua Shi
  • Xingcheng Fu
  • Bryan Hooi
  • Jianxin Li
  • Philip S Yu

Inspired by the remarkable success of foundation models in language and vision, Graph Foundation Models (GFMs) hold significant promise for broad applicability across diverse graph tasks and domains. However, existing GFMs struggle with unstable few-shot fine-tuning, where both performance and adaptation efficiency exhibit significant fluctuations caused by the randomness in the support sample selection and structural discrepancies between the pre-trained and target graphs. How to fine-tune GFMs robustly and efficiently to enable trustworthy knowledge transfer across domains and tasks is the major challenge. In this paper, we propose GRAVER, a novel G enerative g RA ph V ocabulari E s for R obust GFM fine-tuning framework that tackles the aforementioned instability via generative augmentations. Specifically, to identify transferable units, we analyze and extract key class-specific subgraph patterns by ego-graph disentanglement and validate their transferability both theoretically and empirically. To enable effective pre-training across diverse domains, we leverage a universal task template based on ego-graph similarity and construct graph vocabularies via graphon-based generative experts. To facilitate robust and efficient prompt fine-tuning, we grave the support samples with in-context vocabularies, where the lightweight MoE-CoE network attentively routes knowledge from source domains. Extensive experiments demonstrate the superiority of GRAVER over effectiveness, robustness, and efficiency on downstream few-shot node and graph classification tasks compared with 15 state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning

  • Yue Liu
  • Shengfang Zhai
  • Mingzhe Du
  • Yulin Chen
  • Tri Cao
  • Hongcheng Gao
  • Cheng Wang
  • Xinfeng Li

To enhance the safety of VLMs, this paper introduces a novel reasoning-based VLM guard model dubbed GuardReasoner-VL. The core idea is to incentivize the guard model to deliberatively reason before making moderation decisions via online RL. First, we construct GuardReasoner-VLTrain, a reasoning corpus with 123K samples and 631K reasoning steps, spanning text, image, and text-image inputs. Then, based on it, we cold-start our model's reasoning ability via SFT. In addition, we further enhance reasoning regarding moderation through online RL. Concretely, to enhance diversity and difficulty of samples, we conduct rejection sampling followed by data augmentation via the proposed safety-aware data concatenation. Besides, we use a dynamic clipping parameter to encourage exploration in early stages and exploitation in later stages. To balance performance and token efficiency, we design a length-aware safety reward that integrates accuracy, format, and token cost. Extensive experiments demonstrate the superiority of our model. Remarkably, it surpasses the runner-up by 19. 27% F1 score on average, as shown in Figure 1. We release data, code, and models (3B/7B) of GuardReasoner-VL: https: //github. com/yueliu1999/GuardReasoner-VL.

ICML Conference 2025 Conference Paper

How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Foundation Model with Generalization Guarantees

  • Haonan Yuan
  • Qingyun Sun
  • Junhua Shi
  • Xingcheng Fu
  • Bryan Hooi
  • Jianxin Li 0002
  • Philip S. Yu

Graph Foundation Models hold significant potential for advancing multi-domain graph learning, yet their full capabilities remain largely untapped. Existing works show promising task performance with the “pretrain-then-prompt” paradigm, which lacks theoretical foundations to understand why it works and how much knowledge can be transferred from source domains to the target. In this paper, we introduce BRIDGE, a bounded graph foundation model pre-trained on multi-domains with Generalization guarantees. To learn discriminative source knowledge, we align multi-domain graph features with domain-invariant aligners during pre-training. Then, a lightweight Mixture of Experts (MoE) network is proposed to facilitate downstream prompting through self-supervised selective knowledge assembly and transfer. Further, to determine the maximum amount of transferable knowledge, we derive an optimizable generalization error upper bound from a graph spectral perspective given the Lipschitz continuity. Extensive experiments demonstrate the superiority of BRIDGE on both node and graph classification compared with 15 state-of-the-art baselines.

NeurIPS Conference 2025 Conference Paper

MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research

  • Hui Chen
  • Miao Xiong
  • Yujie Lu
  • Wei Han
  • Ailin Deng
  • Yufei He
  • Jiaying Wu
  • Yibo Li

Recent advancements in AI agents have demonstrated their growing potential to drive and support scientific discovery. In this work, we introduce MLR-Bench, a comprehensive benchmark for evaluating AI agents on open-ended machine learning research. MLR-Bench includes three key components: (1) 201 research tasks sourced from NeurIPS, ICLR, and ICML workshops covering diverse ML topics; (2) MLR-Judge, an automated evaluation framework combining LLM-based reviewers with carefully designed review rubrics to assess research quality; and (3) MLR-Agent, a modular agent scaffold capable of completing research tasks through four stages: idea generation, proposal formulation, experimentation, and paper writing. Our framework supports both stepwise assessment across these distinct research stages, and end-to-end evaluation of the final research paper. We then use MLR-Bench to evaluate six frontier LLMs and an advanced coding agent, finding that while LLMs are effective at generating coherent ideas and well-structured papers, current coding agents frequently (e. g. , in 80\% of the cases) produce fabricated or invalidated experimental results—posing a major barrier to scientific reliability. We validate MLR-Judge through human evaluation, showing high agreement with expert reviewers, supporting its potential as a scalable tool for research evaluation. We open-source MLR-Bench to help the community benchmark, diagnose, and improve AI research agents toward trustworthy and transparent scientific discovery.

AAAI Conference 2025 Conference Paper

Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation

  • Jun Hu
  • Bryan Hooi
  • Bingsheng He
  • Yinwei Wei

Multimodal recommendation systems can learn users' preferences from existing user-item interactions as well as the semantics of multimodal data associated with items. Many existing methods model this through a multimodal user-item graph, approaching multimodal recommendation as a graph learning task. Graph Neural Networks (GNNs) have shown promising performance in this domain. Prior research has capitalized on GNNs' capability to capture neighborhood information within certain receptive fields (typically denoted by the number of hops, K) to enrich user and item semantics. We observe that the optimal receptive fields for GNNs can vary across different modalities. In this paper, we propose GNNs with Modality-Independent Receptive Fields, which employ separate GNNs with independent receptive fields for different modalities to enhance performance. Our results indicate that the optimal K for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information. To address this, we introduce a Sampling-based Global Transformer, which utilizes uniform global sampling to effectively integrate global information for GNNs. We conduct comprehensive experiments that demonstrate the superiority of our approach over existing methods.

ICML Conference 2025 Conference Paper

Monte Carlo Tree Search for Comprehensive Exploration in LLM-Based Automatic Heuristic Design

  • Zhi Zheng 0009
  • Zhuoliang Xie
  • Zhenkun Wang 0001
  • Bryan Hooi

Handcrafting heuristics for solving complex optimization tasks (e. g. , route planning and task allocation) is a common practice but requires extensive domain knowledge. Recently, Large Language Model (LLM)-based automatic heuristic design (AHD) methods have shown promise in generating high-quality heuristics without manual interventions. Existing LLM-based AHD methods employ a population to maintain a fixed number of top-performing LLM-generated heuristics and introduce evolutionary computation (EC) to iteratively enhance the population. However, these population-based procedures cannot fully develop the potential of each heuristic and are prone to converge into local optima. To more comprehensively explore the space of heuristics, this paper proposes to use Monte Carlo Tree Search (MCTS) for LLM-based heuristic evolution. The proposed MCTS-AHD method organizes all LLM-generated heuristics in a tree structure and can better develop the potential of temporarily underperforming heuristics. In experiments, MCTS-AHD delivers significantly higher-quality heuristics on various complex tasks. Our code is available.

ICLR Conference 2025 Conference Paper

Multi-Label Node Classification with Label Influence Propagation

  • Yifei Sun 0002
  • Zemin Liu
  • Bryan Hooi
  • Yang Yang 0009
  • Rizal Fathony
  • Jia Chen 0011
  • Bingsheng He

Graphs are a complex and versatile data structure used across various domains, with possibly multi-label nodes playing a particularly crucial role. Examples include proteins in PPI networks with multiple functions and users in social or e-commerce networks exhibiting diverse interests. Tackling multi-label node classification (MLNC) on graphs has led to the development of various approaches. Some methods leverage graph neural networks (GNNs) to exploit label co-occurrence correlations, while others incorporate label embeddings to capture label proximity. However, these approaches fail to account for the intricate influences between labels in non-Euclidean graph data. To address this issue, we decompose the message passing process in GNNs into two operations: propagation and transformation. We then conduct a comprehensive analysis and quantification of the influence correlations between labels in each operation. Building on these insights, we propose a novel model, Label Influence Propagation (LIP). Specifically, we construct a label influence graph based on the integrated label correlations. Then, we propagate high-order influences through this graph, dynamically adjusting the learning process by amplifying labels with positive contributions and mitigating those with negative influence. Finally, our framework is evaluated on comprehensive benchmark datasets, consistently outperforming SOTA methods across various settings, demonstrating its effectiveness on MLNC tasks.

AAAI Conference 2025 Conference Paper

PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection

  • Tri Cao
  • Chengyu Huang
  • Yuexin Li
  • Wang Huilin
  • Amy He
  • Nay Oo
  • Bryan Hooi

Phishing attacks are a major threat to online security, exploiting user vulnerabilities to steal sensitive information. Various methods have been developed to counteract phishing, each with varying levels of accuracy, but they also face notable limitations. In this study, we introduce PhishAgent, a multimodal agent that combines a wide range of tools, integrating both online and offline knowledge bases with Multimodal Large Language Models (MLLMs). This combination leads to broader brand coverage, which enhances brand recognition and recall. Furthermore, we propose a multimodal information retrieval framework designed to extract the relevant top k items from offline knowledge bases, using available information from a webpage, including logos and HTML. Our empirical results, based on three real-world datasets, demonstrate that the proposed framework significantly enhances detection accuracy and reduces both false positives and false negatives, while maintaining model efficiency. Additionally, PhishAgent shows strong resilience against various types of adversarial attacks.

TMLR Journal 2025 Journal Article

RNA-FrameFlow: Flow Matching for de novo 3D RNA Backbone Design

  • Rishabh Anand
  • Chaitanya K. Joshi
  • Alex Morehead
  • Arian Rokkum Jamasb
  • Charles Harris
  • Simon V Mathis
  • Kieran Didi
  • Rex Ying

We introduce RNA-FrameFlow, the first generative model for 3D RNA backbone design. We build upon SE(3) flow matching for protein backbone generation and establish protocols for data preparation and evaluation to address unique challenges posed by RNA modeling. We formulate RNA structures as a set of rigid-body frames and associated loss functions which account for larger, more conformationally flexible RNA backbones (13 atoms per nucleotide) vs. proteins (4 atoms per residue). Toward tackling the lack of diversity in 3D RNA datasets, we explore training with structural clustering and cropping augmentations. Additionally, we define a suite of evaluation metrics to measure whether the generated RNA structures are globally self-consistent (via inverse folding followed by forward folding) and locally recover RNA-specific structural descriptors. The most performant version of RNA-FrameFlow generates locally realistic RNA backbones of 40-150 nucleotides, over 40% of which pass our validity criteria as measured by a self consistency TM-score ≥ 0.45, at which two RNAs have the same global fold. Open-source code: https://github.com/rish-16/rna-backbone-design

TMLR Journal 2025 Journal Article

UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting

  • Juncheng Liu
  • Chenghao Liu
  • Gerald Woo
  • Yiwei Wang
  • Bryan Hooi
  • Caiming Xiong
  • Doyen Sahoo

Transformer-based models have emerged as powerful tools for multivariate time series forecasting (MTSF). However, existing Transformer models often fall short of capturing both intricate dependencies across variate and temporal dimensions in MTS data. Some recent models are proposed to separately capture variate and temporal dependencies through either two sequential or parallel attention mechanisms. However, these methods cannot directly and explicitly learn the intricate inter-series and intra-series dependencies. In this work, we first demonstrate that these dependencies are very important as they usually exist in real-world data. To directly model these dependencies, we propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens. Additionally, we add a dispatcher module which reduces the complexity and makes the model feasible for a potentially large number of variates. Although our proposed model employs a simple architecture, it offers compelling performance as shown in our extensive experiments on several datasets for time series forecasting.

NeurIPS Conference 2024 Conference Paper

$\text{ID}^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition

  • Jianqing Xu
  • Shen Li
  • Jiaying Wu
  • Miao Xiong
  • Ailin Deng
  • Jiazhen Ji
  • Yuge Huang
  • Guodong Mu

Synthetic face recognition (SFR) aims to generate synthetic face datasets that mimic the distribution of real face data, which allows for training face recognition models in a privacy-preserving manner. Despite the remarkable potential of diffusion models in image generation, current diffusion-based SFR models struggle with generalization to real-world faces. To address this limitation, we outline three key objectives for SFR: (1) promoting diversity across identities (inter-class diversity), (2) ensuring diversity within each identity by injecting various facial attributes (intra-class diversity), and (3) maintaining identity consistency within each identity group (intra-class identity preservation). Inspired by these goals, we introduce a diffusion-fueled SFR model termed $\text{ID}^3$. $\text{ID}^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances. Theoretically, we show that minimizing this loss is equivalent to maximizing the lower bound of an adjusted conditional log-likelihood over ID-preserving data. This equivalence motivates an ID-preserving sampling algorithm, which operates over an adjusted gradient vector field, enabling the generation of fake face recognition datasets that approximate the distribution of real-world faces. Extensive experiments across five challenging benchmarks validate the advantages of $\text{ID}^3$.

ICLR Conference 2024 Conference Paper

Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs

  • Miao Xiong
  • Zhiyuan Hu
  • Xinyang Lu
  • Yifei Li
  • Jie Fu 0001
  • Junxian He
  • Bryan Hooi

Empowering large language models (LLMs) to accurately express confidence in their answers is essential for reliable and trustworthy decision-making. Previous confidence elicitation methods, which primarily rely on *white-box access* to internal model information or model fine-tuning, have become less suitable for LLMs, especially closed-source commercial APIs. This leads to a growing need to explore the untapped area of *black-box* approaches for LLM uncertainty estimation. To better break down the problem, we define a systematic framework with three components: *prompting* strategies for eliciting verbalized confidence, *sampling* methods for generating multiple responses, and *aggregation* techniques for computing consistency. We then benchmark these methods on two key tasks—confidence calibration and failure prediction—across five types of datasets (e.g., commonsense and arithmetic reasoning) and five widely-used LLMs including GPT-4 and LLaMA 2 Chat. Our analysis uncovers several key insights: 1) LLMs, when verbalizing their confidence, tend to be *overconfident*, potentially imitating human patterns of expressing confidence. 2) As model capability scales up, both calibration and failure prediction performance improve, yet still far from ideal performance. 3) Employing our proposed strategies, such as human-inspired prompts, consistency among multiple responses, and better aggregation strategies can help mitigate this overconfidence from various perspectives. 4) Comparisons with white-box methods indicate that while white-box methods perform better, the gap is narrow, e.g., 0.522 to 0.605 in AUROC. Despite these advancements, none of these techniques consistently outperform others, and all investigated methods struggle in challenging tasks, such as those requiring professional knowledge, indicating significant scope for improvement. We believe this study can serve as a strong baseline and provide insights for eliciting confidence in black-box LLMs. The code is publicly available at https://github.com/MiaoXiong2320/llm-uncertainty.

ICLR Conference 2024 Conference Paper

Consistency Training with Learnable Data Augmentation for Graph Anomaly Detection with Limited Supervision

  • Nan Chen
  • Zemin Liu
  • Bryan Hooi
  • Bingsheng He
  • Rizal Fathony
  • Jun Hu 0016
  • Jia Chen 0011

Graph Anomaly Detection (GAD) has surfaced as a significant field of research, predominantly due to its substantial influence in production environments. Although existing approaches for node anomaly detection have shown effectiveness, they have yet to fully address two major challenges: operating in settings with limited supervision and managing class imbalance effectively. In response to these challenges, we propose a novel model, ConsisGAD, which is tailored for GAD in scenarios characterized by limited supervision and is anchored in the principles of consistency training. Under limited supervision, ConsisGAD effectively leverages the abundance of unlabeled data for consistency training by incorporating a novel learnable data augmentation mechanism, thereby introducing controlled noise into the dataset. Moreover, ConsisGAD takes advantage of the variance in homophily distribution between normal and anomalous nodes to craft a simplified GNN backbone, enhancing its capability to distinguish effectively between these two classes. Comprehensive experiments on several benchmark datasets validate the superior performance of ConsisGAD in comparison to state-of-the-art baselines. Our code is available at https://github.com/Xtra-Computing/ConsisGAD.

NeurIPS Conference 2024 Conference Paper

G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering

  • Xiaoxin He
  • Yijun Tian
  • Yifei Sun
  • Nitesh V. Chawla
  • Thomas Laurent
  • Yann LeCun
  • Xavier Bresson
  • Bryan Hooi

Given a graph with textual attributes, we enable users to `chat with their graph': that is, to ask questions about the graph using a conversational interface. In response to a user's questions, our method provides textual replies and highlights the relevant parts of the graph. While existing works integrate large language models (LLMs) and graph neural networks (GNNs) in various ways, they mostly focus on either conventional graph tasks (such as node, edge, and graph classification), or on answering simple graph queries on small or synthetic graphs. In contrast, we develop a flexible question-answering framework targeting real-world textual graphs, applicable to multiple applications including scene graph understanding, common sense reasoning, and knowledge graph reasoning. Toward this goal, we first develop a Graph Question Answering (GraphQA) benchmark with data collected from different tasks. Then, we propose our \textit{G-Retriever} method, introducing the first retrieval-augmented generation (RAG) approach for general textual graphs, which can be fine-tuned to enhance graph understanding via soft prompting. To resist hallucination and to allow for textual graphs that greatly exceed the LLM's context window size, \textit{G-Retriever} performs RAG over a graph by formulating this task as a Prize-Collecting Steiner Tree optimization problem. Empirical evaluations show that our method outperforms baselines on textual graph tasks from multiple domains, scales well with larger graph sizes, and mitigates hallucination. ~\footnote{Our codes and datasets are available at: \url{https: //github. com/XiaoxinHe/G-Retriever}}

ICLR Conference 2024 Conference Paper

Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning

  • Xiaoxin He
  • Xavier Bresson
  • Thomas Laurent 0001
  • Adam Perold
  • Yann LeCun
  • Bryan Hooi

Representation learning on text-attributed graphs (TAGs) has become a critical research problem in recent years. A typical example of a TAG is a paper citation graph, where the text of each paper serves as node attributes. Initial graph neural network (GNN) pipelines handled these text attributes by transforming them into shallow or hand-crafted features, such as skip-gram or bag-of-words features. Recent efforts have focused on enhancing these pipelines with language models (LMs), which typically demand intricate designs and substantial computational resources. With the advent of powerful large language models (LLMs) such as GPT or Llama2, which demonstrate an ability to reason and to utilize general knowledge, there is a growing need for techniques which combine the textual modelling abilities of LLMs with the structural learning capabilities of GNNs. Hence, in this work, we focus on leveraging LLMs to capture textual information as features, which can be used to boost GNN performance on downstream tasks. A key innovation is our use of \emph{explanations as features}: we prompt an LLM to perform zero-shot classification, request textual explanations for its decision-making process, and design an \emph{LLM-to-LM interpreter} to translate these explanations into informative features for downstream GNNs. Our experiments demonstrate that our method achieves state-of-the-art results on well-established TAG datasets, including \texttt{Cora}, \texttt{PubMed}, \texttt{ogbn-arxiv}, as well as our newly introduced dataset, \texttt{tape-arxiv23}. Furthermore, our method significantly speeds up training, achieving a 2.88 times improvement over the closest baseline on \texttt{ogbn-arxiv}. Lastly, we believe the versatility of the proposed method extends beyond TAGs and holds the potential to enhance other tasks involving graph-text data~\footnote{Our codes and datasets are available at: \url{https://github.com/XiaoxinHe/TAPE}}.

ICLR Conference 2024 Conference Paper

Partitioning Message Passing for Graph Fraud Detection

  • Wei Zhuo 0006
  • Zemin Liu
  • Bryan Hooi
  • Bingsheng He
  • Guang Tan
  • Rizal Fathony
  • Jia Chen 0011

Label imbalance and homophily-heterophily mixture are the fundamental problems encountered when applying Graph Neural Networks (GNNs) to Graph Fraud Detection (GFD) tasks. Existing GNN-based GFD models are designed to augment graph structure to accommodate the inductive bias of GNNs towards homophily, by excluding heterophilic neighbors during message passing. In our work, we argue that the key to applying GNNs for GFD is not to exclude but to {\em distinguish} neighbors with different labels. Grounded in this perspective, we introduce Partitioning Message Passing (PMP), an intuitive yet effective message passing paradigm expressly crafted for GFD. Specifically, in the neighbor aggregation stage of PMP, neighbors with different classes are aggregated with distinct node-specific aggregation functions. By this means, the center node can adaptively adjust the information aggregated from its heterophilic and homophilic neighbors, thus avoiding the model gradient being dominated by benign nodes which occupy the majority of the population. We theoretically establish a connection between the spatial formulation of PMP and spectral analysis to characterize that PMP operates an adaptive node-specific spectral graph filter, which demonstrates the capability of PMP to handle heterophily-homophily mixed graphs. Extensive experimental results show that PMP can significantly boost the performance on GFD tasks.

AAAI Conference 2024 Conference Paper

PoetryDiffusion: Towards Joint Semantic and Metrical Manipulation in Poetry Generation

  • Zhiyuan Hu
  • Chumin Liu
  • Yue Feng
  • Anh Tuan Luu
  • Bryan Hooi

Controllable text generation is a challenging and meaningful field in natural language generation (NLG). Especially, poetry generation is a typical one with well-defined and strict conditions for text generation which is an ideal playground for the assessment of current methodologies. While prior works succeeded in controlling either semantic or metrical aspects of poetry generation, simultaneously addressing both remains a challenge. In this paper, we pioneer the use of the Diffusion model for generating sonnets and Chinese SongCi poetry to tackle such challenges. In terms of semantics, our PoetryDiffusion model, built upon the Diffusion model, generates entire sentences or poetry by comprehensively considering the entirety of sentence information. This approach enhances semantic expression, distinguishing it from autoregressive and large language models (LLMs). For metrical control, its constraint control module which can be trained individually enables us to flexibly incorporate a novel metrical controller to manipulate and evaluate metrics (format and rhythm). The denoising process in PoetryDiffusion allows for the gradual enhancement of semantics and flexible integration of the metrical controller which can calculate and impose penalties on states that stray significantly from the target control distribution. Experimental results on two datasets demonstrate that our model outperforms existing models in terms of automatic evaluation of semantic, metrical, and overall performance as well as human evaluation. Codes are released to https://github.com/ChorlingLau/PoetryDiffusion.

ICLR Conference 2024 Conference Paper

Scalable and Effective Implicit Graph Neural Networks on Large Graphs

  • Juncheng Liu
  • Bryan Hooi
  • Kenji Kawaguchi
  • Yiwei Wang 0001
  • Chaosheng Dong
  • Xiaokui Xiao

Graph Neural Networks (GNNs) have become the de facto standard for modeling graph-structured data in various applications. Among them, implicit GNNs have shown a superior ability to effectively capture long-range dependencies in underlying graphs. However, implicit GNNs tend to be computationally expensive and have high memory usage, due to 1) their use of full-batch training; and 2) they require a large number of iterations to solve a fixed-point equation. These compromise the scalability and efficiency of implicit GNNs especially on large graphs. In this paper, we aim to answer the question: how can we efficiently train implicit GNNs to provide effective predictions on large graphs? We propose a new scalable and effective implicit GNN (SEIGNN) with a mini-batch training method and a stochastic solver, which can be trained efficiently on large graphs. Specifically, SEIGNN can more effectively incorporate global and long-range information by introducing coarse-level nodes in the mini-batch training method. It also achieves reduced training time by obtaining unbiased approximate solutions with fewer iterations in the proposed solver. Comprehensive experiments on various large graphs demonstrate that SEIGNN outperforms baselines and achieves higher accuracy with less training time compared with existing implicit GNNs.

IJCAI Conference 2024 Conference Paper

Truth Table Net: Scalable, Compact & Verifiable Neural Networks with a Dual Convolutional Small Boolean Circuit Networks Form

  • Adrien Benamira
  • Thomas Peyrin
  • Trevor Yap
  • Tristan Guérand
  • Bryan Hooi

We introduce "Truth Table net"' (TTnet), a novel Deep Neural Network (DNN) architecture designed to provide excellent scalability/compactness trade-offs among DNNs, allowing in turn to tackle the DNN challenge of fast formal verification. TTnet is constructed using Learning Truth Table (LTT) filters, analogous to how a Deep Convolutional Neural Network (DCNN) is built upon convolutional filters. The differentiable LTT filters are unique by their dual form: they are both a neural network-based function and a small-sized truth table that can be computed within a practical time frame. This characteristic guarantees, by design and independently of the overall architecture, the ability to practically extract an efficient (in terms of the number of logical gates) and functionally equivalent Conjunctive Normal Form (CNF) Boolean logic gate implementation. This CNF circuit is even optimal when the LTT truth table's input bit size n < 12. In particular, TTnet architecture is the first differentiable DNN with as dual form a compact logic gate representation that can scale to datasets larger than CIFAR-10: we achieve an accuracy of 41% on the ImageNet dataset while ensuring that each LTT filter truth table is fully computable within 2^{16} operations. We further compare the compactness and scalability performances of TTnet Boolean logic circuit representation to state-of-the-art differentiable logic DNNs across tabular, MNIST, and CIFAR-10 datasets. We emphasize that TTnet is the first solution to the open problem of designing differentiable convolutional neural networks with an exact dual logic gate circuit representation, bridging the gap between symbolic AI and trainable DCNNs. Finally, as improving DNNs compactness in Boolean logic circuit form reduces the complexity of their formal verification, we demonstrate TTnet effectiveness in exact sound and complete formal verification. Notably, our model achieves robustness verification in 10ms vs 100s for traditional state-of-the-art DNNs solvers.

NeurIPS Conference 2024 Conference Paper

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in LLMs

  • Zhiyuan Hu
  • Chumin Liu
  • Xidong Feng
  • Yilun Zhao
  • See-Kiong Ng
  • Anh T. Luu
  • Junxian He
  • Pang W. Koh

In the face of uncertainty, the ability to seek information is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given, and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines: 1. An uncertainty-aware simulation approach which enables the model to simulate possible future scenarios and how likely they are to occur, 2. Uncertainty-based rewards motivated by information gain which incentivizes the model to seek information, and3. A reward propagation scheme to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting and the `20 Questions' game, UoT achieves an average performance improvement of 38. 1% in the rate of successful task completion across multiple LLMs compared with direct prompting, and also improves efficiency (i. e. , the number of questions needed to complete the task).

ICML Conference 2023 Conference Paper

A Generalization of ViT/MLP-Mixer to Graphs

  • Xiaoxin He
  • Bryan Hooi
  • Thomas Laurent 0001
  • Adam Perold
  • Yann LeCun
  • Xavier Bresson

Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, that can be solved using global attention but significantly increases the computational cost to quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph ViT/MLP-Mixer, that holds three key properties. First, they capture long-range dependency and mitigate the issue of over-squashing as demonstrated on Long Range Graph Benchmark and TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency with a complexity linear to the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them. The source code is available for reproducibility at: https: //github. com/XiaoxinHe/Graph-ViT-MLPMixer.

NeurIPS Conference 2023 Conference Paper

Expanding Small-Scale Datasets with Guided Imagination

  • Yifan Zhang
  • Daquan Zhou
  • Bryan Hooi
  • Kai Wang
  • Jiashi Feng

The power of DNNs relies heavily on the quantity and quality of training data. However, collecting and annotating data on a large scale is often expensive and time-consuming. To address this issue, we explore a new task, termed dataset expansion, aimed at expanding a ready-to-use small dataset by automatically creating new labeled samples. To this end, we present a Guided Imagination Framework (GIF) that leverages cutting-edge generative models like DALL-E2 and Stable Diffusion (SD) to "imagine" and create informative new data from the input seed data. Specifically, GIF conducts data imagination by optimizing the latent features of the seed data in the semantically meaningful space of the prior model, resulting in the creation of photo-realistic images with new content. To guide the imagination towards creating informative samples for model training, we introduce two key criteria, i. e. , class-maintained information boosting and sample diversity promotion. These criteria are verified to be essential for effective dataset expansion: GIF-SD obtains 13. 5% higher model accuracy on natural image datasets than unguided expansion with SD. With these essential criteria, GIF successfully expands small datasets in various scenarios, boosting model accuracy by 36. 9% on average over six natural image datasets and by 13. 5% on average over three medical datasets. The source code is available at https: //github. com/Vanint/DatasetExpansion.

ICML Conference 2023 Conference Paper

GraphCleaner: Detecting Mislabelled Samples in Popular Graph Learning Benchmarks

  • Yu-Wen Li
  • Miao Xiong
  • Bryan Hooi

Label errors have been found to be prevalent in popular text, vision, and audio datasets, which heavily influence the safe development and evaluation of machine learning algorithms. Despite increasing efforts towards improving the quality of generic data types, such as images and texts, the problem of mislabel detection in graph data remains underexplored. To bridge the gap, we explore mislabelling issues in popular real-world graph datasets and propose GraphCleaner, a post-hoc method to detect and correct these mislabelled nodes in graph datasets. GraphCleaner combines the novel ideas of 1) Synthetic Mislabel Dataset Generation, which seeks to generate realistic mislabels; and 2) Neighborhood-Aware Mislabel Detection, where neighborhood dependency is exploited in both labels and base classifier predictions. Empirical evaluations on 6 datasets and 6 experimental settings demonstrate that GraphCleaner outperforms the closest baseline, with an average improvement of $0. 14$ in F1 score, and $0. 16$ in MCC. On real-data case studies, GraphCleaner detects real and previously unknown mislabels in popular graph benchmarks: PubMed, Cora, CiteSeer and OGB-arxiv; we find that at least 6. 91% of PubMed data is mislabelled or ambiguous, and simply removing these mislabelled data can boost evaluation performance from 86. 71% to 89. 11%.

ICML Conference 2023 Conference Paper

Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement

  • Ailin Deng
  • Miao Xiong
  • Bryan Hooi

Reliable application of machine learning is of primary importance to the practical deployment of deep learning methods. A fundamental challenge is that models are often unreliable due to overconfidence. In this paper, we estimate a model’s reliability by measuring the agreement between its latent space, and the latent space of a foundation model. However, it is challenging to measure the agreement between two different latent spaces due to their incoherence, e. g. , arbitrary rotations and different dimensionality. To overcome this incoherence issue, we design a neighborhood agreement measure between latent spaces and find that this agreement is surprisingly well-correlated with the reliability of a model’s predictions. Further, we show that fusing neighborhood agreement into a model’s predictive confidence in a post-hoc way significantly improves its reliability. Theoretical analysis and extensive experiments on failure detection across various datasets verify the effectiveness of our method on both in-distribution and out-of-distribution settings.

NeurIPS Conference 2023 Conference Paper

LargeST: A Benchmark Dataset for Large-Scale Traffic Forecasting

  • Xu Liu
  • Yutong Xia
  • Yuxuan Liang
  • Junfeng Hu
  • Yiwei Wang
  • Lei Bai
  • Chao Huang
  • Zhenguang Liu

Road traffic forecasting plays a critical role in smart city initiatives and has experienced significant advancements thanks to the power of deep learning in capturing non-linear patterns of traffic data. However, the promising results achieved on current public datasets may not be applicable to practical scenarios due to limitations within these datasets. First, the limited sizes of them may not reflect the real-world scale of traffic networks. Second, the temporal coverage of these datasets is typically short, posing hurdles in studying long-term patterns and acquiring sufficient samples for training deep models. Third, these datasets often lack adequate metadata for sensors, which compromises the reliability and interpretability of the data. To mitigate these limitations, we introduce the LargeST benchmark dataset. It encompasses a total number of 8, 600 sensors in California with a 5-year time coverage and includes comprehensive metadata. Using LargeST, we perform in-depth data analysis to extract data insights, benchmark well-known baselines in terms of their performance and efficiency, and identify challenges as well as opportunities for future research. We release the datasets and baseline implementations at: https: //github. com/liuxu77/LargeST.

NeurIPS Conference 2023 Conference Paper

Proximity-Informed Calibration for Deep Neural Networks

  • Miao Xiong
  • Ailin Deng
  • Pang Wei W. Koh
  • Jiaying Wu
  • Shen Li
  • Jianqing Xu
  • Bryan Hooi

Confidence calibration is central to providing accurate and interpretable uncertainty estimates, especially under safety-critical scenarios. However, we find that existing calibration algorithms often overlook the issue of proximity bias, a phenomenon where models tend to be more overconfident in low proximity data (i. e. , data lying in the sparse region of the data distribution) compared to high proximity samples, and thus suffer from inconsistent miscalibration across different proximity samples. We examine the problem over $504$ pretrained ImageNet models and observe that: 1) Proximity bias exists across a wide variety of model architectures and sizes; 2) Transformer-based models are relatively more susceptible to proximity bias than CNN-based models; 3) Proximity bias persists even after performing popular calibration algorithms like temperature scaling; 4) Models tend to overfit more heavily on low proximity samples than on high proximity samples. Motivated by the empirical findings, we propose ProCal, a plug-and-play algorithm with a theoretical guarantee to adjust sample confidence based on proximity. To further quantify the effectiveness of calibration algorithms in mitigating proximity bias, we introduce proximity-informed expected calibration error (PIECE) with theoretical analysis. We show that ProCal is effective in addressing proximity bias and improving calibration on balanced, long-tail, and distribution-shift settings under four metrics over various model architectures. We believe our findings on proximity bias will guide the development of fairer and better-calibrated} models, contributing to the broader pursuit of trustworthy AI.

ICML Conference 2023 Conference Paper

Reachability-Aware Laplacian Representation in Reinforcement Learning

  • Kaixin Wang
  • Kuangqi Zhou
  • Jiashi Feng
  • Bryan Hooi
  • Xinchao Wang

In Reinforcement Learning (RL), Laplacian Representation (LapRep) is a task-agnostic state representation that encodes the geometry of the environment. A desirable property of LapRep stated in prior works is that the Euclidean distance in the LapRep space roughly reflects the reachability between states, which motivates the usage of this distance for reward shaping. However, we find that LapRep does not necessarily have this property in general: two states having a small distance under LapRep can actually be far away in the environment. Such a mismatch would impede the learning process in reward shaping. To fix this issue, we introduce a Reachability-Aware Laplacian Representation (RA-LapRep), by properly scaling each dimension of LapRep. Despite the simplicity, we demonstrate that RA-LapRep can better capture the inter-state reachability as compared to LapRep, through both theoretical explanations and experimental results. Additionally, we show that this improvement yields a significant boost in reward shaping performance and benefits bottleneck state discovery.

ICML Conference 2023 Conference Paper

Towards Better Graph Representation Learning with Parameterized Decomposition & Filtering

  • Mingqi Yang
  • Wenjie Feng 0001
  • Yanming Shen
  • Bryan Hooi

Proposing an effective and flexible matrix to represent a graph is a fundamental challenge that has been explored from multiple perspectives, e. g. , filtering in Graph Fourier Transforms. In this work, we develop a novel and general framework which unifies many existing GNN models from the view of parameterized decomposition and filtering, and show how it helps to enhance the flexibility of GNNs while alleviating the smoothness and amplification issues of existing models. Essentially, we show that the extensively studied spectral graph convolutions with learnable polynomial filters are constrained variants of this formulation, and releasing these constraints enables our model to express the desired decomposition and filtering simultaneously. Based on this generalized framework, we develop models that are simple in implementation but achieve significant improvements and computational efficiency on a variety of graph learning tasks. Code is available at https: //github. com/qslim/PDF.

TMLR Journal 2022 Journal Article

Birds of a Feather Trust Together: Knowing When to Trust a Classifier via Adaptive Neighborhood Aggregation

  • Miao Xiong
  • Shen Li
  • Wenjie Feng
  • Ailin Deng
  • Jihai Zhang
  • Bryan Hooi

How do we know when the predictions made by a classifier can be trusted? This is a fundamental problem that also has immense practical applicability, especially in safety-critical areas such as medicine and autonomous driving. The de facto approach of using the classifier's softmax outputs as a proxy for trustworthiness suffers from the over-confidence issue; while the most recent works incur problems such as additional retraining cost and accuracy versus trustworthiness trade-off. In this work, we argue that the trustworthiness of a classifier's prediction for a sample is highly associated with two factors: the sample's neighborhood information and the classifier's output. To combine the best of both worlds, we design a model-agnostic post-hoc approach NeighborAGG to leverage the two essential information via an adaptive neighborhood aggregation. Theoretically, we show that NeighborAGG is a generalized version of a one-hop graph convolutional network, inheriting the powerful modeling ability to capture the varying similarity between samples within each class. We also extend our approach to the closely related task of mislabel detection and provide a theoretical coverage guarantee to bound the false negative. Empirically, extensive experiments on image and tabular benchmarks verify our theory and suggest that NeighborAGG outperforms other methods, achieving state-of-the-art trustworthiness performance.

IJCAI Conference 2022 Conference Paper

CADET: Calibrated Anomaly Detection for Mitigating Hardness Bias

  • Ailin Deng
  • Adam Goodge
  • Lang Yi Ang
  • Bryan Hooi

The detection of anomalous samples in large, high-dimensional datasets is a challenging task with numerous practical applications. Recently, state-of-the-art performance is achieved with deep learning methods: for example, using the reconstruction error from an autoencoder as anomaly scores. However, the scores are uncalibrated: that is, they follow an unknown distribution and lack a clear interpretation. Furthermore, the reconstruction error is highly influenced by the `hardness' of a given sample, which leads to false negative and false positive errors. In this paper, we empirically show the significance of this hardness bias present in a range of recent deep anomaly detection methods. To mitigate this, we propose an efficient and plug-and-play error calibration method which mitigates this hardness bias in the anomaly scoring without the need to retrain the model. We verify the effectiveness of our method on a range of image, time-series, and tabular datasets and against several baseline methods.

AAAI Conference 2022 Conference Paper

LUNAR: Unifying Local Outlier Detection Methods via Graph Neural Networks

  • Adam Goodge
  • Bryan Hooi
  • See-Kiong Ng
  • Wee Siong Ng

Many well-established anomaly detection methods use the distance of a sample to those in its local neighbourhood: socalled ‘local outlier methods’, such as LOF and DBSCAN. They are popular for their simple principles and strong performance on unstructured, feature-based data that is commonplace in many practical applications. However, they cannot learn to adapt for a particular set of data due to their lack of trainable parameters. In this paper, we begin by unifying local outlier methods by showing that they are particular cases of the more general message passing framework used in graph neural networks. This allows us to introduce learnability into local outlier methods, in the form of a neural network, for greater flexibility and expressivity: specifically, we propose LUNAR, a novel, graph neural network-based anomaly detection method. LUNAR learns to use information from the nearest neighbours of each node in a trainable way to find anomalies. We show that our method performs significantly better than existing local outlier methods, as well as state-ofthe-art deep baselines. We also show that the performance of our method is much more robust to different settings of the local neighbourhood size.

NeurIPS Conference 2022 Conference Paper

MGNNI: Multiscale Graph Neural Networks with Implicit Layers

  • Juncheng Liu
  • Bryan Hooi
  • Kenji Kawaguchi
  • Xiaokui Xiao

Recently, implicit graph neural networks (GNNs) have been proposed to capture long-range dependencies in underlying graphs. In this paper, we introduce and justify two weaknesses of implicit GNNs: the constrained expressiveness due to their limited effective range for capturing long-range dependencies, and their lack of ability to capture multiscale information on graphs at multiple resolutions. To show the limited effective range of previous implicit GNNs, we first provide a theoretical analysis and point out the intrinsic relationship between the effective range and the convergence of iterative equations used in these models. To mitigate the mentioned weaknesses, we propose a multiscale graph neural network with implicit layers (MGNNI) which is able to model multiscale structures on graphs and has an expanded effective range for capturing long-range dependencies. We conduct comprehensive experiments for both node classification and graph classification to show that MGNNI outperforms representative baselines and has a better ability for multiscale modeling and capturing of long-range dependencies.

IJCAI Conference 2022 Conference Paper

Neural PCA for Flow-Based Representation Learning

  • Shen Li
  • Bryan Hooi

Of particular interest is to discover useful representations solely from observations in an unsupervised generative manner. However, the question of whether existing normalizing flows provide effective representations for downstream tasks remains mostly unanswered despite their strong ability for sample generation and density estimation. This paper investigates this problem for such a family of generative models that admits exact invertibility. We propose Neural Principal Component Analysis (Neural-PCA) that operates in full dimensionality while capturing principal components in descending order. Without exploiting any label information, the principal components recovered store the most informative elements in their leading dimensions and leave the negligible in the trailing ones, allowing for clear performance improvements of 5%-10% in downstream tasks. Such improvements are empirically found consistent irrespective of the number of latent trailing dimensions dropped. Our work suggests that necessary inductive bias be introduced into generative modeling when representation quality is of interest.

NeurIPS Conference 2022 Conference Paper

Self-Supervised Aggregation of Diverse Experts for Test-Agnostic Long-Tailed Recognition

  • Yifan Zhang
  • Bryan Hooi
  • Lanqing Hong
  • Jiashi Feng

Existing long-tailed recognition methods, aiming to train class-balanced models from long-tailed data, generally assume the models would be evaluated on the uniform test class distribution. However, practical test class distributions often violate this assumption (e. g. , being either long-tailed or even inversely long-tailed), which may lead existing methods to fail in real applications. In this paper, we study a more practical yet challenging task, called test-agnostic long-tailed recognition, where the training class distribution is long-tailed while the test class distribution is agnostic and not necessarily uniform. In addition to the issue of class imbalance, this task poses another challenge: the class distribution shift between the training and test data is unknown. To tackle this task, we propose a novel approach, called Self-supervised Aggregation of Diverse Experts, which consists of two strategies: (i) a new skill-diverse expert learning strategy that trains multiple experts from a single and stationary long-tailed dataset to separately handle different class distributions; (ii) a novel test-time expert aggregation strategy that leverages self-supervision to aggregate the learned multiple experts for handling unknown test class distributions. We theoretically show that our self-supervised strategy has a provable ability to simulate test-agnostic class distributions. Promising empirical results demonstrate the effectiveness of our method on both vanilla and test-agnostic long-tailed recognition. The source code is available at https: //github. com/Vanint/SADE-AgnosticLT.

ICML Conference 2022 Conference Paper

The Geometry of Robust Value Functions

  • Kaixin Wang
  • Navdeep Kumar
  • Kuangqi Zhou
  • Bryan Hooi
  • Jiashi Feng
  • Shie Mannor

The space of value functions is a fundamental concept in reinforcement learning. Characterizing its geometric properties may provide insights for optimization and representation. Existing works mainly focus on the value space for Markov Decision Processes (MDPs). In this paper, we study the geometry of the robust value space for the more general Robust MDPs (RMDPs) setting, where transition uncertainties are considered. Specifically, since we find it hard to directly adapt prior approaches to RMDPs, we start with revisiting the non-robust case, and introduce a new perspective that enables us to characterize both the non-robust and robust value space in a similar fashion. The key of this perspective is to decompose the value space, in a state-wise manner, into unions of hypersurfaces. Through our analysis, we show that the robust value space is determined by a set of conic hypersurfaces, each of which contains the robust values of all policies that agree on one state. Furthermore, we find that taking only extreme points in the uncertainty set is sufficient to determine the robust value space. Finally, we discuss some other aspects about the robust value space, including its non-convexity and policy agreement on multiple states.

NeurIPS Conference 2021 Conference Paper

Adaptive Data Augmentation on Temporal Graphs

  • Yiwei Wang
  • Yujun Cai
  • Yuxuan Liang
  • Henghui Ding
  • Changhu Wang
  • Siddharth Bhatia
  • Bryan Hooi

Temporal Graph Networks (TGNs) are powerful on modeling temporal graph data based on their increased complexity. Higher complexity carries with it a higher risk of overfitting, which makes TGNs capture random noise instead of essential semantic information. To address this issue, our idea is to transform the temporal graphs using data augmentation (DA) with adaptive magnitudes, so as to effectively augment the input features and preserve the essential semantic information. Based on this idea, we present the MeTA (Memory Tower Augmentation) module: a multi-level module that processes the augmented graphs of different magnitudes on separate levels, and performs message passing across levels to provide adaptively augmented inputs for every prediction. MeTA can be flexibly applied to the training of popular TGNs to improve their effectiveness without increasing their time complexity. To complement MeTA, we propose three DA strategies to realistically model noise by modifying both the temporal and topological features. Empirical results on standard datasets show that MeTA yields significant gains for the popular TGN models on edge prediction and node classification in an efficient manner.

NeurIPS Conference 2021 Conference Paper

EIGNN: Efficient Infinite-Depth Graph Neural Networks

  • Juncheng Liu
  • Kenji Kawaguchi
  • Bryan Hooi
  • Yiwei Wang
  • Xiaokui Xiao

Graph neural networks (GNNs) are widely used for modelling graph-structured data in numerous applications. However, with their inherently finite aggregation layers, existing GNN models may not be able to effectively capture long-range dependencies in the underlying graphs. Motivated by this limitation, we propose a GNN model with infinite depth, which we call Efficient Infinite-Depth Graph Neural Networks (EIGNN), to efficiently capture very long-range dependencies. We theoretically derive a closed-form solution of EIGNN which makes training an infinite-depth GNN model tractable. We then further show that we can achieve more efficient computation for training EIGNN by using eigendecomposition. The empirical results of comprehensive experiments on synthetic and real-world datasets show that EIGNN has a better ability to capture long-range dependencies than recent baselines, and consistently achieves state-of-the-art performance. Furthermore, we show that our model is also more robust against both noise and adversarial perturbations on node features.

AAAI Conference 2021 Conference Paper

ExGAN: Adversarial Generation of Extreme Samples

  • Siddharth Bhatia
  • Arjit Jain
  • Bryan Hooi

Mitigating the risk arising from extreme events is a fundamental goal with many applications, such as the modelling of natural disasters, financial crashes, epidemics, and many others. To manage this risk, a vital step is to be able to understand or generate a wide range of extreme scenarios. Existing approaches based on Generative Adversarial Networks (GANs) excel at generating realistic samples, but seek to generate typical samples, rather than extreme samples. Hence, in this work, we propose ExGAN, a GAN-based approach to generate realistic and extreme samples. To model the extremes of the training distribution in a principled way, our work draws from Extreme Value Theory (EVT), a probabilistic approach for modelling the extreme tails of distributions. For practical utility, our framework allows the user to specify both the desired extremeness measure, as well as the desired extremeness probability they wish to sample at. Experiments on real US precipitation data show that our method generates realistic samples, based on visual inspection and quantitative measures, in an efficient manner. Moreover, generating increasingly extreme examples using ExGAN can be done in constant time (with respect to the extremeness probability τ), as opposed to the O( 1 τ ) time required by the baseline approach.

AAAI Conference 2021 Conference Paper

Graph Neural Network-Based Anomaly Detection in Multivariate Time Series

  • Ailin Deng
  • Bryan Hooi

Given high-dimensional time series data (e. g. , sensor data), how can we detect anomalous events, such as system faults and attacks? More challengingly, how can we do this in a way that captures complex inter-sensor relationships, and detects and explains anomalies which deviate from these relationships? Recently, deep learning approaches have enabled improvements in anomaly detection in high-dimensional datasets; however, existing methods do not explicitly learn the structure of existing relationships between variables, or use them to predict the expected behavior of time series. Our approach combines a structure learning approach with graph neural networks, additionally using attention weights to provide explainability for the detected anomalies. Experiments on two real-world sensor datasets with ground truth anomalies show that our method detects anomalies more accurately than baseline approaches, accurately captures correlations between sensors, and allows users to deduce the root cause of a detected anomaly.

NeurIPS Conference 2021 Conference Paper

SSMF: Shifting Seasonal Matrix Factorization

  • Koki Kawabata
  • Siddharth Bhatia
  • Rui Liu
  • Mohit Wadhwa
  • Bryan Hooi

Given taxi-ride counts information between departure and destination locations, how can we forecast their future demands? In general, given a data stream of events with seasonal patterns that innovate over time, how can we effectively and efficiently forecast future events? In this paper, we propose Shifting Seasonal Matrix Factorization approach, namely SSMF, that can adaptively learn multiple seasonal patterns (called regimes), as well as switching between them. Our proposed method has the following properties: (a) it accurately forecasts future events by detecting regime shifts in seasonal patterns as the data stream evolves; (b) it works in an online setting, i. e. , processes each observation in constant time and memory; (c) it effectively realizes regime shifts without human intervention by using a lossless data compression scheme. We demonstrate that our algorithm outperforms state-of-the-art baseline methods by accurately forecasting upcoming events on three real-world data streams.

ICML Conference 2021 Conference Paper

Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

  • Kaixin Wang
  • Kuangqi Zhou
  • Qixin Zhang 0001
  • Jie Shao 0006
  • Bryan Hooi
  • Jiashi Feng

The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose to minimize a spectral graph drawing objective, which however has infinitely many global minimizers other than the eigenvectors. As a result, their learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have eigenvectors as its unique global minimizer. It enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.

NeurIPS Conference 2021 Conference Paper

Unleashing the Power of Contrastive Self-Supervised Visual Models via Contrast-Regularized Fine-Tuning

  • Yifan Zhang
  • Bryan Hooi
  • Dapeng Hu
  • Jian Liang
  • Jiashi Feng

Contrastive self-supervised learning (CSL) has attracted increasing attention for model pre-training via unlabeled data. The resulted CSL models provide instance-discriminative visual features that are uniformly scattered in the feature space. During deployment, the common practice is to directly fine-tune CSL models with cross-entropy, which however may not be the best strategy in practice. Although cross-entropy tends to separate inter-class features, the resulting models still have limited capability for reducing intra-class feature scattering that exists in CSL models. In this paper, we investigate whether applying contrastive learning to fine-tuning would bring further benefits, and analytically find that optimizing the contrastive loss benefits both discriminative representation learning and model optimization during fine-tuning. Inspired by these findings, we propose Contrast-regularized tuning (Core-tuning), a new approach for fine-tuning CSL models. Instead of simply adding the contrastive loss to the objective of fine-tuning, Core-tuning further applies a novel hard pair mining strategy for more effective contrastive fine-tuning, as well as smoothing the decision boundary to better exploit the learned discriminative feature space. Extensive experiments on image classification and semantic segmentation verify the effectiveness of Core-tuning.

AAAI Conference 2020 Conference Paper

FlowScope: Spotting Money Laundering Based on Graphs

  • Xiangfeng Li
  • Shenghua Liu
  • Zifeng Li
  • Xiaotian Han
  • Chuan Shi
  • Bryan Hooi
  • He Huang
  • Xueqi Cheng

Given a graph of the money transfers between accounts of a bank, how can we detect money laundering? Money laundering refers to criminals using the bank’s services to move massive amounts of illegal money to untraceable destination accounts, in order to inject their illegal money into the legitimate financial system. Existing graph fraud detection approaches focus on dense subgraph detection, without considering the fact that money laundering involves high-volume flows of funds through chains of bank accounts, thereby decreasing their detection accuracy. Instead, we propose to model the transactions using a multipartite graph, and detect the complete flow of money from source to destination using a scalable algorithm, FlowScope. Theoretical analysis shows that FlowScope provides guarantees in terms of the amount of money that fraudsters can transfer without being detected. FlowScope outperforms state-of-the-art baselines in accurately detecting the accounts involved in money laundering, in both injected and real-world data settings.

ICLR Conference 2020 Conference Paper

Identifying through Flows for Recovering Latent Representations

  • Shen Li 0004
  • Bryan Hooi
  • Gim Hee Lee

Identifiability, or recovery of the true latent representations from which the observed data originates, is de facto a fundamental goal of representation learning. Yet, most deep generative models do not address the question of identifiability, and thus fail to deliver on the promise of the recovery of the true latent sources that generate the observations. Recent work proposed identifiable generative modelling using variational autoencoders (iVAE) with a theory of identifiability. Due to the intractablity of KL divergence between variational approximate posterior and the true posterior, however, iVAE has to maximize the evidence lower bound (ELBO) of the marginal likelihood, leading to suboptimal solutions in both theory and practice. In contrast, we propose an identifiable framework for estimating latent representations using a flow-based model (iFlow). Our approach directly maximizes the marginal likelihood, allowing for theoretical guarantees on identifiability, thereby dispensing with variational approximations. We derive its optimization objective in analytical form, making it possible to train iFlow in an end-to-end manner. Simulations on synthetic data validate the correctness and effectiveness of our proposed method and demonstrate its practical advantages over other existing methods.

AAAI Conference 2020 Conference Paper

Midas: Microcluster-Based Detector of Anomalies in Edge Streams

  • Siddharth Bhatia
  • Bryan Hooi
  • Minji Yoon
  • Kijung Shin
  • Christos Faloutsos

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties: (a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability; (b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 108 − 505 times faster than state-of-the-art approaches; (c) it provides 46%-52% higher accuracy (in terms of AUC) than state-of-the-art approaches.

IJCAI Conference 2020 Conference Paper

Robustness of Autoencoders for Anomaly Detection Under Adversarial Impact

  • Adam Goodge
  • Bryan Hooi
  • See Kiong Ng
  • Wee Siong Ng

Detecting anomalies is an important task in a wide variety of applications and domains. Deep learning methods have achieved state-of-the-art performance in anomaly detection in recent years; unsupervised methods being particularly popular. However, deep learning methods can be fragile to small perturbations in the input data. This can be exploited by an adversary to deliberately hinder model performance; an adversarial attack. This phenomena has been widely studied in the context of supervised image classification since its discovery, however such studies for an anomaly detection setting are sorely lacking. Moreover, the plethora of defense mechanisms that have been proposed are often not applicable to unsupervised anomaly detection models. In this work, we study the effect of adversarial attacks on the performance of anomaly-detecting autoencoders using real data from a Cyber physical system (CPS) testbed with intervals of controlled, physical attacks as anomalies. An adversary would attempt to disguise these points as normal through adversarial perturbations. To combat this, we propose the Approximate Projection Autoencoder (APAE), which incorporates two defenses against such attacks into a general autoencoder. One of these involves a novel technique to improve robustness under adversarial impact by optimising latent representations for better reconstruction outputs.

AAAI Conference 2020 Conference Paper

TellTail: Fast Scoring and Detection of Dense Subgraphs

  • Bryan Hooi
  • Kijung Shin
  • Hemank Lamba
  • Christos Faloutsos

Suppose you visit an e-commerce site, and see that 50 users each reviewed almost all of the same 500 products several times each: would you get suspicious? Similarly, given a Twitter follow graph, how can we design principled measures for identifying surprisingly dense subgraphs? Dense subgraphs often indicate interesting structure, such as network attacks in network traffic graphs. However, most existing dense subgraph measures either do not model normal variation, or model it using an Erdős-Renyi assumption - but this assumption has been discredited decades ago. What is the right assumption then? We propose a novel application of extreme value theory to the dense subgraph problem, which allows us to propose measures and algorithms which evaluate the surprisingness of a subgraph probabilistically, without requiring restrictive assumptions (e. g. Erdős-Renyi). We then improve the practicality of our approach by incorporating empirical observations about dense subgraph patterns in real graphs, and by proposing a fast pruning-based search algorithm. Our approach (a) provides theoretical guarantees of consistency, (b) scales quasi-linearly, and (c) outperforms baselines in synthetic and ground truth settings.

IJCAI Conference 2019 Conference Paper

BeatGAN: Anomalous Rhythm Detection using Adversarially Generated Time Series

  • Bin Zhou
  • Shenghua Liu
  • Bryan Hooi
  • Xueqi Cheng
  • Jing Ye

Given a large-scale rhythmic time series containing mostly normal data segments (or `beats'), can we learn how to detect anomalous beats in an effective yet efficient way? For example, how can we detect anomalous beats from electrocardiogram (ECG) readings? Existing approaches either require excessively high amounts of labeled and balanced data for classification, or rely on less regularized reconstructions, resulting in lower accuracy in anomaly detection. Therefore, we propose BeatGAN, an unsupervised anomaly detection algorithm for time series data. BeatGAN outputs explainable results to pinpoint the anomalous time ticks of an input beat, by comparing them to adversarially generated beats. Its robustness is guaranteed by its regularization of reconstruction error using an adversarial generation approach, as well as data augmentation using time series warping. Experiments show that BeatGAN accurately and efficiently detects anomalous beats in ECG time series, and routes doctors' attention to anomalous time ticks, achieving accuracy of nearly 0. 95 AUC, and very fast inference (2. 6 ms per beat). In addition, we show that BeatGAN accurately detects unusual motions from multivariate motion-capture time series data, illustrating its generality.

IJCAI Conference 2018 Conference Paper

NeuCast: Seasonal Neural Forecast of Power Grid Time Series

  • Pudi Chen
  • Shenghua Liu
  • Chuan Shi
  • Bryan Hooi
  • Bai Wang
  • Xueqi Cheng

In the smart power grid, short-term load forecasting (STLF) is a crucial step in scheduling and planning for future load, so as to improve the reliability, cost, and emissions of the power grid. Different from traditional time series forecast, STLF is a more challenging task, because of the complex demand of active and reactive power from numerous categories of electrical loads and the effects of environment. Therefore, we propose NeuCast, a seasonal neural forecasting method, which dynamically models various loads as co-evolving time series in a hidden space, as well as extra weather conditions, in a neural network structure. NeuCast captures seasonality and patterns of the time series by integrating factor modeling and hidden state recognition. NeuCast can also detect anomalies and forecast under different temperature assumptions. Extensive experiments on 134 real-word datasets show the improvements of NeuCast over the stateof-the-art methods.