Arrow Research search

Author name cluster

Shirui Pan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

106 papers
2 author rows

Possible papers

AAAI Conference 2026 Conference Paper

Assemble Your Crew: Automatic Multi-agent Communication Topology Design via Autoregressive Graph Generation

  • Shiyuan Li
  • Yixin Liu
  • Qingsong Wen
  • Chengqi Zhang
  • Shirui Pan

Multi-agent systems (MAS) based on large language models (LLMs) have emerged as a powerful solution for dealing with complex problems across diverse domains. The effectiveness of MAS is critically dependent on its collaboration topology, which has become a focal point for automated design research. However, existing approaches are fundamentally constrained by their reliance on a template graph modification paradigm with a predefined set of agents and hard-coded interaction structures, significantly limiting their adaptability to task-specific requirements. To address these limitations, we reframe MAS design as a conditional autoregressive graph generation task, where both the system composition and structure are designed jointly. We propose ARG-Designer, a novel autoregressive model that operationalizes this paradigm by constructing the collaboration graph from scratch. Conditioned on a natural language task query, ARG-Designer sequentially and dynamically determines the required number of agents, selects their appropriate roles from an extensible pool, and establishes the optimal communication links between them. This generative approach creates a customized topology in a flexible and extensible manner, precisely tailored to the unique demands of different tasks. Extensive experiments across six diverse benchmarks demonstrate that ARG-Designer not only achieves state-of-the-art performance but also enjoys significantly greater token efficiency and enhanced extensibility.

AAAI Conference 2026 Conference Paper

Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time

  • Junjun Pan
  • Yixin Liu
  • Chuan Zhou
  • Fei Xiong
  • Alan Wee-Chung Liew
  • Shirui Pan

Graph anomaly detection (GAD), which aims to detect outliers in graph-structured data, has received increasing research attention recently. However, existing GAD methods assume identical training and testing distributions, which is rarely valid in practice. In real-world scenarios, unseen but normal samples may emerge during deployment, leading to a normality shift that degrades the performance of GAD models trained on the original data. Through empirical analysis, we reveal that the degradation arises from (1) semantic confusion, where unseen normal samples are misinterpreted as anomalies due to their novel patterns, and (2) aggregation contamination, where the representations of seen normal nodes are distorted by unseen normals through message aggregation. While retraining or fine-tuning GAD models could be a potential solution to the above challenges, the high cost of model retraining and the difficulty of obtaining labeled data often render this approach impractical in real-world applications. To bridge the gap, we propose a lightweight and plug-and-play Test-time adaptation framework for correcting Unseen Normal pattErns (TUNE) in GAD. To address semantic confusion, a graph aligner is employed to align the shifted data to the original one at the graph attribute level. Moreover, we utilize the minimization of representation-level shift as a supervision signal to train the aligner, which leverages the estimated aggregation contamination as a key indicator of normality shift. Extensive experiments on 10 real-world datasets demonstrate that TUNE significantly enhances the generalizability of pre-trained GAD models to both synthetic and real unseen normal patterns.

IS Journal 2026 Journal Article

Graph-Augmented Large Language Model Agents: Current Progress and Future Prospects

  • Yixin Liu
  • Guibin Zhang
  • Kun Wang
  • Shiyuan Li
  • Shirui Pan
  • Bo An

Autonomous agents based on large language models (LLMs) have demonstrated impressive capabilities in numerous real-world applications. While most LLMs are limited in several key agentic procedures, graphs can serve as a powerful auxiliary structure to enhance structure, continuity, and coordination in complex agent workflows. Given the rapid growth and fragmentation of research on Graph-augmented LLM Agents (GLA), this article offers a timely and comprehensive overview of recent advances and highlights key directions for future work. Specifically, we categorize existing GLA methods by their primary functions in LLM agent systems, including planning, memory, and tool usage, and then analyze how graphs and graph learning algorithms contribute to each. For multiagent systems, we further discuss how GLA solutions facilitate the orchestration, efficiency optimization, and trustworthiness of MAS. Finally, we highlight key future directions to advance this field, from improving structural adaptability to enabling unified, scalable, and multimodal GLA systems.

AAAI Conference 2026 Conference Paper

GraphRAG-Induced Dual Knowledge Structure Graphs for Personalized Learning Path Recommendation

  • Xinghe Cheng
  • Zihan Zhang
  • Jiapu Wang
  • Liangda Fang
  • Chaobo He
  • Quanlong Guan
  • Shirui Pan
  • Weiqi Luo

Learning path recommendation seeks to provide students with a structured sequence of learning items (e.g., knowledge concepts or exercises) to optimize their learning efficiency. Despite significant efforts in this area, most existing methods primarily rely on prerequisite relations, which present two major limitations: (1) Prerequisite relations between knowledge concepts are difficult to obtain due to the cost of expert annotation, hindering the application of current learning path recommendation methods. (2) Relying on a single sequentially dependent knowledge structure based on prerequisite relations implies that a confusing knowledge concept can disrupt subsequent learning processes, which is referred to as blocked learning. To address these two challenges, we propose a novel approach, GraphRAG-Induced Dual Knowledge Structure Graphs for Personalized Learning Path Recommendation (KnowLP), which enhances learning path recommendations by incorporating both prerequisite and similarity relations between knowledge concepts. Specifically, we introduce a knowledge structure graph generation module EDU-GraphRAG that constructs knowledge structure graphs for different educational datasets, significantly improving the applicability of learning path recommendation methods. We then propose a Discrimination Learning-driven Reinforcement Learning (DLRL) module that utilizes similarity relations as fallback relations when prerequisite relations become ineffective, thereby alleviating the blocked learning. Finally, we conduct extensive experiments on three benchmark datasets, demonstrating that our method not only achieves state-of-the-art performance but also generates more effective and longer learning paths.

AAAI Conference 2025 Conference Paper

A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection

  • Junjun Pan
  • Yixin Liu
  • Xin Zheng
  • Yizhen Zheng
  • Alan Wee-Chung Liew
  • Fuyi Li
  • Shirui Pan

Graph fraud detection (GFD) has rapidly advanced in protecting online services by identifying malicious fraudsters. Recent supervised GFD research highlights that heterophilic connections between fraudsters and users greatly impact detection performance, since fraudsters tend to camouflage themselves by building more connections to benign users. Despite the promising performance of these methods, their reliance on labels limits their application in unsupervised scenarios; additionally, accurately capturing complex and diverse heterophily patterns without labels poses a further challenge. Therefore, we propose a Heterophily-guided Unsupervised Graph fraud dEtection approach (HUGE) for unsupervised GFD, which contains two essential components: a heterophily estimation module and an alignment-based fraud detection module. In the heterophily estimation module, we design a novel unsupervised heterophily metric called HALO, which captures the critical graph properties for GFD, enabling its outstanding ability to estimate heterophily with attributes. In the alignment-based fraud detection module, we develop a joint MLP-GNN architecture with ranking loss and asymmetric alignment loss. The ranking loss aligns the predicted fraud score with the relative order of HALO, providing an extra robustness guarantee by comparing heterophily between non-adjacent nodes. Moreover, the asymmetric alignment loss effectively utilizes structural information to alleviate the feature-smooth effects. Extensive experiments on six datasets demonstrate that HUGE consistently outperforms competitors, showcasing its effectiveness and robustness.

AAAI Conference 2025 Conference Paper

Adversarial Contrastive Graph Masked AutoEncoder Against Graph Structure and Feature Dual Attacks

  • Weixuan Shen
  • Xiaobo Shen
  • Shirui Pan

Graph Neural Networks (GNNs) have been shown to be vulnerable to graph adversarial attacks. Current robust graph representation learning methods mainly defend against graph structure attacks and improve the performance of GNNs. However, node features in a graph can easily be attacked in reality. The joint defense against graph structure and feature dual attacks remains challenging yet less studied. To fill this gap, we propose Adversarial Contrastive Graph Masked AutoEncoder (ACGMAE) to defend against graph structure and feature dual attacks. ACGMAE employs adversarial feature masking for reconstructing node features to mitigate the influence of feature attacks. ACGMAE employs contrastive learning on the kNN graph and the attacked graph, considers neighbor nodes as positive samples, and further calculates their probabilities of being true positives to mitigate the effect of adversarial edges. Extensive experiments on node classification and clustering demonstrate the effectiveness of the proposed ACGMAE, especially under graph structure and feature dual attacks.

ICML Conference 2025 Conference Paper

BiMark: Unbiased Multilayer Watermarking for Large Language Models

  • Xiaoyan Feng
  • He Zhang 0012
  • Yanjun Zhang
  • Leo Yu Zhang
  • Shirui Pan

Recent advances in Large Language Models (LLMs) have raised urgent concerns about LLM-generated text authenticity, prompting regulatory demands for reliable identification mechanisms. Although watermarking offers a promising solution, existing approaches struggle to simultaneously achieve three critical requirements: text quality preservation, model-agnostic detection, and message embedding capacity, which are crucial for practical implementation. To achieve these goals, the key challenge lies in balancing the trade-off between text quality preservation and message embedding capacity. To address this challenge, we propose BiMark, a novel watermarking framework that achieves these requirements through three key innovations: (1) a bit-flip unbiased reweighting mechanism enabling model-agnostic detection, (2) a multilayer architecture enhancing detectability without compromising generation quality, and (3) an information encoding approach supporting multi-bit watermarking. Through theoretical analysis and extensive experiments, we validate that, compared to state-of-the-art multi-bit watermarking methods, BiMark achieves up to 30% higher extraction rates for short texts while maintaining text quality indicated by lower perplexity, and performs comparably to non-watermarked text on downstream tasks such as summarization and translation.

ICML Conference 2025 Conference Paper

Conformal Anomaly Detection in Event Sequences

  • Shuai Zhang 0007
  • Chuan Zhou 0001
  • Yang Liu 0320
  • Peng Zhang 0001
  • Xixun Lin
  • Shirui Pan

Anomaly detection in continuous-time event sequences is a crucial task in safety-critical applications. While existing methods primarily focus on developing a superior test statistic, they fail to provide guarantees regarding the false positive rate (FPR), which undermines their reliability in practical deployments. In this paper, we propose CADES (Conformal Anomaly Detection in Event Sequences), a novel test procedure based on conformal inference for the studied task with finite-sample FPR control. Specifically, by using the time-rescaling theorem, we design two powerful non-conformity scores tailored to event sequences, which exhibit complementary sensitivities to different abnormal patterns. CADES combines these scores with Bonferroni correction to leverage their respective strengths and addresses non-identifiability issues of existing methods. Theoretically, we prove the validity of CADES and further provide strong guarantees on calibration-conditional FPR control. Experimental results on synthetic and real-world datasets, covering various types of anomalies, demonstrate that CADES outperforms state-of-the-art methods while maintaining FPR control.
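The testing step described in the abstract above can be illustrated with a minimal sketch, assuming generic scalar non-conformity scores where larger means more abnormal. It computes a standard conformal p-value against a calibration set and combines two p-values with a Bonferroni correction. The function names and inputs are hypothetical; the actual CADES scores, which the abstract says are built with the time-rescaling theorem, are not reproduced here.

```python
from typing import Sequence


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Conformal p-value: share of calibration scores at least as extreme
    as the test score, with the usual +1 finite-sample correction."""
    n = len(calibration_scores)
    at_least_as_extreme = sum(1 for s in calibration_scores if s >= test_score)
    return (1 + at_least_as_extreme) / (n + 1)


def bonferroni_combine(p_values: Sequence[float]) -> float:
    """Bonferroni combination: multiply the smallest p-value by the
    number of tests and cap the result at 1."""
    return min(1.0, len(p_values) * min(p_values))


def is_anomalous(
    cal_a: Sequence[float],
    cal_b: Sequence[float],
    score_a: float,
    score_b: float,
    alpha: float = 0.05,
) -> bool:
    """Flag a test sequence when the combined p-value is at most alpha.

    cal_a / cal_b hold calibration scores for two (hypothetical) score
    functions; score_a / score_b are the test sequence's scores.
    """
    combined = bonferroni_combine([
        conformal_p_value(cal_a, score_a),
        conformal_p_value(cal_b, score_b),
    ])
    return combined <= alpha
```

With exchangeable calibration and test scores, each conformal p-value is valid on its own, and the Bonferroni step keeps the combined test at level alpha regardless of how the two scores depend on each other.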

NeurIPS Conference 2025 Conference Paper

Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking

  • Liangliang Zhang
  • Zhuorui Jiang
  • Hongliang Chi
  • Haoyang Chen
  • Mohammed ElKoumy
  • Fali Wang
  • Qiong Wu
  • Zhengyi Zhou

Knowledge Graph Question Answering (KGQA) systems rely on high-quality benchmarks to evaluate complex multi-hop reasoning. However, despite their widespread use, popular datasets such as WebQSP and CWQ suffer from critical quality issues, including inaccurate or incomplete ground-truth annotations, poorly constructed questions that are ambiguous, trivial, or unanswerable, and outdated or inconsistent knowledge. Through a manual audit of 16 popular KGQA datasets—including WebQSP and CWQ—we find that the average factual correctness rate is only 57%. To address these issues, we introduce KGQAGen, an LLM-in-the-loop framework that systematically resolves these pitfalls. KGQAGen combines structured knowledge grounding, LLM-guided generation, and symbolic verification to produce challenging and verifiable QA instances. Using KGQAGen, we construct KGQAGen-10k, a 10K-scale benchmark grounded in Wikidata, and evaluate a diverse set of KG-RAG models. Experimental results demonstrate that even state-of-the-art systems struggle on this benchmark, highlighting its ability to expose limitations of existing models. Our findings advocate for more rigorous benchmark construction and position KGQAGen as a scalable framework for advancing KGQA evaluation.

AAAI Conference 2025 Conference Paper

Domain-Level Disentanglement Framework Based on Information Enhancement for Cross-Domain Cold-Start Recommendation

  • Nian Rong
  • Fei Xiong
  • Shirui Pan
  • Guixun Luo
  • Jia Wu
  • Liang Wang

Recommender systems in various applications often encounter the challenge of cold-start, which refers to how to provide recommendations for completely new users. Cross-domain recommendation offers a solution to address this cold-start issue by leveraging user interaction information from other domains and providing recommendations for users in the target domain. However, applying the classic two-tower model in cross-domain scenarios for pure cold-start users proves challenging, and most existing cross-domain cold-start recommendation models adopt an embedding-mapping framework that lacks end-to-end efficiency. The parallel training recommendation method lacks consideration of the domain-level intrinsic characteristics of cross-domain information. In this paper, we propose a generalized Domain-level Disentanglement framework based on information enhancement for Cross-domain Cold-start Recommendation. On one hand, we achieve deep utilization of domain-level information through independent extraction of domain knowledge and fusion using heuristic strategies. On the other hand, our model incorporates an information enhancement network based on user attention and a user personalized adaptor. We introduce measures to assess user variability and immutability in cross-domain recommendation, aiming to eliminate inter-domain bias and highlight individual user preferences. Experimental results on widely used cross-domain recommendation datasets demonstrate that our proposed model outperforms state-of-the-art methods, validating its effectiveness.

NeurIPS Conference 2025 Conference Paper

DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs

  • Dongyuan Li
  • Shiyin Tan
  • Ying Zhang
  • Ming Jin
  • Shirui Pan
  • Manabu Okumura
  • Renhe Jiang

Dynamic graph modeling aims to uncover evolutionary patterns in real-world systems, enabling accurate social recommendation and early detection of cancer cells. Inspired by the success of recent state space models in efficiently capturing long-term dependencies, we propose DyG-Mamba by translating dynamic graph modeling into a long-term sequence modeling problem. Specifically, inspired by Ebbinghaus' forgetting curve, we treat the irregular timespans between events as control signals, allowing DyG-Mamba to dynamically adjust the forgetting of historical information. This mechanism ensures effective usage of irregular timespans, thereby improving both model effectiveness and inductive capability. In addition, inspired by Ebbinghaus' review cycle, we redefine core parameters to ensure that DyG-Mamba selectively reviews historical information and filters out noisy inputs, further enhancing the model’s robustness. Through exhaustive experiments on 12 datasets covering dynamic link prediction and node classification tasks, we show that DyG-Mamba achieves state-of-the-art performance on most datasets, while demonstrating significantly improved computational and memory efficiency. Our code is available at https://github.com/Clearloveyuan/DyG-Mamba.

ICML Conference 2025 Conference Paper

Equivalence is All: A Unified View for Self-supervised Graph Learning

  • Yejiang Wang
  • Yuhai Zhao
  • Zhengkui Wang
  • Ling Li
  • Jiapu Wang
  • Fangting Li
  • Miaomiao Huang
  • Shirui Pan

Node equivalence is common in graphs, such as computing networks, encompassing automorphic equivalence (preserving adjacency under node permutations) and attribute equivalence (nodes with identical attributes). Despite their importance for learning node representations, these equivalences are largely ignored by existing graph models. To bridge this gap, we propose a GrAph self-supervised Learning framework with Equivalence (GALE) and analyze its connections to existing techniques. Specifically, we: 1) unify automorphic and attribute equivalence into a single equivalence class; 2) enforce the equivalence principle to make representations within the same class more similar while separating those across classes; 3) introduce approximate equivalence classes with linear time complexity to address the NP-hardness of exact automorphism detection and handle node-feature variation; 4) analyze existing graph encoders, noting limitations in message passing neural networks and graph transformers regarding equivalence constraints; 5) show that graph contrastive learning is a degenerate form of equivalence constraint; and 6) demonstrate that GALE achieves superior performance over baselines.

NeurIPS Conference 2025 Conference Paper

GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation

  • Linhao Luo
  • Zicheng Zhao
  • Reza Haffari
  • Dinh Phung
  • Chen Gong
  • Shirui Pan

Retrieval-augmented generation (RAG) has proven effective in integrating knowledge into large language models (LLMs). However, conventional RAGs struggle to capture complex relationships between pieces of knowledge, limiting their performance in intricate reasoning that requires integrating knowledge from multiple sources. Recently, graph-enhanced retrieval augmented generation (GraphRAG) builds a graph structure to explicitly model these relationships, enabling more effective and efficient retrievers. Nevertheless, its performance is still hindered by the noise and incompleteness within the graph structure. To address this, we introduce GFM-RAG, a novel graph foundation model (GFM) for retrieval augmented generation. GFM-RAG is powered by an innovative graph neural network that reasons over graph structure to capture complex query-knowledge relationships. The GFM with 8M parameters undergoes a two-stage training process on large-scale datasets, comprising 60 knowledge graphs with over 14M triples and 700k documents. This results in impressive performance and generalizability for GFM-RAG, making it the first graph foundation model applicable to unseen datasets for retrieval without any fine-tuning required. Extensive experiments on three multi-hop QA datasets and seven domain-specific RAG datasets demonstrate that GFM-RAG achieves state-of-the-art performance while maintaining efficiency and alignment with neural scaling laws, highlighting its potential for further improvement.

ICLR Conference 2025 Conference Paper

Graph Sparsification via Mixture of Graphs

  • Guibin Zhang
  • Xiangguo Sun
  • Yanwei Yue
  • Chonghe Jiang
  • Kun Wang 0056
  • Tianlong Chen 0001
  • Shirui Pan

Graph Neural Networks (GNNs) have demonstrated superior performance across various graph learning tasks but face significant computational challenges when applied to large-scale graphs. One effective approach to mitigate these challenges is graph sparsification, which involves removing non-essential edges to reduce computational overhead. However, previous graph sparsification methods often rely on a single global sparsity setting and uniform pruning criteria, failing to provide customized sparsification schemes for each node's complex local context. In this paper, we introduce Mixture-of-Graphs (MoG), leveraging the concept of Mixture-of-Experts (MoE), to dynamically select tailored pruning solutions for each node. Specifically, MoG incorporates multiple sparsifier experts, each characterized by unique sparsity levels and pruning criteria, and selects the appropriate experts for each node. Subsequently, MoG performs a mixture of the sparse graphs produced by different experts on the Grassmann manifold to derive an optimal sparse graph. One notable property of MoG is its entirely local nature, as it depends on the specific circumstances of each individual node. Extensive experiments on four large-scale OGB datasets and two superpixel datasets, equipped with five GNN backbones, demonstrate that MoG (I) identifies subgraphs at higher sparsity levels (8.67% to 50.85%), with performance equal to or better than the dense graph, (II) achieves 1.47× to 2.62× speedup in GNN inference with negligible performance drop, and (III) boosts "top-student" GNN performance (1.02%↑ on RevGNN+ogbn-proteins and 1.74%↑ on DeeperGCN+ogbg-ppa). The source code is available at https://github.com/yanweiyue/MoG.

TIST Journal 2025 Journal Article

Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion

  • Zicheng Zhao
  • Linhao Luo
  • Shirui Pan
  • Chengqi Zhang
  • Chen Gong

Knowledge graphs (KGs) store an enormous number of facts as relationships between entities. Due to the long-tailed distribution of relations and the incompleteness of KGs, there is growing interest in few-shot knowledge graph completion (FKGC). Existing FKGC methods often assume the existence of all entities in KGs, which may not be practical since new relations and entities can emerge over time. Therefore, we focus on a more challenging task called inductive few-shot knowledge graph completion (I-FKGC), where both relations and entities during the test phase are previously unseen. Inspired by the idea of inductive reasoning, we cast I-FKGC as an inductive reasoning problem. Specifically, we propose a novel Graph Stochastic Neural Process (GS-NP) approach, which consists of two major modules. In the first module, to obtain a generalized hypothesis (e.g., shared subgraph), we present a neural process-based hypothesis extractor that models the joint distribution of hypotheses, from which we can sample a hypothesis for predictions. In the second module, based on the hypothesis, we propose a graph stochastic attention-based predictor to test if the triple in the query set aligns with the extracted hypothesis. Meanwhile, the predictor can generate an explanatory subgraph identified by the hypothesis. Finally, the training of these two modules is seamlessly combined into a unified objective function, whose effectiveness is verified by theoretical analyses as well as empirical studies. Extensive experiments on three public datasets demonstrate that our method outperforms existing methods and achieves new state-of-the-art performance.

ICML Conference 2025 Conference Paper

Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models

  • Linhao Luo
  • Zicheng Zhao
  • Gholamreza Haffari
  • Yuan-Fang Li
  • Chen Gong 0002
  • Shirui Pan

Large language models (LLMs) have demonstrated impressive reasoning abilities, but they still struggle with faithful reasoning due to knowledge gaps and hallucinations. To address these issues, knowledge graphs (KGs) have been utilized to enhance LLM reasoning through their structured knowledge. However, existing KG-enhanced methods, either retrieval-based or agent-based, encounter difficulties in accurately retrieving knowledge and efficiently traversing KGs at scale. In this work, we introduce graph-constrained reasoning (GCR), a novel framework that bridges structured knowledge in KGs with unstructured reasoning in LLMs. To eliminate hallucinations, GCR ensures faithful KG-grounded reasoning by integrating KG structure into the LLM decoding process through KG-Trie, a trie-based index that encodes KG reasoning paths. KG-Trie constrains the decoding process, allowing LLMs to directly reason on graphs and generate faithful reasoning paths grounded in KGs. Additionally, GCR leverages a lightweight KG-specialized LLM for graph-constrained reasoning alongside a powerful general LLM for inductive reasoning over multiple reasoning paths, resulting in accurate reasoning with zero reasoning hallucination. Extensive experiments on several KGQA benchmarks demonstrate that GCR achieves state-of-the-art performance and exhibits strong zero-shot generalizability to unseen KGs without additional training.
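To make the idea of trie-constrained decoding from the abstract above concrete, here is a minimal sketch, assuming reasoning paths are already tokenized into lists of strings. The `PathTrie` class, the `score` callback (a stand-in for language-model logits), and the example paths are all hypothetical and are not taken from the GCR implementation.

```python
from typing import Callable, Dict, Iterable, List, Sequence


class PathTrie:
    """Prefix tree over token sequences (here: KG reasoning paths)."""

    def __init__(self) -> None:
        self.children: Dict[str, "PathTrie"] = {}

    def insert(self, tokens: Sequence[str]) -> None:
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, PathTrie())

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Tokens that keep the prefix on some stored path (sorted)."""
        node = self
        for tok in prefix:
            if tok not in node.children:
                return []
            node = node.children[tok]
        return sorted(node.children)


def build_trie(paths: Iterable[Sequence[str]]) -> PathTrie:
    trie = PathTrie()
    for path in paths:
        trie.insert(path)
    return trie


def constrained_greedy_decode(
    trie: PathTrie,
    score: Callable[[List[str], str], float],
    max_len: int = 16,
) -> List[str]:
    """Greedy decoding restricted to tokens the trie allows.

    At every step only continuations present in the trie are scored,
    so the output is always a prefix of a stored path.
    """
    prefix: List[str] = []
    while len(prefix) < max_len:
        candidates = trie.allowed_next(prefix)
        if not candidates:  # reached the end of a stored path
            break
        prefix.append(max(candidates, key=lambda tok: score(prefix, tok)))
    return prefix
```

For example, with the paths `["Alice", "born_in", "Paris"]` and `["Alice", "works_at", "Acme"]`, a scorer that prefers `works_at` can only ever produce `["Alice", "works_at", "Acme"]`; a continuation such as `["Alice", "works_at", "Paris"]` is unreachable because it is not in the trie.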

ICML Conference 2025 Conference Paper

Less is More: Federated Graph Learning with Alleviating Topology Heterogeneity from A Causal Perspective

  • Lele Fu
  • Bowen Deng 0002
  • Sheng Huang
  • Tianchi Liao
  • Shirui Pan
  • Chuan Chen 0001

Federated graph learning (FGL) aims to collaboratively train a global graph neural network (GNN) on multiple private graphs while preserving local data privacy. Besides the common cases of data heterogeneity in conventional federated learning, FGL faces the unique challenge of topology heterogeneity. Most existing FGL methods alleviate the negative impact of heterogeneity by introducing global signals. However, the ways of creating such increments might not be effective and can significantly increase the amount of computation. In light of this, we propose FedATH, an FGL method for Alleviating Topology Heterogeneity from a causal perspective. Inspired by causal theory, we argue that not all edges in a topology are necessary for the training objective; less topology information might make more sense. With the aid of an edge evaluator, the local graphs are divided into causal and biased subgraphs. A dual-GNN architecture is used to encode the two subgraphs into corresponding representations. Thus, the causal representations are drawn closer to the training objective while the biased representations are pulled away from it. Further, the Hilbert-Schmidt Independence Criterion is employed to strengthen the separability of the two subgraphs. Extensive experiments on six real-world graph datasets are conducted to demonstrate the superiority of the proposed FedATH over the compared approaches.
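The Hilbert-Schmidt Independence Criterion mentioned in the abstract above can be sketched as follows. This is a minimal pure-Python version of the biased empirical estimator, trace(K H L H) / (n - 1)^2, assuming scalar samples and Gaussian kernels with a fixed bandwidth. How FedATH applies the criterion to subgraph representations is not stated in the abstract, so the function names and inputs here are illustrative assumptions only.

```python
import math
from typing import List, Sequence


def gaussian_gram(values: Sequence[float], bandwidth: float = 1.0) -> List[List[float]]:
    """Gram matrix of a Gaussian (RBF) kernel on scalar samples."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def double_center(gram: List[List[float]]) -> List[List[float]]:
    """Compute H K H with H = I - (1/n) * ones, without forming H."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(x: Sequence[float], y: Sequence[float], bandwidth: float = 1.0) -> float:
    """Biased empirical HSIC between paired samples x and y.

    Uses trace(K H L H) = sum_ij (H K H)_ij * L_ij for symmetric L.
    Values near 0 suggest independence; larger values suggest dependence.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    k_centered = double_center(gaussian_gram(x, bandwidth))
    l_gram = gaussian_gram(y, bandwidth)
    trace = sum(k_centered[i][j] * l_gram[i][j] for i in range(n) for j in range(n))
    return trace / ((n - 1) ** 2)
```

A variable paired with a constant yields an HSIC of zero up to floating-point error, while a variable paired with itself yields a strictly positive value, which is the behavior a separability penalty of this kind relies on.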

IJCAI Conference 2025 Conference Paper

M^2LLM: Multi-view Molecular Representation Learning with Large Language Models

  • Jiaxin Ju
  • Yizhen Zheng
  • Huan Yee Koh
  • Can Wang
  • Shirui Pan

Accurate molecular property prediction is a critical challenge with wide-ranging applications in chemistry, materials science, and drug discovery. Molecular representation methods, including fingerprints and graph neural networks (GNNs), achieve state-of-the-art results by effectively deriving features from molecular structures. However, these methods often overlook decades of accumulated semantic and contextual knowledge. Recent advancements in large language models (LLMs) demonstrate remarkable reasoning abilities and prior knowledge across scientific domains, leading us to hypothesize that LLMs can generate rich molecular representations when guided to reason from multiple perspectives. To address these gaps, we propose M^2LLM, a multi-view framework that integrates three perspectives: the molecular structure view, the molecular task view, and the molecular rules view. These views are fused dynamically to adapt to task requirements, and experiments demonstrate that M^2LLM achieves state-of-the-art performance on multiple benchmarks across classification and regression tasks. Moreover, we demonstrate that representations derived from LLMs achieve exceptional performance by leveraging two core functionalities: the generation of molecular embeddings through their encoding capabilities and the curation of molecular features through advanced reasoning processes.

ICML Conference 2025 Conference Paper

Mitigating Over-Squashing in Graph Neural Networks by Spectrum-Preserving Sparsification

  • Langzhang Liang
  • Fanchen Bu
  • Zixing Song
  • Zenglin Xu
  • Shirui Pan
  • Kijung Shin

The message-passing paradigm of Graph Neural Networks often struggles with exchanging information across distant nodes, typically due to structural bottlenecks in certain graph regions, a limitation known as over-squashing. To reduce such bottlenecks, graph rewiring, which modifies graph topology, has been widely used. However, existing graph rewiring techniques often overlook the need to preserve critical properties of the original graph, e.g., spectral properties. Moreover, many approaches rely on increasing edge count to improve connectivity, which introduces significant computational overhead and exacerbates the risk of over-smoothing. In this paper, we propose a novel graph-rewiring method that leverages spectral graph sparsification for mitigating over-squashing. Specifically, our method generates graphs with enhanced connectivity while maintaining sparsity and largely preserving the original graph spectrum, effectively balancing structural bottleneck reduction and graph property preservation. Experimental results validate the effectiveness of our approach, demonstrating its superiority over strong baseline methods in classification accuracy and retention of the Laplacian spectrum.

ICML Conference 2025 Conference Paper

N2GON: Neural Networks for Graph-of-Net with Position Awareness

  • Yejiang Wang
  • Yuhai Zhao
  • Zhengkui Wang
  • Wen Shan
  • Ling Li
  • Qian Li 0043
  • Miaomiao Huang
  • Meixia Wang

Graphs, fundamental in modeling various research subjects such as computing networks, consist of nodes linked by edges. However, they typically function as components within larger structures in real-world scenarios, such as in protein-protein interactions where each protein is a graph in a larger network. This study delves into the Graph-of-Net (GON), a structure that extends the concept of traditional graphs by representing each node as a graph itself. It provides a multi-level perspective on the relationships between objects, encapsulating both the detailed structure of individual nodes and the broader network of dependencies. To learn node representations within the GON, we propose a position-aware neural network for Graph-of-Net which processes both intra-graph and inter-graph connections and incorporates additional data like node labels. Our model employs dual encoders and graph constructors to build and refine a constraint network, where nodes are adaptively arranged based on their positions, as determined by the network’s constraint system. Our model demonstrates significant improvements over baselines in empirical evaluations on various datasets.

AAAI Conference 2025 Conference Paper

Open-Set Cross-Network Node Classification via Unknown-Excluded Adversarial Graph Domain Alignment

  • Xiao Shen
  • Zhihao Chen
  • Shirui Pan
  • Shuang Zhou
  • Laurence T. Yang
  • Xi Zhou

Existing cross-network node classification methods are mainly proposed for the closed-set setting, where the source network and the target network share exactly the same label space. Such a setting is restrictive in real-world applications, since the target network might contain additional classes that are not present in the source. In this work, we study a more realistic open-set cross-network node classification (O-CNNC) problem, where the target network contains all the known classes in the source and further contains several target-private classes unseen in the source. Borrowing the concept from open-set domain adaptation, all target-private classes are defined as an additional “unknown” class. To address the challenging O-CNNC problem, we propose an unknown-excluded adversarial graph domain alignment (UAGA) model with a separate-adapt training strategy. First, UAGA roughly separates the known classes from the unknown class by training a graph neural network encoder and a neighborhood-aggregation node classifier in an adversarial framework. Then, unknown-excluded adversarial domain alignment is customized to align only target nodes from known classes with the source, while pushing target nodes from the unknown class far away from the source, by assigning positive and negative domain adaptation coefficients to known-class nodes and unknown-class nodes, respectively. Extensive experiments on real-world datasets demonstrate that the proposed UAGA significantly outperforms state-of-the-art methods on O-CNNC.

IJCAI Conference 2025 Conference Paper

Progressive Prefix-Memory Tuning for Complex Logical Query Answering on Knowledge Graphs

  • Xingrui Zhuo
  • Shirui Pan
  • Jiapu Wang
  • Gongqing Wu
  • Zan Zhang
  • Rui Li
  • Zizhong Wei
  • Xindong Wu

Conducting complex logical queries over knowledge graphs remains a significant challenge. Recent research has successfully leveraged Pre-trained Language Models (PLMs) to tackle Knowledge Graph Complex Query Answering (KGCQA) tasks, which is attributed to PLMs' ability to comprehend logical semantics of queries through context learning. However, existing PLM-based KGCQA methods usually overlook the harm of disordered syntax or fragmented contexts within a serialized query, posing the problem of “impossible language” to limit PLMs in grasping the logical semantics. To address this problem, we propose a Progressive Prefix-Memory Tuning (PPMT) framework for KGCQA tasks, which effectively rectifies erroneous segments in serialized queries to assist PLMs in query answering. First, we propose a prefix-memory rectification mechanism embedded in a PLM module. This mechanism assigns rectification parameters in memory stores to polish the language segments of entities, relations, and queries through specific prefixes. To further capture the logical semantics in queries, we design a progressive fine-tuning strategy, which optimizes our model through a conditional gradient update process guided by knowledge translation constraints. Extensive experiments on widely used KGCQA benchmarks demonstrate the significant superiority of PPMT in terms of HR@3 and MRR. Our codes are available at https://github.com/lazyloafer/PPMT.

AAAI Conference 2025 Conference Paper

Robust Graph Based Social Recommendation Through Contrastive Multi-View Learning

  • Fei Xiong
  • Tao Zhang
  • Shirui Pan
  • Guixun Luo
  • Liang Wang

Social recommendation leverages the social connections between users to mitigate the issue of data sparsity and enhance recommendation quality. Although existing related works show their effectiveness, there remain two critical questions: i) The patterns of preference interactions among users are varied and heterogeneous. Current models struggle to accurately capture preference shifts from user interactions in noisy social environments. ii) Existing methods handle the integration of auxiliary information coarsely, potentially introducing noise and leading to biases in user preferences. To address the limitations above, we introduce a novel framework named Robust Graph Based Social Recommendation through Contrastive Multi-view Learning (RGCML). This framework leverages denoised social relations and global intents as dual auxiliary information sources to provide comprehensive characterization of users. Firstly, RGCML employs the concept of opinion dynamics to simulate how user preferences evolve due to noisy social relations. Then, it utilizes a specifically designed information fusion module to extract critical contextual information from multiple semantic perspectives, thereby achieving efficient personalized information fusion. Finally, it adopts the designed global-local contrastive learning paradigm that untangles and discriminates user preferences from global intents, further addressing the noise problem and enhancing the quality of user representations. Extensive experiments conducted on three real-world datasets demonstrate the superior performance of RGCML compared to several state-of-the-art (SOTA) baselines.

NeurIPS Conference 2025 Conference Paper

ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models

  • Bosong Huang
  • Ming Jin
  • Yuxuan Liang
  • Johan Barthelemy
  • Debo Cheng
  • Qingsong Wen
  • Chenghao Liu
  • Shirui Pan

Explaining time series classification models is crucial, particularly in high-stakes applications such as healthcare and finance, where transparency and trust play a critical role. Although numerous time series classification methods have identified key subsequences, known as shapelets, as core features for achieving state-of-the-art performance and validating their pivotal role in classification outcomes, existing post-hoc time series explanation (PHTSE) methods primarily focus on timestep-level feature attribution. These explanation methods overlook the fundamental prior that classification outcomes are predominantly driven by key shapelets. To bridge this gap, we present ShapeX, an innovative framework that segments time series into meaningful shapelet-driven segments and employs Shapley values to assess their saliency. At the core of ShapeX lies the Shapelet Describe-and-Detect (SDD) framework, which effectively learns a diverse set of shapelets essential for classification. We further demonstrate that ShapeX produces explanations which reveal causal relationships instead of just correlations, owing to the atomicity properties of shapelets. Experimental results on both synthetic and real-world datasets demonstrate that ShapeX outperforms existing methods in identifying the most relevant subsequences, enhancing both the precision and causal fidelity of time series explanations.
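The Shapley-value attribution mentioned in this abstract can be illustrated with a minimal sketch. This is not the ShapeX implementation: the segment names and the `toy_score` function are hypothetical stand-ins for shapelet-driven segments and a classifier's output, and the exact enumeration below is only practical for a handful of segments.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k in the Shapley formula.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when added to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_score(kept):
    """Hypothetical model score given the set of segments left unmasked."""
    score = 0.0
    if "b" in kept:
        score += 0.6  # segment "b" carries most of the signal on its own
    if "a" in kept and "c" in kept:
        score += 0.2  # segments "a" and "c" only matter together
    return score


segments = ["a", "b", "c"]
phi = shapley_values(segments, toy_score)
```

In this toy setup the attributions sum to the score of the full series minus the score of the empty one, which is the efficiency property that makes Shapley values attractive for segment-level saliency.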

IJCAI Conference 2025 Conference Paper

Sharpness-aware Zeroth-order Optimization for Graph Transformers

  • Yang Liu
  • Chuan Zhou
  • Yuhan Lin
  • Shuai Zhang
  • Yang Gao
  • Zhao Li
  • Shirui Pan

Graph Transformers (GTs) have emerged as powerful tools for handling graph-structured data through global attention mechanisms. While GTs can effectively capture long-range dependencies, they introduce difficulties in optimization due to their complex, non-differentiable operators, which cannot be directly handled by standard gradient-based optimizers (such as Adam or AdamW). To investigate the above issues, this work adopts the line of Zeroth-Order Optimization (ZOO) technique. However, direct integration of ZOO incurs considerable challenges due to the sharp loss landscape and steep gradients within the GT parameter space. Under the above observations, we propose a Sharpness-aware Zeroth-order Optimizer (SZO) that combines Sharpness-Aware Minimization (SAM) technique facilitating convergence within a flatter neighborhood, and leverages parallel computing for efficient gradient estimation. Theoretically, we provide a comprehensive analysis of the optimizer from both convergence and generalization perspectives. Empirically, we conduct extensive experiments on various classical GTs across a wide range of benchmark datasets, which underscore the superior performance of SZO over the state-of-the-art optimizers.

IJCAI Conference 2025 Conference Paper

T2S: High-resolution Time Series Generation with Text-to-Series Diffusion Models

  • Yunfeng Ge
  • Jiawei Li
  • Yiji Zhao
  • Haomin Wen
  • Zhao Li
  • Meikang Qiu
  • Hongyan Li
  • Ming Jin

Text-to-Time Series generation holds significant potential to address challenges such as data sparsity, imbalance, and limited availability of multimodal time series data across domains. While diffusion models have achieved remarkable success in Text-to-X (e.g., vision and audio data) generation, their use in time series generation remains limited. Existing approaches face two critical limitations: (1) reliance on domain-specific captions that generalize poorly, and (2) inability to generate time series of arbitrary length, limiting real-world use. In this work, we first introduce a new multimodal dataset containing over 600,000 high-resolution text-time series pairs. Second, we propose Text-to-Series (T2S), a diffusion-based framework that bridges the gap between natural language and time series in a domain-agnostic manner. It employs a length-adaptive VAE to encode time series of varying lengths into consistent latent embeddings. On top of that, T2S effectively aligns textual representations with latent embeddings by utilizing Flow Matching and employing DiT as the denoiser. We train T2S in an interleaved paradigm across multiple lengths, allowing it to generate sequences of arbitrary lengths. Extensive evaluations demonstrate that T2S achieves state-of-the-art performance across 13 datasets spanning 12 domains.

ICML Conference 2025 Conference Paper

Test-Time Graph Neural Dataset Search With Generative Projection

  • Xin Zheng 0008
  • Wei Huang 0034
  • Chuan Zhou 0001
  • Ming Li 0065
  • Shirui Pan

In this work, we address the test-time adaptation challenge in graph neural networks (GNNs), focusing on overcoming the limitations in flexibility and generalization inherent in existing data-centric approaches. To this end, we propose a novel research problem, test-time graph neural dataset search, which seeks to learn a parameterized test-time graph distribution to enhance the inference performance of unseen test graphs on well-trained GNNs. Specifically, we propose a generative Projection based test-time Graph Neural Dataset Search method, named PGNDS, which maps the unseen test graph distribution back to the known training distribution through a generation process guided by well-trained GNNs. The proposed PGNDS framework consists of three key modules: (1) dual conditional diffusion for GNN-guided generative projection through test-back-to-training distribution mapping; (2) dynamic search from the generative sampling space to select the most expressive test graphs; (3) ensemble inference to aggregate information from original and adapted test graphs. Extensive experiments on real-world graphs demonstrate the superior ability of our proposed PGNDS for improved test-time GNN inference.

AAAI Conference 2025 Conference Paper

TGLsta: Low-resource Textual Graph Learning with Semantic and Topological Awareness via LLMs

  • Qin Zhang
  • Xiaowei Li
  • Ziqi Liu
  • Xiaochen Fan
  • Xiaojun Chen
  • Shirui Pan

Textual Graphs (TGs) present a graph-based representation of textual data and find wide applications in real-world scenarios, such as citation networks, knowledge graphs, and social networks. While the traditional "pre-train, fine-tune" framework effectively addresses tasks requiring abundant labeled data, it falls short in scenarios with limited resources or zero-shot learning requirements, particularly in low-resource textual graph node classification. Additionally, prevalent approaches that convert text nodes into shallow or manually engineered features fail to capture the rich semantic nuances within the text. Conventional methods also often neglect the fusion of semantic and topological information, resulting in suboptimal model learning. To overcome these challenges, we propose a novel method for low-resource textual graph node classification based on large language models, i.e., Textual graph learning with semantic and topological awareness (TGLsta), which comprehensively explores the semantic information, near-neighborhood information, and topology information in textual graphs, the most important information sources they contain. Graph prompt tuning for both zero- and few-shot textual graph node classification is further introduced.

IS Journal 2025 Journal Article

The Rise of Small Language Models

  • Qin Zhang
  • Ziqi Liu
  • Shirui Pan

Large language models (LLMs), such as GPT and LLAMA, exhibit exceptional comprehension and reasoning capabilities across a wide range of tasks, which are a result of the extensive corpora and the enormous number of parameters in a model. However, their size can pose significant challenges for deployment, particularly on resource-constrained devices. For issues that degrade the user experience, such as efficiency, latency, safety, and privacy, small language models (SLMs) offer a solution. This article begins by outlining the key principles behind SLMs and the reasons for their importance in the field. Subsequently, we discuss the methods used to develop SLMs and explore the collaboration between SLMs and LLMs. By exploring the pathways for harnessing the unique capabilities of SLMs and optimizing their integration with LLMs, it contributes to the ongoing discussion on their application and collaboration in natural language processing and offers insights for advancement and innovation in the field.

ICML Conference 2025 Conference Paper

TimeFilter: Patch-Specific Spatial-Temporal Graph Filtration for Time Series Forecasting

  • Yifan Hu 0006
  • Guibin Zhang
  • Peiyuan Liu
  • Disen Lan
  • Naiqi Li
  • Dawei Cheng
  • Tao Dai 0001
  • Shu-Tao Xia

Time series forecasting methods generally fall into two main categories: Channel Independent (CI) and Channel Dependent (CD) strategies. While CI overlooks important covariate relationships, CD captures all dependencies without distinction, introducing noise and reducing generalization. Recent advances in Channel Clustering (CC) aim to refine dependency modeling by grouping channels with similar characteristics and applying tailored modeling techniques. However, coarse-grained clustering struggles to capture complex, time-varying interactions effectively. To address these challenges, we propose TimeFilter, a GNN-based framework for adaptive and fine-grained dependency modeling. After constructing the graph from the input sequence, TimeFilter refines the learned spatial-temporal dependencies by filtering out irrelevant correlations while preserving the most critical ones in a patch-specific manner. Extensive experiments on 13 real-world datasets from diverse application domains demonstrate the state-of-the-art performance of TimeFilter. The code is available at https://github.com/TROUBADOUR000/TimeFilter.

ICLR Conference 2025 Conference Paper

Towards Neural Scaling Laws for Time Series Foundation Models

  • Qingren Yao
  • Chao-Han Huck Yang
  • Renhe Jiang
  • Yuxuan Liang 0002
  • Ming Jin 0005
  • Shirui Pan

Scaling laws offer valuable insights into the design of time series foundation models (TSFMs). However, previous research has largely focused on the scaling laws of TSFMs for in-distribution (ID) data, leaving their out-of-distribution (OOD) scaling behavior and the influence of model architectures less explored. In this work, we examine two common TSFM architectures—encoder-only and decoder-only Transformers—and investigate their scaling behavior on both ID and OOD data. These models are trained and evaluated across varying parameter counts, compute budgets, and dataset sizes. Our experiments reveal that the log-likelihood loss of TSFMs exhibits similar scaling behavior in both OOD and ID settings. We further compare the scaling properties across different architectures, incorporating two state-of-the-art TSFMs as case studies, showing that model architecture plays a significant role in scaling. The encoder-only Transformers demonstrate better scalability than the decoder-only Transformers, while the architectural enhancements in the two advanced TSFMs primarily improve ID performance but reduce OOD scalability. While scaling up TSFMs is expected to drive performance breakthroughs, the lack of a comprehensive understanding of TSFM scaling laws has hindered the development of a robust framework to guide model scaling. We fill this gap in this work by synthesizing our findings and providing practical guidelines for designing and scaling larger TSFMs with enhanced model capabilities.

IS Journal 2025 Journal Article

Transforming Urban Dynamics: Harnessing Large Language Models for Smarter Mobility

  • Hao Xue
  • Ming Jin
  • Shirui Pan
  • Flora Salim

Artificial intelligence (AI) has the potential to analyze mobility data and make mobility systems smarter by leveraging diverse data sources such as geospatial data, transportation logs, and real-time sensor data to optimize traffic flow, enhance public transportation systems, and support the development of autonomous vehicles. With the newly emerged generative AI paradigm, exemplified by large language models (LLMs), there is great potential to transform the current AI applications in mobility, transportation, and urban domains. This article provides an overview of recent efforts and aims to shed light on the challenges and future opportunities to facilitate the adaptation of LLMs for smarter mobility systems.

ICLR Conference 2025 Conference Paper

Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark

  • Yili Wang 0004
  • Yixin Liu 0001
  • Xu Shen 0002
  • Chenyu Li
  • Rui Miao 0003
  • Kaize Ding
  • Ying Wang 0009
  • Shirui Pan

To build safe and reliable graph machine learning systems, unsupervised graph-level anomaly detection (GLAD) and unsupervised graph-level out-of-distribution (OOD) detection (GLOD) have received significant attention in recent years. Though these two lines of research share the same objective, they have been studied independently in the community due to distinct evaluation setups, creating a gap that hinders the application and evaluation of methods from one to the other. To bridge the gap, in this work, we present a Unified Benchmark for unsupervised Graph-level OOD and anomaly Detection (UB-GOLD), a comprehensive evaluation framework that unifies GLAD and GLOD under the concept of generalized graph-level OOD detection. Our benchmark encompasses 35 datasets spanning four practical anomaly and OOD detection scenarios, facilitating the comparison of 18 representative GLAD/GLOD methods. We conduct multi-dimensional analyses to explore the effectiveness, generalizability, robustness, and efficiency of existing methods, shedding light on their strengths and limitations. Furthermore, we provide an open-source codebase of UB-GOLD to foster reproducible research and outline potential directions for future investigations based on our insights.

NeurIPS Conference 2025 Conference Paper

Unsupervised Federated Graph Learning

  • Lele Fu
  • Tianchi Liao
  • Sheng Huang
  • Bowen Deng
  • Shirui Pan
  • Chuan Chen

Federated graph learning (FGL) is a privacy-preserving paradigm for modeling distributed graph data, designed to train a powerful global graph neural network. Existing FGL methods predominantly rely on label information during training, and effective FGL in an unsupervised setting remains largely unexplored. In this paper, we address two key challenges in unsupervised FGL: 1) Local models tend to converge in divergent directions due to the lack of shared semantic information across clients, so the first challenge is how to align representation spaces among multiple clients. 2) Conventional federated weighted aggregation easily degrades the performance of the global model, which raises the second challenge, namely how to adaptively learn the global model parameters. In response to these two challenges, we propose a tailored framework named FedPAM, which is composed of two modules: Representation Space Alignment (RSA) and Adaptive Global Parameter Learning (AGPL). RSA leverages a set of learnable anchors to define the global representation space, and local subgraphs are then aligned with them through the fused Gromov-Wasserstein optimal transport, achieving representation space alignment across clients. AGPL stacks local model parameters into third-order tensors and adaptively integrates the global model parameters in a low-rank tensor space, which facilitates fusing the high-order knowledge among clients. Extensive experiments on eight graph datasets demonstrate that the proposed FedPAM is superior to classical and SOTA compared methods.

NeurIPS Conference 2024 Conference Paper

ARC: A Generalist Graph Anomaly Detector with In-Context Learning

  • Yixin Liu
  • Shiyuan Li
  • Yu Zheng
  • Qingfeng Chen
  • Chengqi Zhang
  • Shirui Pan

Graph anomaly detection (GAD), which aims to identify abnormal nodes that differ from the majority within a graph, has garnered significant attention. However, current GAD methods necessitate training specific to each dataset, resulting in high training costs, substantial data requirements, and limited generalizability when being applied to new datasets and domains. To address these limitations, this paper proposes ARC, a generalist GAD approach that enables a “one-for-all” GAD model to detect anomalies across various graph datasets on-the-fly. Equipped with in-context learning, ARC can directly extract dataset-specific patterns from the target dataset using few-shot normal samples at the inference stage, without the need for retraining or fine-tuning on the target dataset. ARC comprises three components that are well-crafted for capturing universal graph anomaly patterns: 1) smoothness-based feature Alignment module that unifies the features of different datasets into a common and anomaly-sensitive space; 2) ego-neighbor Residual graph encoder that learns abnormality-related node embeddings; and 3) cross-attentive in-Context anomaly scoring module that predicts node abnormality by leveraging few-shot normal samples. Extensive experiments on multiple benchmark datasets from various domains demonstrate the superior anomaly detection performance, efficiency, and generalizability of ARC.

NeurIPS Conference 2024 Conference Paper

Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective

  • Jiaxi Hu
  • Yuehong Hu
  • Wei Chen
  • Ming Jin
  • Shirui Pan
  • Qingsong Wen
  • Yuxuan Liang

In long-term time series forecasting (LTSF) tasks, an increasing number of works have acknowledged that discrete time series originate from continuous dynamic systems and have attempted to model their underlying dynamics. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as low-dimensional observations from unknown high-dimensional chaotic dynamical systems. Under the concept of attractor invariance, Attraos utilizes non-parametric Phase Space Reconstruction embedding along with a novel multi-resolution dynamic memory unit to memorize historical dynamical structures, and evolves by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.

AAAI Conference 2024 Conference Paper

Augmented Commonsense Knowledge for Remote Object Grounding

  • Bahram Mohammadi
  • Yicong Hong
  • Yuankai Qi
  • Qi Wu
  • Shirui Pan
  • Javen Qinfeng Shi

The vision-and-language navigation (VLN) task necessitates an agent to perceive the surroundings, follow natural language instructions, and act in photo-realistic unseen environments. Most of the existing methods employ the entire image or object features to represent navigable viewpoints. However, these representations are insufficient for proper action prediction, especially for the REVERIE task, which uses concise high-level instructions, such as “Bring me the blue cushion in the master bedroom”. To enhance the representation, we propose an augmented commonsense knowledge model (ACK) to leverage commonsense information as a spatio-temporal knowledge graph for improving agent navigation. Specifically, the proposed approach involves constructing a knowledge base by retrieving commonsense information from ConceptNet, followed by a refinement module to remove noisy and irrelevant knowledge. We further present ACK, which consists of knowledge graph-aware cross-modal and concept aggregation modules to enhance visual representation and visual-textual data alignment by integrating visible objects, commonsense knowledge, and concept history, which includes object and knowledge temporal information. Moreover, we add a new pipeline for the commonsense-based decision-making process, which leads to more accurate local action prediction. Experimental results demonstrate that our proposed model noticeably outperforms the baseline and achieves state-of-the-art performance on the REVERIE benchmark. The source code is available at https://github.com/Bahram-Mohammadi/ACK.

JBHI Journal 2024 Journal Article

Characterizing Secretion System Effector Proteins With Structure-Aware Graph Neural Networks and Pre-Trained Language Models

  • Zixu Ran
  • Cong Wang
  • Heyun Sun
  • Shirui Pan
  • Fuyi Li

The Type III Secretion Systems (T3SSs) play a pivotal role in host-pathogen interactions by mediating the secretion of type III secretion system effectors (T3SEs) into host cells. These T3SEs mimic host cell protein functions, influencing interactions between Gram-negative bacterial pathogens and their hosts. Identifying T3SEs is essential in biomedical research for comprehending bacterial pathogenesis and its implications on human cells. This study presents EDIFIER, a novel multi-channel model designed for accurate T3SE prediction. It incorporates a graph structural channel, utilizing graph convolutional networks (GCN) to capture protein 3D structural features and a sequence channel based on the ProteinBERT pre-trained model to extract the sequence context features of T3SEs. Rigorous benchmarking tests, including ablation studies and comparative analysis, validate that EDIFIER outperforms current state-of-the-art tools in T3SE prediction. To enhance EDIFIER's accessibility to the broader scientific community, we developed a webserver that is publicly accessible at http://edifier.unimelb-biotools.cloud.edu.au/. We anticipate EDIFIER will contribute to the field by providing reliable T3SE predictions, thereby advancing our understanding of host-pathogen dynamics.

IJCAI Conference 2024 Conference Paper

CONC: Complex-noise-resistant Open-set Node Classification with Adaptive Noise Detection

  • Qin Zhang
  • Jiexin Lu
  • Xiaowei Li
  • Huisi Wu
  • Shirui Pan
  • Junyang Chen

As a popular task in graph learning, node classification seeks to assign labels to nodes, taking into account both their features and connections. However, an important challenge for its application in real-world scenarios is the presence of newly-emerged out-of-distribution samples and noisy samples, which affect the quality and robustness of learned classifiers. Out-of-distribution (OOD) samples are often found in both the training and testing phases. Such samples don’t belong to any known categories. These OOD samples are considered as outliers (OOD noise) when they appear during training, and are recognized as open-set samples during testing. Meanwhile, in-distribution (IND) noisy data, i.e., known class samples with wrong labels, are also prevalent and inevitably degrade a model’s performance. The challenge of open-set learning with complex IND and OOD noise remains largely unexplored, particularly when dealing with non-IID graph data. To address these challenges, this paper introduces a novel complex-noise-resistant open-set node classification approach, designed for open-set graph data containing both IND and OOD noisy nodes. Specifically, a trustworthiness learner is adopted to learn the trustworthiness rates of the feature and label for each node while a decoder and an open-set classifier are trained to reconstruct the structure of a node and to predict its category simultaneously with the guidance of node trustworthiness. The experimental results demonstrate the superiority of our method.

ECAI Conference 2024 Conference Paper

Differentiating Choices via Commonality for Multiple-Choice Question Answering

  • Wenqing Deng
  • Zhe Wang 0001
  • Kewen Wang 0001
  • Shirui Pan
  • Xiaowang Zhang
  • Zhiyong Feng 0002

Multiple-choice question answering (MCQA) becomes particularly challenging when all choices are relevant to the question and are semantically similar. Yet this setting of MCQA can potentially provide valuable clues for choosing the right answer. Existing models often rank each choice separately, overlooking the context provided by other choices. Specifically, they fail to leverage the semantic commonalities and nuances among the choices for reasoning. In this paper, we propose a novel MCQA model by differentiating choices through identifying and eliminating their commonality, called DCQA. Our model captures token-level attention of each choice to the question, and separates tokens of the question attended to by all the choices (i.e., commonalities) from those by individual choices (i.e., nuances). Using the nuances as refined contexts for the choices, our model can effectively differentiate choices with subtle differences and provide justifications for choosing the correct answer. We conduct comprehensive experiments across five commonly used MCQA benchmarks, demonstrating that DCQA consistently outperforms baseline models. Furthermore, our case study illustrates the effectiveness of the approach in directing the attention of the model to more differentiating features.

NeurIPS Conference 2024 Conference Paper

EGonc: Energy-based Open-Set Node Classification with substitute Unknowns

  • Qin Zhang
  • Zelin Shi
  • Shirui Pan
  • Junyang Chen
  • Huisi Wu
  • Xiaojun Chen

Open-set Classification (OSC) is a critical requirement for safely deploying machine learning models in the open world, which aims to classify samples from known classes and reject out-of-distribution (OOD) samples. Existing methods exploit the feature space of a trained network and attempt to estimate the uncertainty in the predictions. However, softmax-based neural networks are found to be overly confident in their predictions even on data they have never seen before, and the immense diversity of the OOD examples also makes such methods fragile. To this end, we follow the idea of estimating the underlying density of the training data to decide whether a given input is close to the in-distribution (IND) data and adopt Energy-based models (EBMs) as density estimators. A novel energy-based generative open-set node classification method, EGonc, is proposed to achieve open-set graph learning. Specifically, we first generate substitute unknowns to mimic the distribution of real open-set samples, based on the information of graph structures. Then, an additional energy logit representing the virtual OOD class is learned from the residual of the feature against the principal space, and matched with the original logits by a constant scaling. This virtual logit serves as the indicator of OOD-ness. EGonc has nice theoretical properties that guarantee an overall distinguishable margin between the detection scores for IND and OOD samples. Comprehensive experimental evaluations of EGonc also demonstrate its superiority.
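For readers unfamiliar with energy-based scoring, the following minimal sketch shows the generic free-energy score computed from classifier logits, as commonly used in the energy-based OOD detection literature. It is not EGonc's virtual-logit construction; the function name, the temperature parameter, and the example logits are assumptions made purely for illustration.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy E(x) = -T * logsumexp(logits / T).

    Lower energy suggests the input looks more in-distribution.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(z - m) for z in scaled))
    return -temperature * lse


# Hypothetical logits: one confident prediction and one nearly flat prediction.
confident = [9.0, 0.5, 0.2]
flat = [0.3, 0.2, 0.1]

e_confident = energy_score(confident)
e_flat = energy_score(flat)
```

Unlike the maximum softmax probability, this score depends on the magnitude of the logits and not only on their relative sizes, which is one reason energy is often preferred as an OOD indicator.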

IJCAI Conference 2024 Conference Paper

FedPFT: Federated Proxy Fine-Tuning of Foundation Models

  • Zhaopeng Peng
  • Xiaoliang Fan
  • Yufan Chen
  • Zheng Wang
  • Shirui Pan
  • Chenglu Wen
  • Ruisheng Zhang
  • Cheng Wang

Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) has emerged as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune FMs by allocating sub-FMs to clients in FL, which, however, leads to suboptimal performance due to insufficient tuning and inevitable error accumulation of gradients. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method enhancing FM adaptation in downstream tasks through FL by two key modules. First, the sub-FM construction module employs a layer-wise compression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing those crucial neurons. Second, the sub-FM alignment module conducts a two-step distillation (layer-level and neuron-level) before and during FL fine-tuning, respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (i.e., four text and three vision) demonstrate the superiority of FedPFT. Our code is available at https://github.com/pzp-dzd/FedPFT.

AAAI Conference 2024 Conference Paper

GOODAT: Towards Test-Time Graph Out-of-Distribution Detection

  • Luzhi Wang
  • Dongxiao He
  • He Zhang
  • Yixin Liu
  • Wenjie Wang
  • Shirui Pan
  • Di Jin
  • Tat-Seng Chua

Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts (in distribution, ID), they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD samples with GNNs, recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN. Despite their effectiveness, these methods come with heavy training resources and costs, as they need to optimize the GNN-based models on training data. Moreover, their reliance on modifying the original GNNs and accessing training data further restricts their universality. To this end, this paper introduces a method to detect Graph Out-of-Distribution At Test-time (namely GOODAT), a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture. With a lightweight graph masker, GOODAT can learn informative subgraphs from test samples, enabling the capture of distinct graph patterns between OOD and ID samples. To optimize the graph masker, we meticulously design three unsupervised objective functions based on the graph information bottleneck principle, motivating the masker to capture compact yet informative subgraphs for OOD detection. Comprehensive evaluations confirm that our GOODAT method outperforms state-of-the-art benchmarks across a variety of real-world datasets.

IJCAI Conference 2024 Conference Paper

Gradformer: Graph Transformer with Exponential Decay

  • Chuang Liu
  • Zelin Yao
  • Yibing Zhan
  • Xueqi Ma
  • Shirui Pan
  • Wenbin Hu

Graph Transformers (GTs) have demonstrated their advantages across a wide range of tasks. However, the self-attention mechanism in GTs overlooks the graph's inductive biases, particularly biases related to structure, which are crucial for graph tasks. Although some methods utilize positional encoding and attention bias to model inductive biases, their effectiveness is still suboptimal analytically. Therefore, this paper presents Gradformer, a method innovatively integrating GT with the intrinsic inductive bias by applying an exponential decay mask to the attention matrix. Specifically, the values in the decay mask matrix diminish exponentially, correlating with the decreasing node proximities within the graph structure. This design enables Gradformer to retain its ability to capture information from distant nodes while focusing on the graph's local details. Furthermore, Gradformer introduces a learnable constraint into the decay mask, allowing different attention heads to learn distinct decay masks. Such a design diversifies the attention heads, enabling a more effective assimilation of diverse structural information within the graph. Extensive experiments on various benchmarks demonstrate that Gradformer consistently outperforms the Graph Neural Network and GT baseline models in various graph classification and regression tasks. Additionally, Gradformer has proven to be an effective method for training deep GT models, maintaining or even enhancing accuracy compared to shallow models as the network deepens, in contrast to the significant accuracy drop observed in other GT models. Code is available at https://github.com/LiuChuang0059/Gradformer.
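The decay-mask idea in this abstract can be sketched in a few lines. The abstract does not say how proximity is measured or where the mask enters the attention computation, so the code below assumes hop-count distance, a fixed decay rate `gamma` in (0, 1), and multiplicative reweighting of the softmax numerators followed by renormalisation. The learnable per-head constraint is not modelled, and all function names are hypothetical.

```python
from collections import deque
import math


def hop_distances(adj, source):
    """Breadth-first search hop counts from source; unreachable nodes get math.inf."""
    dist = [math.inf] * len(adj)
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if dist[v] == math.inf:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def decay_mask(adj, gamma):
    """mask[i][j] = gamma ** hops(i, j), and 0.0 for unreachable pairs."""
    mask = []
    for i in range(len(adj)):
        dist = hop_distances(adj, i)
        mask.append([gamma ** h if h != math.inf else 0.0 for h in dist])
    return mask


def masked_attention(scores, mask):
    """Row-wise softmax of scores with numerators reweighted by mask, then renormalised."""
    out = []
    for row_scores, row_mask in zip(scores, mask):
        m = max(row_scores)
        weighted = [math.exp(s - m) * w for s, w in zip(row_scores, row_mask)]
        total = sum(weighted)  # positive because mask[i][i] == 1.0
        out.append([w / total for w in weighted])
    return out


# Path graph 0 - 1 - 2 as an adjacency list, with uniform raw attention scores.
adj = [[1], [0, 2], [1]]
mask = decay_mask(adj, gamma=0.5)
attn = masked_attention([[0.0, 0.0, 0.0] for _ in adj], mask)
```

On this toy graph, node 0 still attends to node 2 two hops away, but with a quarter of the weight it gives itself, which matches the stated goal of keeping distant information while emphasising local structure.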

IJCAI Conference 2024 Conference Paper

Graph Attention Network with High-Order Neighbor Information Propagation for Social Recommendation

  • Fei Xiong
  • Haoran Sun
  • Guixun Luo
  • Shirui Pan
  • Meikang Qiu
  • Liang Wang

In recommender systems, graph neural networks (GNN) can integrate interactions between users and items with their attributes, which makes GNN-based methods more powerful. However, directly stacking multiple layers in a graph neural network can easily lead to over-smoothing, hence recommendation systems based on graph neural networks typically underutilize higher-order neighborhoods in their learning. Although some heterogeneous graph random walk methods based on meta-paths can achieve higher-order aggregation, the focus is predominantly on the nodes at the ends of the paths. Moreover, these methods require manually defined meta-paths, which limits the model’s expressiveness and flexibility. Furthermore, path encoding in graph neural networks usually focuses only on the sequence leading to the target node. However, real-world interactions often do not follow this strict sequence, limiting the predictive performance of sequence-based network models. These problems prevent GNN-based methods from being fully effective. We propose a Graph Attention network with Information Propagation path aggregation for Social Recommendation (GAIPSRec). Firstly, we propose a universal heterogeneous graph sampling framework that does not require manually defining meta-paths for path sampling, thereby offering greater flexibility. Moreover, our method takes into account all nodes on the aggregation path and is capable of learning information from higher-order neighbors without succumbing to over-smoothing. Finally, our method utilizes a gate mechanism to fuse sequential and non-sequential dependence in encoding path instances, allowing a more holistic view of the data. Extensive experiments on real-world datasets show that our proposed GAIPSRec improves the performance significantly and outperforms state-of-the-art methods.

ICML Conference 2024 Conference Paper

Graph Neural Stochastic Diffusion for Estimating Uncertainty in Node Classification

  • Xixun Lin
  • Wenxiao Zhang
  • Fengzhao Shi
  • Chuan Zhou 0001
  • Lixin Zou
  • Xiangyu Zhao 0001
  • Dawei Yin 0001
  • Shirui Pan

Graph neural networks (GNNs) have advanced the state of the art in various domains. Despite their remarkable success, the uncertainty estimation of GNN predictions remains under-explored, which limits their practical applications, especially in risk-sensitive areas. Current works suffer from either intractable posteriors or inflexible prior specifications, leading to sub-optimal empirical results. In this paper, we present graph neural stochastic diffusion (GNSD), a novel framework for estimating predictive uncertainty on graphs by establishing theoretical connections between GNNs and stochastic partial differential equations. GNSD represents a GNN-based parameterization of the proposed graph stochastic diffusion equation, which includes a $Q$-Wiener process to model the stochastic evolution of node representations. GNSD introduces a drift network to guarantee accurate prediction and a stochastic forcing network to model the propagation of epistemic uncertainty among nodes. Extensive experiments are conducted on multiple detection tasks, demonstrating that GNSD yields superior performance over existing strong approaches.

IS Journal 2024 Journal Article

Integrating Graphs With Large Language Models: Methods and Prospects

  • Shirui Pan
  • Yizhen Zheng
  • Yixin Liu

Large language models (LLMs) such as Generative Pre-trained Transformer 4 have emerged as frontrunners, showcasing unparalleled prowess in diverse applications including answering queries, code generation, and more. Parallelly, graph-structured data, intrinsic data types, are pervasive in real-world scenarios. Merging the capabilities of LLMs with graph-structured data has been a topic of keen interest. This article bifurcates such integrations into two predominant categories. The first leverages LLMs for graph learning, where LLMs can not only augment existing graph algorithms but also stand as prediction models for various graph tasks. Conversely, the second category underscores the pivotal role of graphs in advancing LLMs. Mirroring human cognition, we solve complex tasks by adopting graphs in either reasoning or collaboration. Integrating with such structures can significantly boost the performance of LLMs in various complicated tasks. We also discuss and propose open questions for integrating LLMs with graph-structured data for the future direction of the field.

NeurIPS Conference 2024 Conference Paper

Large Language Models-guided Dynamic Adaptation for Temporal Knowledge Graph Reasoning

  • Jiapu Wang
  • Kai Sun
  • Linhao Luo
  • Wei Wei
  • Yongli Hu
  • Alan W. Liew
  • Shirui Pan
  • Baocai Yin

Temporal Knowledge Graph Reasoning (TKGR) is the process of utilizing temporal information to capture complex relations within a Temporal Knowledge Graph (TKG) to infer new knowledge. Conventional methods in TKGR typically depend on deep learning algorithms or temporal logical rules. However, deep learning-based TKGRs often lack interpretability, whereas rule-based TKGRs struggle to effectively learn temporal rules that capture temporal patterns. Recently, Large Language Models (LLMs) have demonstrated extensive knowledge and remarkable proficiency in temporal reasoning. Consequently, the employment of LLMs for Temporal Knowledge Graph Reasoning (TKGR) has sparked increasing interest among researchers. Nonetheless, LLMs are known to function as black boxes, making it challenging to comprehend their reasoning process. Additionally, due to the resource-intensive nature of fine-tuning, promptly updating LLMs to integrate evolving knowledge within TKGs for reasoning is impractical. To address these challenges, in this paper, we propose a Large Language Models-guided Dynamic Adaptation (LLM-DA) method for reasoning on TKGs. Specifically, LLM-DA harnesses the capabilities of LLMs to analyze historical data and extract temporal logical rules. These rules unveil temporal patterns and facilitate interpretable reasoning. To account for the evolving nature of TKGs, a dynamic adaptation strategy is proposed to update the LLM-generated rules with the latest events. This ensures that the extracted rules always incorporate the most recent knowledge and better generalize to the predictions on future events. Experimental results show that without the need of fine-tuning, LLM-DA significantly improves the accuracy of reasoning over several common datasets, providing a robust framework for TKGR tasks.

AAAI Conference 2024 Conference Paper

NestE: Modeling Nested Relational Structures for Knowledge Graph Reasoning

  • Bo Xiong
  • Mojtaba Nayyeri
  • Linhao Luo
  • Zihao Wang
  • Shirui Pan
  • Steffen Staab

Reasoning with knowledge graphs (KGs) has primarily focused on triple-shaped facts. Recent advancements have been explored to enhance the semantics of these facts by incorporating more potent representations, such as hyper-relational facts. However, these approaches are limited to atomic facts, which describe a single piece of information. This paper extends beyond atomic facts and delves into nested facts, represented by quoted triples where subjects and objects are triples themselves (e.g., ((BarackObama, holds_position, President), succeed_by, (DonaldTrump, holds_position, President))). These nested facts enable the expression of complex semantics like situations over time and logical patterns over entities and relations. In response, we introduce NestE, a novel KG embedding approach that captures the semantics of both atomic and nested factual knowledge. NestE represents each atomic fact as a 1×3 matrix, and each nested relation is modeled as a 3×3 matrix that rotates the 1×3 atomic fact matrix through matrix multiplication. Each element of the matrix is represented as a complex number in the generalized 4D hypercomplex space, including (spherical) quaternions, hyperbolic quaternions, and split-quaternions. Through thorough analysis, we demonstrate the embedding's efficacy in capturing diverse logical patterns over nested facts, surpassing the confines of first-order logic-like expressions. Our experimental results showcase NestE's significant performance gains over current baselines in triple prediction and conditional link prediction. The code and pre-trained models are openly available at https://github.com/xiongbo010/NestE.

ICLR Conference 2024 Conference Paper

Online GNN Evaluation Under Test-time Graph Distribution Shifts

  • Xin Zheng 0008
  • Dongjin Song
  • Qingsong Wen
  • Bo Du 0001
  • Shirui Pan

Evaluating the performance of a well-trained GNN model on real-world graphs is a pivotal step for reliable GNN online deployment and serving. Due to a lack of test node labels and unknown potential training-test graph data distribution shifts, conventional model evaluation encounters limitations in calculating performance metrics (e.g., test error) and measuring graph data-level discrepancies, particularly when the training graph used for developing GNNs remains unobserved during test time. In this paper, we study a new research problem, online GNN evaluation, which aims to provide valuable insights into the well-trained GNNs' ability to effectively generalize to real-world unlabeled graphs under test-time graph distribution shifts. Concretely, we develop an effective learning behavior discrepancy score, dubbed LeBeD, to estimate the test-time generalization errors of well-trained GNN models. Through a novel GNN re-training strategy with a parameter-free optimality criterion, the proposed LeBeD comprehensively integrates learning behavior discrepancies from both node prediction and structure reconstruction perspectives. This enables the effective evaluation of the well-trained GNNs' ability to capture test node semantics and structural representations, making it an expressive metric for estimating the generalization error in online GNN evaluation. Extensive experiments on real-world test graphs under diverse graph distribution shifts verify the effectiveness of the proposed method, revealing its strong correlation with ground-truth test errors on various well-trained GNN models.

JBHI Journal 2024 Journal Article

PLANNER: A Multi-Scale Deep Language Model for the Origins of Replication Site Prediction

  • Cong Wang
  • Zhijie He
  • Runchang Jia
  • Shirui Pan
  • Lachlan JM Coin
  • Jiangning Song
  • Fuyi Li

Origins of replication sites (ORIs) are crucial genomic regions where DNA replication initiation takes place, playing pivotal roles in fundamental biological processes like cell division, gene expression regulation, and DNA integrity. Accurate identification of ORIs is essential for comprehending cell replication, gene expression, and mutation-related diseases. However, experimental approaches for ORI identification are often expensive and time-consuming, leading to the growing popularity of computational methods. In this study, we present PLANNER (DeeP LeArNiNg prEdictor for ORI), a novel approach for species-specific and cell-specific prediction of eukaryotic ORIs. PLANNER uses the multi-scale k-tuple sequences as input and employs the DNABERT pre-training model with transfer learning and ensemble learning strategies to train accurate predictive models. Extensive empirical test results demonstrate that PLANNER achieved superior predictive performance compared to state-of-the-art approaches, including iOri-Euk, Stack-ORI, and ORI-Deep, within specific cell types and across different cell types. Furthermore, by incorporating an interpretable analysis mechanism, we provide insights into the learned patterns, facilitating the mapping from discovering important sequential determinants to comprehensively analysing their biological functions.

ICML Conference 2024 Conference Paper

Position: What Can Large Language Models Tell Us about Time Series Analysis

  • Ming Jin 0005
  • Yifan Zhang 0004
  • Wei Chen 0070
  • Kexin Zhang 0007
  • Yuxuan Liang 0002
  • Bin Yang 0002
  • Jindong Wang 0001
  • Shirui Pan

Time series analysis is essential for comprehending the complexities inherent in various real-world systems and applications. Although large language models (LLMs) have recently made significant strides, the development of artificial general intelligence (AGI) equipped with time series analysis capabilities remains in its nascent phase. Most existing time series models heavily rely on domain knowledge and extensive model tuning, predominantly focusing on prediction tasks. In this paper, we argue that current LLMs have the potential to revolutionize time series analysis, thereby promoting efficient decision-making and advancing towards a more universal form of time series analytical intelligence. Such advancement could unlock a wide range of possibilities, including time series modality switching and question answering. We encourage researchers and practitioners to recognize the potential of LLMs in advancing time series analysis and emphasize the need for trust in these related efforts. Furthermore, we detail the seamless integration of time series analysis with existing LLM technologies and outline promising avenues for future research.

ICLR Conference 2024 Conference Paper

Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning

  • Linhao Luo
  • Yuan-Fang Li
  • Gholamreza Haffari
  • Shirui Pan

Large language models (LLMs) have demonstrated impressive reasoning abilities in complex tasks. However, they lack up-to-date knowledge and experience hallucinations during reasoning, which can lead to incorrect reasoning processes and diminish their performance and trustworthiness. Knowledge graphs (KGs), which capture vast amounts of facts in a structured format, offer a reliable source of knowledge for reasoning. Nevertheless, existing KG-based LLM reasoning methods only treat KGs as factual knowledge bases and overlook the importance of their structural information for reasoning. In this paper, we propose a novel method called reasoning on graphs (RoG) that synergizes LLMs with KGs to enable faithful and interpretable reasoning. Specifically, we present a planning-retrieval-reasoning framework, where RoG first generates relation paths grounded by KGs as faithful plans. These plans are then used to retrieve valid reasoning paths from the KGs for LLMs to conduct faithful reasoning. Furthermore, RoG not only distills knowledge from KGs to improve the reasoning ability of LLMs through training but also allows seamless integration with any arbitrary LLMs during inference. Extensive experiments on two benchmark KGQA datasets demonstrate that RoG achieves state-of-the-art performance on KG reasoning tasks and generates faithful and interpretable reasoning results.

AAAI Conference 2024 Conference Paper

ROG_PL: Robust Open-Set Graph Learning via Region-Based Prototype Learning

  • Qin Zhang
  • Xiaowei Li
  • Jiexin Lu
  • Liping Qiu
  • Shirui Pan
  • Xiaojun Chen
  • Junyang Chen

Open-set graph learning is a practical task that aims to classify the known class nodes and to identify unknown class samples as unknowns. Conventional node classification methods usually perform unsatisfactorily in open-set scenarios due to the complex data they encounter, such as out-of-distribution (OOD) data and in-distribution (IND) noise. OOD data are samples that do not belong to any known classes. They are outliers if they occur in training (OOD noise), and open-set samples if they occur in testing. IND noise consists of training samples that are assigned incorrect labels. IND noise and OOD noise are prevalent and usually cause the ambiguity problem, including the intra-class variety problem and the inter-class confusion problem. Thus, exploring robust open-set learning methods is necessary yet difficult, and it becomes even more difficult for non-IID graph data. To this end, we propose a unified framework named ROG_PL to achieve robust open-set learning on complex noisy graph data, by introducing prototype learning. Specifically, ROG_PL consists of two modules, i.e., denoising via label propagation and open-set prototype learning via regions. The first module corrects noisy labels through similarity-based label propagation and removes low-confidence samples, to solve the intra-class variety problem caused by noise. The second module learns open-set prototypes for each known class via non-overlapped regions and retains both interior and border prototypes to remedy the inter-class confusion problem. The two modules are iteratively updated under the constraints of classification loss and prototype diversity loss. To the best of our knowledge, the proposed ROG_PL is the first robust open-set node classification method for graph data with complex noise. Experimental evaluations of ROG_PL on several benchmark graph datasets demonstrate that it has good performance.

ICML Conference 2024 Conference Paper

Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs

  • Langzhang Liang
  • Sunwoo Kim 0006
  • Kijung Shin
  • Zenglin Xu
  • Shirui Pan
  • Yuan (Alan) Qi

Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations of SMP. In this work, we unveil the potential pitfalls of SMP and their remedies. We first identify two limitations of SMP: undesirable representation update for multi-hop neighbors and vulnerability against oversmoothing issues. To overcome these challenges, we propose a novel message-passing function called Multiset to Multiset GNN (M2M-GNN). Our theoretical analyses and extensive experiments demonstrate that M2M-GNN effectively alleviates the limitations of SMP, yielding superior performance in comparison.

ICLR Conference 2024 Conference Paper

Time-LLM: Time Series Forecasting by Reprogramming Large Language Models

  • Ming Jin 0005
  • Shiyu Wang 0001
  • Lintao Ma
  • Zhixuan Chu
  • James Y. Zhang
  • Xiaoming Shi 0001
  • Pin-Yu Chen
  • Yuxuan Liang 0002

Time series forecasting holds significant importance in many real-world dynamic systems and has been extensively studied. Unlike natural language processing (NLP) and computer vision (CV), where a single large model can tackle multiple tasks, models for time series forecasting are often specialized, necessitating distinct designs for different tasks and applications. While pre-trained foundation models have made impressive strides in NLP and CV, their development in time series domains has been constrained by data sparsity. Recent studies have revealed that large language models (LLMs) possess robust pattern recognition and reasoning abilities over complex sequences of tokens. However, the challenge remains in effectively aligning the modalities of time series data and natural language to leverage these capabilities. In this work, we present Time-LLM, a reprogramming framework to repurpose LLMs for general time series forecasting with the backbone language models kept intact. We begin by reprogramming the input time series with text prototypes before feeding it into the frozen LLM to align the two modalities. To augment the LLM's ability to reason with time series data, we propose Prompt-as-Prefix (PaP), which enriches the input context and directs the transformation of reprogrammed input patches. The transformed time series patches from the LLM are finally projected to obtain the forecasts. Our comprehensive evaluations demonstrate that Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models. Moreover, Time-LLM excels in both few-shot and zero-shot learning scenarios. The code is made available at https://github.com/KimMeen/Time-LLM.

AAAI Conference 2024 Conference Paper

Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation

  • Di Mi
  • Yanjun Zhang
  • Leo Yu Zhang
  • Shengshan Hu
  • Qi Zhong
  • Haizhuan Yuan
  • Shirui Pan

Model extraction attacks (MEAs) enable an attacker to replicate the functionality of a victim deep neural network (DNN) model by only querying its API service remotely, posing a severe threat to the security and integrity of pay-per-query DNN-based services. Although the majority of current research on MEAs has primarily concentrated on neural classifiers, there is a growing prevalence of image-to-image translation (I2IT) tasks in our everyday activities. However, techniques developed for MEA of DNN classifiers cannot be directly transferred to the case of I2IT, rendering the vulnerability of I2IT models to MEA attacks often underestimated. This paper unveils the threat of MEA in I2IT tasks from a new perspective. Diverging from the traditional approach of bridging the distribution gap between attacker queries and victim training samples, we opt to mitigate the effect caused by the different distributions, known as the domain shift. This is achieved by introducing a new regularization term that penalizes high-frequency noise, and seeking a flatter minimum to avoid overfitting to the shifted distribution. Extensive experiments on different image translation tasks, including image super-resolution and style transfer, are performed on different backbone victim models, and the new design consistently outperforms the baseline by a large margin across all metrics. A few real-life I2IT APIs are also verified to be extremely vulnerable to our attack, emphasizing the need for enhanced defenses and potentially revised API publishing policies.

ICML Conference 2024 Conference Paper

Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness

  • Guibin Zhang
  • Yanwei Yue
  • Kun Wang 0056
  • Junfeng Fang
  • Yongduo Sui
  • Kai Wang 0036
  • Yuxuan Liang 0002
  • Dawei Cheng

Graph Neural Networks (GNNs) excel in various graph learning tasks but face computational challenges when applied to large-scale graphs. A promising solution is to remove non-essential edges to reduce the computational overheads in GNN. Previous literature generally falls into two categories: topology-guided and semantic-guided. The former maintains certain graph topological properties yet often underperforms on GNNs. The latter performs well at lower sparsity on GNNs but faces performance collapse at higher sparsity levels. With this in mind, we propose a new research line and concept termed Graph Sparse Training (GST), which dynamically manipulates sparsity at the data level. Specifically, GST initially constructs a topology & semantic anchor at a low training cost, followed by performing dynamic sparse training to align the sparse graph with the anchor. We introduce the Equilibria Sparsification Principle to guide this process, balancing the preservation of both topological and semantic information. Ultimately, GST produces a sparse graph with maximum topological integrity and no performance degradation. Extensive experiments on 6 datasets and 5 backbones showcase that GST (I) identifies subgraphs at higher graph sparsity levels ($1.67\%\sim15.85\%$ $\uparrow$) than state-of-the-art sparsification methods, (II) preserves more key spectral properties, (III) achieves $1.27-3.42\times$ speedup in GNN inference and (IV) successfully helps graph adversarial defense and graph lottery tickets.

NeurIPS Conference 2024 Conference Paper

Uncovering the Redundancy in Graph Self-supervised Learning Models

  • Zhibiao Wang
  • Xiao Wang
  • Haoyue Deng
  • Nian Liu
  • Shirui Pan
  • Chunming Hu

Graph self-supervised learning, as a powerful pre-training paradigm for Graph Neural Networks (GNNs) without labels, has received considerable attention. We have witnessed the success of graph self-supervised learning in pre-training the parameters of GNNs, leading many not to question whether the learned GNN parameters are all useful. In this paper, by presenting experimental evidence and analysis, we surprisingly discover that graph self-supervised learning models are highly redundant at both the neuron and layer levels, e.g., even after randomly removing 51.6% of parameters, the performance of graph self-supervised learning models still retains at least 96.2%. This discovery implies that the parameters of graph self-supervised models can be largely reduced, making simultaneous fine-tuning of both graph self-supervised learning models and prediction layers more feasible. Therefore, we further design a novel graph pre-training and fine-tuning paradigm called SLImming DE-correlation Fine-tuning (SLIDE). The effectiveness of SLIDE is verified through extensive experiments on various benchmarks, and the performance can even be improved with fewer model parameters in most cases. For example, in comparison with fully fine-tuning GraphMAE on the Amazon-Computers dataset, even after randomly reducing 40% of parameters, we can still achieve improvements of 0.24% and 0.27% for Micro-F1 and Macro-F1 scores, respectively.

IJCAI Conference 2024 Conference Paper

Unsupervised Deep Graph Structure and Embedding Learning

  • Xiaobo Shen
  • Lei Shi
  • Xiuwen Gong
  • Shirui Pan

Graph Neural Network (GNN) is powerful in graph embedding learning, but its performance has been shown to be heavily degraded under adversarial attacks. Deep graph structure learning (GSL) is proposed to defend against attacks by jointly learning graph structure and graph embedding, typically in the node classification task. Label supervision is expensive in real-world applications, and thus unsupervised GSL is more challenging and remains less studied. To fill this gap, this paper proposes a new unsupervised GSL method, i.e., unsupervised property GNN (UPGNN). UPGNN first refines graph structure by exploring the properties of low rank, sparsity, and feature smoothness. UPGNN employs a graph mutual information loss to learn graph embedding by maximizing its correlation with the refined graph. The proposed UPGNN learns graph structure and embedding without label supervision, and thus can be applied to various downstream tasks. We further propose Accelerated UPGNN (AUPGNN) to reduce computational complexity, providing an efficient alternative to UPGNN. Our extensive experiments on node classification and clustering demonstrate the effectiveness of the proposed method over state-of-the-art methods, especially under heavy perturbation.

AAAI Conference 2023 Conference Paper

Beyond Smoothing: Unsupervised Graph Representation Learning with Edge Heterophily Discriminating

  • Yixin Liu
  • Yizhen Zheng
  • Daokun Zhang
  • Vincent CS Lee
  • Shirui Pan

Unsupervised graph representation learning (UGRL) has drawn increasing research attention and achieved promising results in several graph analytic tasks. Relying on the homophily assumption, existing UGRL methods tend to smooth the learned node representations along all edges, ignoring the existence of heterophilic edges that connect nodes with distinct attributes. As a result, current methods are hard to generalize to heterophilic graphs where dissimilar nodes are widely connected, and also vulnerable to adversarial attacks. To address this issue, we propose a novel unsupervised Graph Representation learning method with Edge hEterophily discriminaTing (GREET) which learns representations by discriminating and leveraging homophilic edges and heterophilic edges. To distinguish two types of edges, we build an edge discriminator that infers edge homophily/heterophily from feature and structure information. We train the edge discriminator in an unsupervised way through minimizing the crafted pivot-anchored ranking loss, with randomly sampled node pairs acting as pivots. Node representations are learned through contrasting the dual-channel encodings obtained from the discriminated homophilic and heterophilic edges. With an effective interplaying scheme, edge discriminating and representation learning can mutually boost each other during the training phase. We conducted extensive experiments on 14 benchmark datasets and multiple learning scenarios to demonstrate the superiority of GREET.

ICML Conference 2023 Conference Paper

Demystifying Uneven Vulnerability of Link Stealing Attacks against Graph Neural Networks

  • He Zhang 0012
  • Bang Wu 0004
  • Shuo Wang 0012
  • Xiangwen Yang
  • Jason Xue 0002
  • Shirui Pan
  • Xingliang Yuan

While graph neural networks (GNNs) dominate the state-of-the-art for exploring graphs in real-world applications, they have been shown to be vulnerable to a growing number of privacy attacks. For instance, link stealing is a well-known membership inference attack (MIA) on edges that infers the presence of an edge in a GNN’s training graph. Recent studies on independent and identically distributed data (e.g., images) have empirically demonstrated that individuals from different groups suffer from different levels of privacy risks to MIAs, i.e., uneven vulnerability. However, theoretical evidence of such uneven vulnerability is missing. In this paper, we first present theoretical evidence of the uneven vulnerability of GNNs to link stealing attacks, which lays the foundation for demystifying such uneven risks among different groups of edges. We further demonstrate a group-based attack paradigm to expose the practical privacy harm to GNN users derived from the uneven vulnerability of edges. Finally, we empirically validate the existence of obvious uneven vulnerability on nine real-world datasets (e.g., about 25% AUC difference between different groups in the Credit graph). Compared with existing methods, the outperformance of our group-based attack paradigm confirms that customising different strategies for different groups results in more effective privacy attacks.

UAI Conference 2023 Conference Paper

Fast Heterogeneous Federated Learning with Hybrid Client Selection

  • Duanxiao Song
  • Guangyuan Shen
  • Dehong Gao
  • Libin Yang
  • Xukai Zhou
  • Shirui Pan
  • Wei Lou
  • Fang Zhou

Client selection schemes are widely adopted to handle communication-efficiency problems in recent studies of Federated Learning (FL). However, the large variance of the model updates aggregated from randomly selected, unrepresentative subsets directly slows FL convergence. We present a novel clustering-based client selection scheme to accelerate FL convergence by variance reduction. Simple yet effective schemes are designed to improve the clustering effect and control the effect fluctuation, thereby generating a client subset with a certain representativeness of sampling. Theoretically, we demonstrate the improvement of the proposed scheme in variance reduction. We also present a tighter convergence guarantee for the proposed method thanks to the variance reduction. Experimental results confirm the superior efficiency of our scheme compared to alternatives.

ICML Conference 2023 Conference Paper

Finding the Missing-half: Graph Complementary Learning for Homophily-prone and Heterophily-prone Graphs

  • Yizhen Zheng
  • He Zhang 0012
  • Vincent Cheng-Siong Lee
  • Yu Zheng 0013
  • Xiao Wang 0017
  • Shirui Pan

Real-world graphs generally have only one kind of tendency in their connections. These connections are either homophily-prone or heterophily-prone. While homophily-prone edges tend to connect nodes with the same class (i.e., intra-class nodes), heterophily-prone edges tend to build relationships between nodes with different classes (i.e., inter-class nodes). Existing GNNs only take the original graph as input during training. The problem with this approach is that it fails to take into consideration the “missing-half” structural information, that is, heterophily-prone topology for homophily-prone graphs and homophily-prone topology for heterophily-prone graphs. In our paper, we introduce Graph cOmplementAry Learning, namely GOAL, which consists of two components: graph complementation and complemented graph convolution. The first component finds the missing-half structural information for a given graph to complement it. The complemented graph has two sets of graphs including both homophily- and heterophily-prone topology. In the latter component, to handle complemented graphs, we design a new graph convolution from the perspective of optimisation. The experiment results show that GOAL consistently outperforms all baselines in eight real-world datasets.

IJCAI Conference 2023 Conference Paper

G2Pxy: Generative Open-Set Node Classification on Graphs with Proxy Unknowns

  • Qin Zhang
  • Zelin Shi
  • Xiaolin Zhang
  • Xiaojun Chen
  • Philippe Fournier-Viger
  • Shirui Pan

Node classification is the task of predicting the labels of unlabeled nodes in a graph. State-of-the-art methods based on graph neural networks achieve excellent performance when all labels are available during training. But in real life, models are often applied to data with new classes, which can lead to massive misclassification and thus significantly degrade performance. Hence, developing open-set classification methods is crucial to determine if a given sample belongs to a known class. Existing methods for open-set node classification generally use transductive learning with part or all of the features of real unseen class nodes to help with open-set classification. In this paper, we propose a novel generative open-set node classification method, i.e., G2Pxy, which follows a stricter inductive learning setting where no information about unknown classes is available during training and validation. Two kinds of proxy unknown nodes, inter-class unknown proxies and external unknown proxies, are generated via mixup to efficiently anticipate the distribution of novel classes. Using the generated proxies, a closed-set classifier can be transformed into an open-set one by augmenting it with an extra proxy classifier. Under the constraints of both cross entropy loss and complement entropy loss, G2Pxy achieves superior effectiveness for unknown class detection and known class classification, which is validated by experiments on benchmark graph datasets. Moreover, G2Pxy does not have specific requirements on the GNN architecture and shows good generalization.
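The mixup step mentioned in the abstract above can be illustrated with a minimal Python sketch. It shows only generic mixup, i.e. a convex combination of two feature vectors. The function name `mixup_proxy`, the fixed mixing weight, and the toy feature vectors are assumptions made for illustration and are not taken from G2Pxy's actual proxy generator.

```python
def mixup_proxy(x_a, x_b, lam=0.5):
    """Return lam * x_a + (1 - lam) * x_b element-wise (generic mixup)."""
    if len(x_a) != len(x_b):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


# Hypothetical features of two nodes drawn from different known classes.
node_class_0 = [1.0, 0.0, 2.0]
node_class_1 = [0.0, 1.0, 4.0]

# The mixed vector lies between the two classes and could serve as a
# stand-in for an unseen class during training.
proxy = mixup_proxy(node_class_0, node_class_1, lam=0.5)
```

With `lam=0.5` the proxy is the midpoint of the two inputs; moving `lam` towards 0 or 1 pulls it towards one of the known classes.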

NeurIPS Conference 2023 Conference Paper

GNNEvaluator: Evaluating GNN Performance On Unseen Graphs Without Labels

  • Xin Zheng
  • Miao Zhang
  • Chunyang Chen
  • Soheila Molaei
  • Chuan Zhou
  • Shirui Pan

Evaluating the performance of graph neural networks (GNNs) is an essential task for practical GNN model deployment and serving, as deployed GNNs face significant performance uncertainty when inferring on unseen and unlabeled test graphs, due to mismatched training-test graph distributions. In this paper, we study a new problem, GNN model evaluation, that aims to assess the performance of a specific GNN model trained on labeled and observed graphs, by precisely estimating its performance (e.g., node classification accuracy) on unseen graphs without labels. Concretely, we propose a two-stage GNN model evaluation framework, including (1) DiscGraph set construction and (2) GNNEvaluator training and inference. The DiscGraph set captures wide-range and diverse graph data distribution discrepancies through a discrepancy measurement function, which exploits the GNN outputs of latent node embeddings and node class predictions. Under the effective training supervision from the DiscGraph set, GNNEvaluator learns to precisely estimate node classification accuracy of the to-be-evaluated GNN model and makes an accurate inference for evaluating GNN model performance. Extensive experiments on real-world unseen and unlabeled test graphs demonstrate the effectiveness of our proposed method for GNN model evaluation.

AAAI Conference 2023 Conference Paper

Neighbor Contrastive Learning on Learnable Graph Augmentation

  • Xiao Shen
  • Dewang Sun
  • Shirui Pan
  • Xi Zhou
  • Laurence T. Yang

In recent years, graph contrastive learning (GCL), which aims to learn representations from unlabeled graphs, has made great progress. However, the existing GCL methods mostly adopt human-designed graph augmentations, which are sensitive to various graph datasets. In addition, the contrastive losses originally developed in computer vision have been directly applied to graph data, where the neighboring nodes are regarded as negatives and consequently pushed far apart from the anchor. However, this contradicts the homophily assumption of networks that connected nodes often belong to the same class and should be close to each other. In this work, we propose an end-to-end automatic GCL method, named NCLA, to apply neighbor contrastive learning on learnable graph augmentation. Several graph augmented views with adaptive topology are automatically learned by the multi-head graph attention mechanism, which can be compatible with various graph datasets without prior domain knowledge. In addition, a neighbor contrastive loss is devised to allow multiple positives per anchor by taking network topology as the supervised signals. Both augmentations and embeddings are learned end-to-end in the proposed NCLA. Extensive experiments on the benchmark datasets demonstrate that NCLA yields state-of-the-art node classification performance among self-supervised GCL methods and even exceeds the supervised ones when the labels are extremely limited. Our code is released at https://github.com/shenxiaocam/NCLA.
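To make the idea of "multiple positives per anchor" concrete, here is a hedged Python sketch of a generic multi-positive contrastive loss in which an anchor's graph neighbors are treated as positives. It is not the NCLA loss itself: the function names, the cosine similarity, the temperature `tau`, and the toy embeddings are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_positive_loss(embeddings, anchor, positives, tau=0.5):
    """-log(sum over positives / sum over all other nodes) of exp(sim / tau)."""
    scores = {
        j: math.exp(cosine(embeddings[anchor], emb) / tau)
        for j, emb in enumerate(embeddings)
        if j != anchor
    }
    numerator = sum(scores[p] for p in positives)
    return -math.log(numerator / sum(scores.values()))


# Toy embeddings: node 1 is close to node 0, node 2 points the opposite way.
embeddings = [[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]]

loss_similar_neighbor = multi_positive_loss(embeddings, anchor=0, positives=[1])
loss_dissimilar_neighbor = multi_positive_loss(embeddings, anchor=0, positives=[2])
```

In this sketch the loss is small when the anchor's neighbors already sit close to it in embedding space and large when they do not, so minimizing it pulls connected nodes together instead of pushing them apart.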

AAAI Conference 2023 Conference Paper

Simple and Efficient Heterogeneous Graph Neural Network

  • Xiaocheng Yang
  • Mingyu Yan
  • Shirui Pan
  • Xiaochun Ye
  • Dongrui Fan

Heterogeneous graph neural networks (HGNNs) have the powerful capability to embed rich structural and semantic information of a heterogeneous graph into node representations. Existing HGNNs inherit many mechanisms from graph neural networks (GNNs) designed for homogeneous graphs, especially the attention mechanism and the multi-layer structure. These mechanisms bring excessive complexity, but few works study whether they are really effective on heterogeneous graphs. In this paper, we conduct an in-depth and detailed study of these mechanisms and propose the Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN). To easily capture structural information, SeHGNN pre-computes the neighbor aggregation using a light-weight mean aggregator, which reduces complexity by removing overused neighbor attention and avoiding repeated neighbor aggregation in every training epoch. To better utilize semantic information, SeHGNN adopts a single-layer structure with long metapaths to extend the receptive field, as well as a transformer-based semantic fusion module to fuse features from different metapaths. As a result, SeHGNN exhibits the characteristics of a simple network structure, high prediction accuracy, and fast training speed. Extensive experiments on five real-world heterogeneous graphs demonstrate the superiority of SeHGNN over the state of the art in both accuracy and training speed.
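The pre-computation idea described above can be sketched in a few lines of Python: neighbor features are averaged once, before training, and the cached result is reused in every epoch. This is only an illustration under simplifying assumptions. The function name, the dictionary-based graph, the zero vector for isolated nodes, and the equal feature dimension for all nodes are hypothetical and are not taken from the SeHGNN implementation.

```python
def precompute_mean_aggregation(features, neighbors):
    """Average each node's neighbor features once, before training starts."""
    aggregated = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        if not nbrs:
            # Isolated node: fall back to a zero vector (an assumption).
            aggregated[node] = [0.0] * dim
            continue
        sums = [0.0] * dim
        for nbr in nbrs:
            for k, value in enumerate(features[nbr]):
                sums[k] += value
        aggregated[node] = [s / len(nbrs) for s in sums]
    return aggregated


# Toy graph: node -> feature vector, node -> list of neighbors.
features = {"a": [1.0, 2.0], "b": [3.0, 4.0], "c": [5.0, 6.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}

# Computed a single time; a training loop would only read from this cache.
cached = precompute_mean_aggregation(features, neighbors)
```

Because the mean aggregator has no learnable parameters, its output does not change during training, which is what makes caching it valid.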

NeurIPS Conference 2023 Conference Paper

Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data

  • Xin Zheng
  • Miao Zhang
  • Chunyang Chen
  • Quoc Viet Hung Nguyen
  • Xingquan Zhu
  • Shirui Pan

Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitution, has immediate benefits for various graph learning tasks. However, existing graph condensation methods rely on the joint optimization of nodes and structures in the condensed graph, and overlook critical issues in effectiveness and generalization ability. In this paper, we advocate a new Structure-Free Graph Condensation paradigm, named SFGC, to distill a large-scale graph into a small-scale graph node set without explicit graph structures, i.e., graph-free data. Our idea is to implicitly encode topology structure information into the node attributes in the synthesized graph-free data, whose topology is reduced to an identity matrix. Specifically, SFGC contains two collaborative components: (1) a training trajectory meta-matching scheme for effectively synthesizing small-scale graph-free data; (2) a graph neural feature score metric for dynamically evaluating the quality of the condensed data. Through training trajectory meta-matching, SFGC aligns the long-term GNN learning behaviors between the large-scale graph and the condensed small-scale graph-free data, ensuring comprehensive and compact transfer of informative knowledge to the graph-free data. Afterward, the underlying condensed graph-free data would be dynamically evaluated with the graph neural feature score, which is a closed-form metric for ensuring the excellent expressiveness of the condensed graph-free data. Extensive experiments verify the superiority of SFGC across different condensation ratios.

NeurIPS Conference 2023 Conference Paper

Towards Self-Interpretable Graph-Level Anomaly Detection

  • Yixin Liu
  • Kaize Ding
  • Qinghua Lu
  • Fuyi Li
  • Leo Yu Zhang
  • Shirui Pan

Graph-level anomaly detection (GLAD) aims to identify graphs that exhibit notable dissimilarity compared to the majority in a collection. However, current works primarily focus on evaluating graph-level abnormality while failing to provide meaningful explanations for the predictions, which largely limits their reliability and application scope. In this paper, we investigate a new challenging problem, explainable GLAD, where the learning objective is to predict the abnormality of each graph sample with corresponding explanations, i.e., the vital subgraph that leads to the predictions. To address this challenging problem, we propose a Self-Interpretable Graph aNomaly dETection model (SIGNET for short) that detects anomalous graphs as well as generates informative explanations simultaneously. Specifically, we first introduce the multi-view subgraph information bottleneck (MSIB) framework, serving as the design basis of our self-interpretable GLAD approach. This way SIGNET is able to not only measure the abnormality of each graph based on cross-view mutual information but also provide informative graph rationales by extracting bottleneck subgraphs from the input graph and its dual hypergraph in a self-supervised way. Extensive experiments on 16 datasets demonstrate the anomaly detection capability and self-interpretability of SIGNET.

IJCAI Conference 2022 Conference Paper

CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning

  • Di Jin
  • Luzhi Wang
  • Yizhen Zheng
  • Xiang Li
  • Fei Jiang
  • Wei Lin
  • Shirui Pan

Graph similarity learning refers to calculating the similarity score between two graphs, which is required in many realistic applications, such as visual tracking, graph classification, and collaborative filtering. As most of the existing graph neural networks yield effective graph representations of a single graph, little effort has been made for jointly learning two graph representations and calculating their similarity score. In addition, existing unsupervised graph similarity learning methods are mainly clustering-based, which ignores the valuable information embodied in graph pairs. To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning in order to calculate the similarity between any two input graph objects. Specifically, we generate two augmented views for each graph in a pair. Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. The former is used to strengthen the consistency of node representations in two views. The latter is used to identify node differences between different graphs. Finally, we transform node representations into graph-level representations via pooling operations for graph similarity computation. We have evaluated CGMN on eight real-world datasets, and the experiment results show that the proposed new approach is superior to the state-of-the-art methods in graph similarity learning downstream tasks.

AAAI Conference 2022 Conference Paper

Exploring Relational Semantics for Inductive Knowledge Graph Completion

  • Changjian Wang
  • Xiaofei Zhou
  • Shirui Pan
  • Linhua Dong
  • Zeliang Song
  • Ying Sha

Knowledge graph completion (KGC) aims to infer missing information in incomplete knowledge graphs (KGs). Most previous works only consider the transductive scenario where entities already exist in KGs, and cannot work effectively in the inductive scenario containing emerging entities. Recently, some graph neural network-based methods have been proposed for inductive KGC by aggregating neighborhood information to capture some uncertainty semantics from the neighboring auxiliary triples. But these methods ignore the more general relational semantics underlying all the known triples that can provide richer information to represent emerging entities so as to satisfy the inductive scenario. In this paper, we propose a novel model called CFAG, which utilizes two granularity levels of relational semantics in a coarse-grained aggregator (CG-AGG) and a fine-grained generative adversarial net (FG-GAN), for inductive KGC. The CG-AGG first generates entity representations with multiple semantics through a hypergraph neural network-based global aggregator and a graph neural network-based local aggregator, and the FG-GAN further enhances entity representations with specific semantics through conditional generative adversarial nets. Experimental results on benchmark datasets show that our model outperforms state-of-the-art models for inductive KGC.

IJCAI Conference 2022 Conference Paper

Multi-Graph Fusion Networks for Urban Region Embedding

  • Shangbin Wu
  • Xu Yan
  • Xiaoliang Fan
  • Shirui Pan
  • Shichao Zhu
  • Chuanpan Zheng
  • Ming Cheng
  • Cheng Wang

Learning the embeddings for urban regions from human mobility data can reveal the functionality of regions, and in turn enable correlated but distinct tasks such as crime prediction. Human mobility data contains rich but abundant information, which yields comprehensive region embeddings for cross domain tasks. In this paper, we propose multi-graph fusion networks (MGFN) to enable cross domain prediction tasks. First, we integrate the graphs with spatio-temporal similarity as mobility patterns through a mobility graph fusion module. Then, in the mobility pattern joint learning module, we design a multi-level cross-attention mechanism to learn comprehensive embeddings from multiple mobility patterns based on intra-pattern and inter-pattern messages. Finally, we conduct extensive experiments on real-world urban datasets. Experimental results demonstrate that the proposed MGFN outperforms the state-of-the-art methods by up to 12.35%. https://github.com/wushangbin/MGFN

NeurIPS Conference 2022 Conference Paper

Neural Temporal Walks: Motif-Aware Representation Learning on Continuous-Time Dynamic Graphs

  • Ming Jin
  • Yuan-Fang Li
  • Shirui Pan

Continuous-time dynamic graphs naturally abstract many real-world systems, such as social and transactional networks. While the research on continuous-time dynamic graph representation learning has made significant advances recently, neither graph topological properties nor temporal dependencies have been well-considered and explicitly modeled in capturing dynamic patterns. In this paper, we introduce a new approach, Neural Temporal Walks (NeurTWs), for representation learning on continuous-time dynamic graphs. By considering not only time constraints but also structural and tree traversal properties, our method conducts spatiotemporal-biased random walks to retrieve a set of representative motifs, enabling temporal nodes to be characterized effectively. With a component based on neural ordinary differential equations, the extracted motifs allow for irregularly-sampled temporal nodes to be embedded explicitly over multiple different interaction time intervals, enabling the effective capture of the underlying spatiotemporal dynamics. To enrich supervision signals, we further design a harder contrastive pretext task for model optimization. Our method demonstrates overwhelming superiority under both transductive and inductive settings on six real-world datasets.

NeurIPS Conference 2022 Conference Paper

Pseudo-Riemannian Graph Convolutional Networks

  • Bo Xiong
  • Shichao Zhu
  • Nico Potyka
  • Shirui Pan
  • Chuan Zhou
  • Steffen Staab

Graph Convolutional Networks (GCNs) are powerful frameworks for learning embeddings of graph-structured data. GCNs are traditionally studied through the lens of Euclidean geometry. Recent works find that non-Euclidean Riemannian manifolds provide specific inductive biases for embedding hierarchical or spherical data. However, they cannot align well with data of mixed graph topologies. We consider a larger class of pseudo-Riemannian manifolds that generalize hyperboloid and sphere. We develop new geodesic tools that allow for extending neural network operations into geodesically disconnected pseudo-Riemannian manifolds. As a consequence, we derive a pseudo-Riemannian GCN that models data in pseudo-Riemannian manifolds of constant nonzero curvature in the context of graph neural networks. Our method provides a geometric inductive bias that is sufficiently flexible to model mixed heterogeneous topologies like hierarchical graphs with cycles. We demonstrate the representational capabilities of this method by applying it to the tasks of graph reconstruction, node classification, and link prediction on a series of standard graphs with mixed topologies. Empirical results demonstrate that our method outperforms Riemannian counterparts when embedding graphs of complex topologies.

NeurIPS Conference 2022 Conference Paper

Rethinking and Scaling Up Graph Contrastive Learning: An Extremely Efficient Approach with Group Discrimination

  • Yizhen Zheng
  • Shirui Pan
  • Vincent CS Lee
  • Yu Zheng
  • Philip S Yu

Graph contrastive learning (GCL) alleviates the heavy reliance on label information for graph representation learning (GRL) via self-supervised learning schemes. The core idea is to learn by maximising mutual information for similar instances, which requires similarity computation between two node instances. However, GCL is inefficient in both time and memory consumption. In addition, GCL normally requires a large number of training epochs to be well-trained on large-scale datasets. Inspired by an observation of a technical defect (i.e., inappropriate usage of the Sigmoid function) commonly found in two representative GCL works, DGI and MVGRL, we revisit GCL and introduce a new learning paradigm for self-supervised graph representation learning, namely, Group Discrimination (GD), and propose a novel GD-based method called Graph Group Discrimination (GGD). Instead of similarity computation, GGD directly discriminates two groups of node samples with a very simple binary cross-entropy loss. In addition, GGD requires far fewer training epochs to obtain competitive performance compared with GCL methods on large-scale datasets. These two advantages make GGD highly efficient. Extensive experiments show that GGD outperforms state-of-the-art self-supervised methods on eight datasets. In particular, GGD can be trained in 0.18 seconds (6.44 seconds including data preprocessing) on ogbn-arxiv, which is orders of magnitude (10,000+) faster than GCL baselines while consuming much less memory. Trained for 9 hours on ogbn-papers100M with billion-scale edges, GGD outperforms its GCL counterparts in both accuracy and efficiency.
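The group discrimination objective can be pictured with a short Python sketch of a binary cross-entropy loss over two groups of scalar scores. It assumes, for illustration only, that each node has already been reduced to one logit and that one group is labelled 1 and the other 0. The function names and toy logits are hypothetical, and the encoder that would produce the logits is omitted.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable binary cross-entropy on a raw logit."""
    # Equivalent to -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def group_discrimination_loss(group_one_logits, group_zero_logits):
    """Mean BCE where the first group is labelled 1 and the second 0."""
    losses = [bce_with_logits(x, 1.0) for x in group_one_logits]
    losses += [bce_with_logits(x, 0.0) for x in group_zero_logits]
    return sum(losses) / len(losses)


# Well-separated groups give a small loss; swapped groups give a large one.
separated = group_discrimination_loss([5.0, 4.0], [-5.0, -4.0])
confused = group_discrimination_loss([-5.0, -4.0], [5.0, 4.0])
```

The cost of this loss is linear in the number of nodes because no pairwise similarity between node instances is computed, which matches the efficiency argument made in the abstract.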

IJCAI Conference 2022 Conference Paper

Survey on Graph Neural Network Acceleration: An Algorithmic Perspective

  • Xin Liu
  • Mingyu Yan
  • Lei Deng
  • Guoqi Li
  • Xiaochun Ye
  • Dongrui Fan
  • Shirui Pan
  • Yuan Xie

Graph neural networks (GNNs) have been a hot spot of recent research and are widely utilized in diverse applications. However, with the use of larger data and deeper models, there is, unsurprisingly, an urgent demand to accelerate GNNs for more efficient execution. In this paper, we provide a comprehensive survey on acceleration methods for GNNs from an algorithmic perspective. We first present a new taxonomy to classify existing acceleration methods into five categories. Based on the classification, we systematically discuss these methods and highlight their correlations. Next, we provide comparisons from aspects of the efficiency and characteristics of these methods. Finally, we suggest some promising prospects for future research.

AAAI Conference 2022 Short Paper

Thrifty Neural Architecture Search for Medical Image Segmentation (Student Abstract)

  • Ruibin Chen
  • Miao Zhang
  • Xin Zheng
  • Shirui Pan

Convolutional neural network (CNN) based image segmentation has been widely used in analyzing medical images and has benefited many real-world disease diagnosis applications. However, existing advanced CNN-based medical image segmentation models usually contain numerous parameters that require massive computation and memory, limiting the applicability of these models in data-constrained or hardware-constrained environments. By leveraging the recently proposed neural architecture search (NAS), this paper presents a novel approach, dubbed Thrifty NAS, to automatically design computation- and memory-efficient models for medical image segmentation. The models found by Thrifty NAS have far fewer parameters while retaining competitive performance. More specifically, we design a micro level space for cell structure search and a macro level cell path for better network structure modeling. Extensive experimental results on different medical image datasets verify the effectiveness of the proposed method with competitive segmentation performance, especially with a minuscule neural architecture model size, i.e., 0.61M, which is superior to U-Net (7.76M) and UNet++ (9.04M).

IJCAI Conference 2022 Conference Paper

Triformer: Triangular, Variable-Specific Attentions for Long Sequence Multivariate Time Series Forecasting

  • Razvan-Gabriel Cirstea
  • Chenjuan Guo
  • Bin Yang
  • Tung Kieu
  • Xuanyi Dong
  • Shirui Pan

A variety of real-world applications rely on far future information to make decisions, thus calling for efficient and accurate long sequence multivariate time series forecasting. While recent attention-based forecasting models show strong abilities in capturing long-term dependencies, they still suffer from two key limitations. First, canonical self attention has a quadratic complexity w.r.t. the input time series length, thus falling short in efficiency. Second, different variables’ time series often have distinct temporal dynamics, which existing studies fail to capture, as they use the same model parameter space, e.g., projection matrices, for all variables’ time series, thus falling short in accuracy. To ensure high efficiency and accuracy, we propose Triformer, a triangular, variable-specific attention. (i) Linear complexity: we introduce a novel patch attention with linear complexity. When stacking multiple layers of the patch attentions, a triangular structure is proposed such that the layer sizes shrink exponentially, thus maintaining linear complexity. (ii) Variable-specific parameters: we propose a light-weight method to enable distinct sets of model parameters for different variables’ time series to enhance accuracy without compromising efficiency and memory usage. Strong empirical evidence on four datasets from multiple domains justifies our design choices, and it demonstrates that Triformer outperforms state-of-the-art methods w.r.t. both accuracy and efficiency. Source code is publicly available at https://github.com/razvanc92/triformer.
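The claim that exponentially shrinking layers keep the total cost linear is a geometric-series argument, and a small Python sketch makes the arithmetic visible. It assumes, purely for illustration, that every layer emits one output per patch, so each layer's input is the previous one divided by the patch size, and that all layers share the same patch size. The function name and the example values are hypothetical.

```python
def layer_input_sizes(seq_len, patch_size):
    """Input length of each stacked layer when a layer emits one output per patch."""
    if patch_size < 2:
        raise ValueError("patch_size must be at least 2")
    sizes = []
    n = seq_len
    while n >= patch_size:
        sizes.append(n)
        n //= patch_size
    return sizes


# A length-64 input with patches of 4 gives a three-layer triangle.
sizes = layer_input_sizes(seq_len=64, patch_size=4)
total = sum(sizes)

# Geometric-series bound: seq_len * S / (S - 1), independent of depth.
bound = 64 * 4 / (4 - 1)
```

Under these assumptions the summed layer sizes never exceed `seq_len * S / (S - 1)` for patch size `S`, so if each layer costs time linear in its own input, the whole stack stays linear in the sequence length.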

AAAI Conference 2021 Conference Paper

Contrastive and Generative Graph Convolutional Networks for Graph-based Semi-Supervised Learning

  • Sheng Wan
  • Shirui Pan
  • Jian Yang
  • Chen Gong

Graph-based Semi-Supervised Learning (SSL) aims to transfer the labels of a handful of labeled data to the remaining massive unlabeled data via a graph. As one of the most popular graph-based SSL approaches, the recently proposed Graph Convolutional Networks (GCNs) have made remarkable progress by combining the sound expressiveness of neural networks with graph structure. Nevertheless, the existing graph-based methods do not directly address the core problem of SSL, i.e., the shortage of supervision, and thus their performance is still very limited. To address this issue, a novel GCN-based SSL algorithm is presented in this paper to enrich the supervision signals by utilizing both data similarities and graph structure. Firstly, by designing a semi-supervised contrastive loss, improved node representations can be generated via maximizing the agreement between different views of the same data or the data from the same class. Therefore, the rich unlabeled data and the scarce yet valuable labeled data can jointly provide abundant supervision information for learning discriminative node representations, which helps improve the subsequent classification result. Secondly, the underlying determinative relationship between the data features and input graph topology is extracted as supplementary supervision signals for SSL by using a graph generative loss related to the input features. Intensive experimental results on a variety of real-world datasets firmly verify the effectiveness of our algorithm compared with other state-of-the-art methods.

NeurIPS Conference 2021 Conference Paper

Contrastive Graph Poisson Networks: Semi-Supervised Learning with Extremely Limited Labels

  • Sheng Wan
  • Yibing Zhan
  • Liu Liu
  • Baosheng Yu
  • Shirui Pan
  • Chen Gong

Graph Neural Networks (GNNs) have achieved remarkable performance in the task of semi-supervised node classification. However, most existing GNN models require sufficient labeled data for effective network training. Their performance can be seriously degraded when labels are extremely limited. To address this issue, we propose a new framework termed Contrastive Graph Poisson Networks (CGPN) for node classification under extremely limited labeled data. Specifically, our CGPN derives from variational inference; integrates a newly designed Graph Poisson Network (GPN) to effectively propagate the limited labels to the entire graph and a normal GNN, such as Graph Attention Network, that flexibly guides the propagation of GPN; applies a contrastive objective to further exploit the supervision information from the learning process of GPN and GNN models. Essentially, our CGPN can enhance the learning performance of GNNs under extremely limited labels by contrastively propagating the limited labels to the entire graph. We conducted extensive experiments on different types of datasets to demonstrate the superiority of CGPN.

ICML Conference 2021 Conference Paper

iDARTS: Differentiable Architecture Search with Stochastic Implicit Gradients

  • Miao Zhang 0022
  • Steven W. Su
  • Shirui Pan
  • Xiaojun Chang
  • Ehsan Abbasnejad
  • Gholamreza Haffari

Differentiable ARchiTecture Search (DARTS) has recently become the mainstream in the neural architecture search (NAS) due to its efficiency and simplicity. With a gradient-based bi-level optimization, DARTS alternately optimizes the inner model weights and the outer architecture parameter in a weight-sharing supernet. A key challenge to the scalability and quality of the learned architectures is the need for differentiating through the inner-loop optimisation. While much has been discussed about several potentially fatal factors in DARTS, the architecture gradient, a.k.a. hypergradient, has received less attention. In this paper, we tackle the hypergradient computation in DARTS based on the implicit function theorem, making it depend only on the obtained solution to the inner-loop optimization and agnostic to the optimization path. To further reduce the computational requirements, we formulate a stochastic hypergradient approximation for differentiable NAS, and theoretically show that the architecture optimization with the proposed method is expected to converge to a stationary point. Comprehensive experiments on two NAS benchmark search spaces and the common NAS search space verify the effectiveness of our proposed method. It leads to architectures outperforming, with large margins, those learned by the baseline methods.
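
As a hedged illustration of what an implicit-function-theorem hypergradient generally looks like, the derivation below uses the textbook bi-level setup. The symbols (architecture parameter \alpha, weights w, losses L_train and L_val) are assumptions chosen here for illustration and are not taken from the abstract; the paper's own notation and its stochastic approximation may differ.

```latex
% Bi-level problem (assumed notation):
%   \min_{\alpha} L_{val}(w^*(\alpha), \alpha)
%   s.t.  w^*(\alpha) = \arg\min_{w} L_{train}(w, \alpha)
%
% At the inner solution, \partial L_{train} / \partial w = 0. Differentiating this
% condition with respect to \alpha (implicit function theorem) gives
\frac{d w^*}{d \alpha}
  = -\left[ \frac{\partial^2 L_{train}}{\partial w \, \partial w^{\top}} \right]^{-1}
     \frac{\partial^2 L_{train}}{\partial w \, \partial \alpha^{\top}}

% Applying the chain rule to L_{val}(w^*(\alpha), \alpha) then yields the hypergradient
\nabla_{\alpha} L_{val}
  = \frac{\partial L_{val}}{\partial \alpha}
  - \frac{\partial^2 L_{train}}{\partial \alpha \, \partial w^{\top}}
    \left[ \frac{\partial^2 L_{train}}{\partial w \, \partial w^{\top}} \right]^{-1}
    \frac{\partial L_{val}}{\partial w}
```

Every term is evaluated at the inner solution w^*(\alpha), so the expression depends on where the inner loop ended up and not on the path it took to get there.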

IJCAI Conference 2021 Conference Paper

Multi-Scale Contrastive Siamese Networks for Self-Supervised Graph Representation Learning

  • Ming Jin
  • Yizhen Zheng
  • Yuan-Fang Li
  • Chen Gong
  • Chuan Zhou
  • Shirui Pan

Graph representation learning plays a vital role in processing graph-structured data. However, prior arts on graph representation learning heavily rely on labeling information. To overcome this problem, inspired by the recent success of graph contrastive learning and Siamese networks in visual representation learning, we propose a novel self-supervised approach in this paper to learn node representations by enhancing Siamese self-distillation with multi-scale contrastive learning. Specifically, we first generate two augmented views from the input graph based on local and global perspectives. Then, we employ two objectives called cross-view and cross-network contrastiveness to maximize the agreement between node representations across different views and networks. To demonstrate the effectiveness of our approach, we perform empirical experiments on five real-world datasets. Our method not only achieves new state-of-the-art results but also surpasses some semi-supervised counterparts by large margins. Code is made available at https://github.com/GRAND-Lab/MERIT

AAAI Conference 2021 Short Paper

Towards Extracting Graph Neural Network Models via Prediction Queries (Student Abstract)

  • Bang Wu
  • Shirui Pan
  • Xingliang Yuan

Graph data has been widely used to represent data from various domains, e.g., social networks and recommendation systems. With great power, the GNN models, usually as valuable properties of their owners, also become attractive targets of the adversary who covets to steal them. While existing works show that simple deep neural networks can be reproduced by so-called Model Extraction Attacks, how to extract a GNN model has not been explored. In this paper, we exploit the threat of model extraction attacks against GNN models. Unlike ordinary attacks which obtain model information via only the input-output query pairs, we utilize both the node queries and the graph structure to extract the GNNs. Furthermore, we consider the stealthiness of the attack and propose to generate legitimate queries so the extraction can be applied discreetly. We implement our attack by leveraging the responses of these queries, as well as other accessible knowledge, e.g., neighbor connectives of the queried nodes. By evaluating over three real-world datasets, our attack is shown to effectively produce a surrogate model with more than 80% equivalent predictions as the target model.

IJCAI Conference 2020 Conference Paper

A Relation-Specific Attention Network for Joint Entity and Relation Extraction

  • Yue Yuan
  • Xiaofei Zhou
  • Shirui Pan
  • Qiannan Zhu
  • Zeliang Song
  • Li Guo

Joint extraction of entities and relations is an important task in natural language processing (NLP), which aims to capture all relational triplets from plain texts. This is a big challenge because some of the triplets extracted from one sentence may have overlapping entities. Most existing methods perform entity recognition followed by relation detection between every possible entity pair, which usually suffers from numerous redundant operations. In this paper, we propose a relation-specific attention network (RSAN) to handle the issue. Our RSAN utilizes a relation-aware attention mechanism to construct specific sentence representations for each relation, and then performs sequence labeling to extract its corresponding head and tail entities. Experiments on two public datasets show that our model can effectively extract overlapping triplets and achieve state-of-the-art performance.

NeurIPS Conference 2020 Conference Paper

Differentiable Neural Architecture Search in Equivalent Space with Exploration Enhancement

  • Miao Zhang
  • Huiqi Li
  • Shirui Pan
  • Xiaojun Chang
  • Zongyuan Ge
  • Steven Su

Recent works on One-Shot Neural Architecture Search (NAS) mostly adopt a bilevel optimization scheme to alternately optimize the supernet weights and architecture parameters after relaxing the discrete search space into a differentiable space. However, the non-negligible incongruence in their relaxation methods makes it hard to guarantee that the differentiable optimization in the continuous space is equivalent to the optimization in the discrete space. Differently, this paper utilizes a variational graph autoencoder to injectively transform the discrete architecture space into an equivalently continuous latent space, to resolve the incongruence. A probabilistic exploration enhancement method is accordingly devised to encourage intelligent exploration during the architecture search in the latent space, to avoid local optima in architecture search. As the catastrophic forgetting in differentiable One-Shot NAS deteriorates supernet predictive ability and makes the bilevel optimization inefficient, this paper further proposes an architecture complementation method to relieve this deficiency. We analyze the effectiveness of the proposed method, and a series of experiments have been conducted to compare the proposed method with state-of-the-art One-Shot NAS methods.

AAAI Conference 2020 Conference Paper

Going Deep: Graph Convolutional Ladder-Shape Networks

  • Ruiqi Hu
  • Shirui Pan
  • Guodong Long
  • Qinghua Lu
  • Liming Zhu
  • Jing Jiang

Neighborhood aggregation algorithms like spectral graph convolutional networks (GCNs) formulate graph convolutions as a symmetric Laplacian smoothing operation to aggregate the feature information of one node with that of its neighbors. While they have achieved great success in semi-supervised node classification on graphs, current approaches suffer from the over-smoothing problem when the depth of the neural networks increases, which always leads to a noticeable degradation of performance. To solve this problem, we present graph convolutional ladder-shape networks (GCLN), a novel graph neural network architecture that transmits messages from shallow layers to deeper layers to overcome the over-smoothing problem and dramatically extend the scale of the neural networks with improved performance. We have validated the effectiveness of the proposed GCLN at a node-wise level with a semi-supervised task (node classification) and an unsupervised task (node clustering), and at a graph-wise level with graph classification by applying a differentiable pooling operation. The proposed GCLN outperforms original GCNs, deep GCNs and other state-of-the-art GCN-based models for all three tasks, which were designed from various perspectives on six real-world benchmark data sets.
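
The over-smoothing effect mentioned above can be seen in a small sketch. It assumes the commonly used propagation rule D^{-1/2} (A + I) D^{-1/2} X with self-loops and no learned weights or nonlinearity; that rule and the function name `gcn_smooth` are illustrative assumptions, not details given in the abstract, and this is not the GCLN architecture itself.

```python
import math


def gcn_smooth(adj, features):
    """Apply one symmetric smoothing step: D^{-1/2} (A + I) D^{-1/2} X.

    adj is a dense 0/1 adjacency matrix (list of lists) and features is a
    list of per-node feature vectors. No weights or activation are applied.
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            weight = inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
            if weight:
                for k in range(dim):
                    out[i][k] += weight * features[j][k]
    return out


# Path graph 0 - 1 - 2 with a one-dimensional signal placed on node 0 only.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
x = [[1.0], [0.0], [0.0]]
for _ in range(50):
    x = gcn_smooth(adj, x)
# After many steps the two end nodes carry the same value even though only
# node 0 started with signal: the features no longer tell them apart.
print(x)
```

Stacking many such steps drives every node toward a value determined only by its degree, which is one way to read the performance degradation that deep GCNs exhibit.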

NeurIPS Conference 2020 Conference Paper

Graph Geometry Interaction Learning

  • Shichao Zhu
  • Shirui Pan
  • Chuan Zhou
  • Jia Wu
  • Yanan Cao
  • Bin Wang

While numerous approaches have been developed to embed graphs into either Euclidean or hyperbolic spaces, they do not fully utilize the information available in graphs, or lack the flexibility to model intrinsic complex graph geometry. To utilize the strength of both Euclidean and hyperbolic geometries, we develop a novel Geometry Interaction Learning (GIL) method for graphs, a well-suited and efficient alternative for learning abundant geometric properties in graphs. GIL captures more informative internal structural features with low dimensions while maintaining conformal invariance of each space. Furthermore, our method endows each node with the freedom to determine the importance of each geometry space via a flexible dual feature interaction learning and probability assembling mechanism. Promising experimental results are presented for five benchmark datasets on node classification and link prediction tasks.

NeurIPS Conference 2020 Conference Paper

Graph Stochastic Neural Networks for Semi-supervised Learning

  • Haibo Wang
  • Chuan Zhou
  • Xin Chen
  • Jia Wu
  • Shirui Pan
  • Jilong Wang

Graph Neural Networks (GNNs) have achieved remarkable performance in the task of semi-supervised node classification. However, most existing models learn a deterministic classification function, which lacks sufficient flexibility to explore better choices in the presence of kinds of imperfect observed data such as the scarce labeled nodes and noisy graph structure. To improve the rigidness and inflexibility of deterministic classification functions, this paper proposes a novel framework named Graph Stochastic Neural Networks (GSNN), which aims to model the uncertainty of the classification function by simultaneously learning a family of functions, i.e., a stochastic function. Specifically, we introduce a learnable graph neural network coupled with a high-dimensional latent variable to model the distribution of the classification function, and further adopt the amortised variational inference to approximate the intractable joint posterior for missing labels and the latent variable. By maximizing the lower-bound of the likelihood for observed node labels, the instantiated models can be trained in an end-to-end manner effectively. Extensive experiments on three real-world datasets show that GSNN achieves substantial performance gain in different scenarios compared with state-of-the-art baselines.

AAAI Conference 2020 Conference Paper

GSSNN: Graph Smoothing Splines Neural Networks

  • Shichao Zhu
  • Lewei Zhou
  • Shirui Pan
  • Chuan Zhou
  • Guiying Yan
  • Bin Wang

Graph Neural Networks (GNNs) have achieved state-of-the-art performance in many graph data analysis tasks. However, they still suffer from two limitations for graph representation learning. First, they exploit non-smoothing node features which may result in suboptimal embedding and degenerated performance for graph classification. Second, they only exploit neighbor information but ignore global topological knowledge. Aiming to overcome these limitations simultaneously, in this paper, we propose a novel, flexible, and end-to-end framework, Graph Smoothing Splines Neural Networks (GSSNN), for graph classification. By exploiting the smoothing splines, which are widely used to learn smoothing fitting function in regression, we develop an effective feature smoothing and enhancement module Scaled Smoothing Splines (S3) to learn graph embedding. To integrate global topological information, we design a novel scoring module, which exploits closeness, degree, as well as self-attention values, to select important node features as knots for smoothing splines. These knots can be potentially used for interpreting classification results. In extensive experiments on biological and social datasets, we demonstrate that our model achieves state-of-the-art results and GSSNN is superior in learning more robust graph representations. Furthermore, we show that the S3 module is easily plugged into existing GNNs to improve their performance.

IJCAI Conference 2020 Conference Paper

One-Shot Neural Architecture Search via Novelty Driven Sampling

  • Miao Zhang
  • Huiqi Li
  • Shirui Pan
  • Taoping Liu
  • Steven Su

One-Shot Neural Architecture Search (NAS) has received wide attention due to its computational efficiency. Most state-of-the-art One-Shot NAS methods use the validation accuracy based on inheriting weights from the supernet as the stepping stone to search for the best performing architecture, adopting a bilevel optimization pattern under the assumption that this validation accuracy approximates the test accuracy after re-training. However, recent works have found that there is no positive correlation between the above validation accuracy and test accuracy for these One-Shot NAS methods, and this reward based sampling for supernet training also entails the rich-get-richer problem. To handle this deceptive problem, this paper presents a new approach, Efficient Novelty-driven Neural Architecture Search, to sample the most abnormal architecture to train the supernet. Specifically, a single-path supernet is adopted, and only the weights of a single architecture sampled by our novelty search are optimized in each step to reduce the memory demand greatly. Experiments demonstrate the effectiveness and efficiency of our novelty search based architecture sampling method.

IJCAI Conference 2020 Conference Paper

Reasoning Like Human: Hierarchical Reinforcement Learning for Knowledge Graph Reasoning

  • Guojia Wan
  • Shirui Pan
  • Chen Gong
  • Chuan Zhou
  • Gholamreza Haffari

Knowledge Graphs typically suffer from incompleteness. A popular approach to knowledge graph completion is to infer missing knowledge by multi-hop reasoning over the information found along other paths connecting a pair of entities. However, multi-hop reasoning is still challenging because the reasoning process usually encounters the multiple semantic issue, in which a relation or an entity has multiple meanings. In order to deal with the situation, we propose a novel Hierarchical Reinforcement Learning framework to learn chains of reasoning from a Knowledge Graph automatically. Our framework is inspired by the hierarchical structure through which humans handle cognitively ambiguous cases. The whole reasoning process is decomposed into a hierarchy of two-level Reinforcement Learning policies for encoding historical information and learning structured action space. As a consequence, it is more feasible and natural for dealing with the multiple semantic issue. Experimental results show that our proposed model achieves substantial improvements in ambiguous relation tasks.

AAAI Conference 2020 Conference Paper

Reinforcement Learning Based Meta-Path Discovery in Large-Scale Heterogeneous Information Networks

  • Guojia Wan
  • Bo Du
  • Shirui Pan
  • Gholamreza Haffari

Meta-paths are important tools for a wide variety of data mining and network analysis tasks in Heterogeneous Information Networks (HINs), due to their flexibility and interpretability to capture the complex semantic relation among objects. To date, most HIN analysis still relies on handcrafting meta-paths, which requires rich domain knowledge that is extremely difficult to obtain in complex, large-scale, and schema-rich HINs. In this work, we present a novel framework, Meta-path Discovery with Reinforcement Learning (MPDRL), to identify informative meta-paths from complex and large-scale HINs. To capture different semantic information between objects, we propose a novel multi-hop reasoning strategy in a reinforcement learning framework which aims to infer the next promising relation that links a source entity to a target entity. To improve the efficiency, moreover, we develop a type context representation embedded approach to scale the RL framework to handle million-scale HINs. As multi-hop reasoning generates rich meta-paths of various lengths, we further perform a meta-path induction step to summarize the important meta-paths using the Lowest Common Ancestor principle. Experimental results on two large-scale HINs, Yago and NELL, validate our approach and demonstrate that our algorithm not only achieves superior performance in the link prediction task, but also identifies useful meta-paths that would have been ignored by human experts.

IJCAI Conference 2019 Conference Paper

Attributed Graph Clustering: A Deep Attentional Embedding Approach

  • Chun Wang
  • Shirui Pan
  • Ruiqi Hu
  • Guodong Long
  • Jing Jiang
  • Chengqi Zhang

Graph clustering is a fundamental task which discovers communities or groups in networks. Recent studies have mostly focused on developing deep learning approaches to learn a compact graph embedding, upon which classic clustering methods like k-means or spectral clustering algorithms are applied. These two-step frameworks are difficult to manipulate and usually lead to suboptimal performance, mainly because the graph embedding is not goal-directed, i.e., designed for the specific clustering task. In this paper, we propose a goal-directed deep learning approach, Deep Attentional Embedded Graph Clustering (DAEGC for short). Our method focuses on attributed graphs to sufficiently explore the two sides of information in graphs. By employing an attention network to capture the importance of the neighboring nodes to a target node, our DAEGC algorithm encodes the topological structure and node content in a graph to a compact representation, on which an inner product decoder is trained to reconstruct the graph structure. Furthermore, soft labels from the graph embedding itself are generated to supervise a self-training graph clustering process, which iteratively refines the clustering results. The self-training process is jointly learned and optimized with the graph embedding in a unified framework, to mutually benefit both components. Experimental results compared with state-of-the-art algorithms demonstrate the superiority of our method.
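
The inner product decoder mentioned above can be sketched in a few lines. The sigmoid link and the function name `inner_product_decoder` are assumptions made for illustration (the sigmoid is a common choice in graph autoencoders); the abstract does not state which link function DAEGC uses.

```python
import math


def inner_product_decoder(embeddings):
    """Return a matrix of edge probabilities sigmoid(z_i . z_j).

    embeddings is a list of per-node vectors. The sigmoid link is an
    assumption made for this sketch, not a detail from the abstract.
    """
    def sigmoid(value):
        return 1.0 / (1.0 + math.exp(-value))

    n = len(embeddings)
    return [
        [sigmoid(sum(a * b for a, b in zip(embeddings[i], embeddings[j])))
         for j in range(n)]
        for i in range(n)
    ]


# Nodes 0 and 1 have aligned embeddings; node 2 points the opposite way.
z = [[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]]
probs = inner_product_decoder(z)
print(probs[0][1], probs[0][2])
```

Aligned embeddings decode to a high edge probability and opposed embeddings to a low one, so training the decoder against the observed adjacency pulls connected nodes together in the embedding space.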

IJCAI Conference 2019 Conference Paper

Graph WaveNet for Deep Spatial-Temporal Graph Modeling

  • Zonghan Wu
  • Shirui Pan
  • Guodong Long
  • Jing Jiang
  • Chengqi Zhang

Spatial-temporal graph modeling is an important task to analyze the spatial relations and temporal trends of components in a system. Existing approaches mostly capture the spatial dependency on a fixed graph structure, assuming that the underlying relation between entities is pre-determined. However, the explicit graph structure (relation) does not necessarily reflect the true dependency and genuine relation may be missing due to the incomplete connections in the data. Furthermore, existing methods are ineffective to capture the temporal trends as the RNNs or CNNs employed in these methods cannot capture long-range temporal sequences. To overcome these limitations, we propose in this paper a novel graph neural network architecture, Graph WaveNet, for spatial-temporal graph modeling. By developing a novel adaptive dependency matrix and learning it through node embedding, our model can precisely capture the hidden spatial dependency in the data. With a stacked dilated 1D convolution component whose receptive field grows exponentially as the number of layers increases, Graph WaveNet is able to handle very long sequences. These two components are integrated seamlessly in a unified framework and the whole framework is learned in an end-to-end manner. Experimental results on two public traffic network datasets, METR-LA and PEMS-BAY, demonstrate the superior performance of our algorithm.
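
The claim that the receptive field of stacked dilated convolutions grows exponentially with depth can be checked with a short calculation. The kernel size of 2 and the dilation schedule that doubles at every layer are assumptions chosen for illustration; the abstract does not give the schedule Graph WaveNet actually uses.

```python
def doubling_dilations(num_layers):
    """Hypothetical schedule: dilation 1, 2, 4, ... doubling at each layer."""
    return [2 ** layer for layer in range(num_layers)]


def receptive_field(kernel_size, dilations):
    """Number of input time steps seen by one output of a stack of
    stride-1 dilated 1D convolutions.

    Each layer widens the field by (kernel_size - 1) * dilation.
    """
    return 1 + (kernel_size - 1) * sum(dilations)


for layers in (1, 2, 4, 8):
    print(layers, receptive_field(2, doubling_dilations(layers)))
```

With a doubling schedule the field doubles with each added layer, whereas the same stack without dilation would grow by only one time step per layer.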

AAAI Conference 2019 Conference Paper

Label Embedding with Partial Heterogeneous Contexts

  • Yaxin Shi
  • Donna Xu
  • Yuangang Pan
  • Ivor W. Tsang
  • Shirui Pan

Label embedding plays an important role in many real-world applications. To enhance the label relatedness captured by the embeddings, multiple contexts can be adopted. However, these contexts are heterogeneous and often partially observed in practical tasks, imposing significant challenges to capture the overall relatedness among labels. In this paper, we propose a general Partial Heterogeneous Context Label Embedding (PHCLE) framework to address these challenges. Categorizing heterogeneous contexts into two groups, relational context and descriptive context, we design tailor-made matrix factorization formula to effectively exploit the label relatedness in each context. With a shared embedding principle across heterogeneous contexts, the label relatedness is selectively aligned in a shared space. Due to our elegant formulation, PHCLE overcomes the partial context problem and can nicely incorporate more contexts, which both cannot be tackled with existing multi-context label embedding methods. An effective alternative optimization algorithm is further derived to solve the sparse matrix factorization problem. Experimental results demonstrate that the label embeddings obtained with PHCLE achieve superb performance in image classification task and exhibit good interpretability in the downstream label similarity analysis and image understanding task.

IJCAI Conference 2019 Conference Paper

Low-Bit Quantization for Attributed Network Representation Learning

  • Hong Yang
  • Shirui Pan
  • Ling Chen
  • Chuan Zhou
  • Peng Zhang

Attributed network embedding plays an important role in transferring network data into compact vectors for effective network analysis. Existing attributed network embedding models are designed either in continuous Euclidean spaces which introduce data redundancy or in binary coding spaces which incur significant loss of representation accuracy. To this end, we present a new Low-Bit Quantization for Attributed Network Representation Learning model (LQANR for short) that can learn compact node representations with low bitwidth values while preserving high representation accuracy. Specifically, we formulate a new representation learning function based on matrix factorization that can jointly learn the low-bit node representations and the layer aggregation weights under the low-bit quantization constraint. Because the new learning function falls into the category of mixed integer optimization, we propose an efficient mixed-integer based alternating direction method of multipliers (ADMM) algorithm as the solution. Experiments on real-world node classification and link prediction tasks validate the promising results of the proposed LQANR model.

IJCAI Conference 2018 Conference Paper

Active Discriminative Network Representation Learning

  • Li Gao
  • Hong Yang
  • Chuan Zhou
  • Jia Wu
  • Shirui Pan
  • Yue Hu

Most of current network representation models are learned in unsupervised fashions, which usually lack the capability of discrimination when applied to network analysis tasks, such as node classification. It is worth noting that label information is valuable for learning the discriminative network representations. However, labels of all training nodes are always difficult or expensive to obtain and manually labeling all nodes for training is inapplicable. Different sets of labeled nodes for model learning lead to different network representation results. In this paper, we propose a novel method, termed as ANRMAB, to learn the active discriminative network representations with a multi-armed bandit mechanism in active learning setting. Specifically, based on the networking data and the learned network representations, we design three active learning query strategies. By deriving an effective reward scheme that is closely related to the estimated performance measure of interest, ANRMAB uses a multi-armed bandit mechanism for adaptive decision making to select the most informative nodes for labeling. The updated labeled nodes are then used for further discriminative network representation learning. Experiments are conducted on three public data sets to verify the effectiveness of ANRMAB.

IJCAI Conference 2018 Conference Paper

Adversarially Regularized Graph Autoencoder for Graph Embedding

  • Shirui Pan
  • Ruiqi Hu
  • Guodong Long
  • Jing Jiang
  • Lina Yao
  • Chengqi Zhang

Graph embedding is an effective method to represent graph data in a low dimensional space for graph analytics. Most existing embedding algorithms typically focus on preserving the topological structure or minimizing the reconstruction errors of graph data, but they have mostly ignored the data distribution of the latent codes from the graphs, which often results in inferior embedding in real-world graph data. In this paper, we propose a novel adversarial graph embedding framework for graph data. The framework encodes the topological structure and node content in a graph to a compact representation, on which a decoder is trained to reconstruct the graph structure. Furthermore, the latent representation is enforced to match a prior distribution via an adversarial training scheme. To learn a robust embedding, two variants of adversarial approaches, adversarially regularized graph autoencoder (ARGA) and adversarially regularized variational graph autoencoder (ARVGA), are developed. Experimental studies on real-world graphs validate our design and demonstrate that our algorithms outperform baselines by a wide margin in link prediction, graph clustering, and graph visualization tasks.

AAAI Conference 2018 Conference Paper

DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

  • Tao Shen
  • Tianyi Zhou
  • Guodong Long
  • Jing Jiang
  • Shirui Pan
  • Chengqi Zhang

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, “Directional Self-Attention Network (DiSAN)”, is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.

IJCAI Conference 2018 Conference Paper

Discrete Network Embedding

  • Xiaobo Shen
  • Shirui Pan
  • Weiwei Liu
  • Yew-Soon Ong
  • Quan-Sen Sun

Network embedding aims to seek low-dimensional vector representations for network nodes, by preserving the network structure. The network embedding is typically represented as continuous vectors, which imposes formidable challenges in storage and computation costs, particularly in large-scale applications. To address the issue, this paper proposes a novel discrete network embedding (DNE) for more compact representations. In particular, DNE learns short binary codes to represent each node. The Hamming similarity between two binary embeddings is then employed to well approximate the ground-truth similarity. A novel discrete multi-class classifier is also developed to expedite classification. Moreover, we propose to jointly learn the discrete embedding and classifier within a unified framework to improve the compactness and discrimination of network embedding. Extensive experiments on node classification consistently demonstrate that DNE exhibits lower storage and computational complexity than state-of-the-art network embedding methods, while obtaining competitive classification results.
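
To make the storage and computation argument concrete, the sketch below compares two binary codes packed into Python integers. The normalisation 1 - 2d/m (the scaled inner product of the corresponding ±1 vectors) and the function names are assumptions made for illustration; the abstract does not define the exact similarity used by DNE.

```python
def hamming_distance(code_a, code_b):
    """Count the bit positions where two integer-packed codes differ."""
    return bin(code_a ^ code_b).count("1")


def hamming_similarity(code_a, code_b, num_bits):
    """Similarity in [-1, 1]: 1 for identical codes, -1 for complements.

    Equals the inner product of the matching {-1, +1} vectors divided by
    num_bits. This normalisation is an assumption for this sketch.
    """
    return 1.0 - 2.0 * hamming_distance(code_a, code_b) / num_bits


node_a = 0b1011
node_b = 0b1001
print(hamming_distance(node_a, node_b))
print(hamming_similarity(node_a, node_b, num_bits=4))
```

A node code of m bits takes m/8 bytes, and the comparison is a single XOR followed by a bit count, which is where the lower storage and computation cost relative to floating-point vectors comes from.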

AAAI Conference 2016 Conference Paper

Direct Discriminative Bag Mapping for Multi-Instance Learning

  • Jia Wu
  • Shirui Pan
  • Peng Zhang
  • Xingquan Zhu

Multi-instance learning (MIL) is useful for tackling labeling ambiguity in learning tasks, by allowing a bag of instances to share one label. Recently, bag mapping methods, which transform a bag to a single instance in a new space via instance selection, have drawn significant attention. To date, most existing works are developed based on the original space, i.e., utilizing all instances for bag mapping, and instance selection is indirectly tied to the MIL objective. As a result, it is hard to guarantee the distinguishing capacity of the selected instances in the new bag mapping space for MIL. In this paper, we propose a direct discriminative mapping approach for multi-instance learning (MILDM), which identifies instances to directly distinguish bags in the new mapping space. Experiments and comparisons on real-world learning tasks demonstrate the algorithm performance.

IJCAI Conference 2016 Conference Paper

Iterative Views Agreement: An Iterative Low-Rank Based Structured Optimization Method to Multi-View Spectral Clustering

  • Yang Wang
  • Wenjie Zhang
  • Lin Wu
  • Xuemin Lin
  • Meng Fang
  • Shirui Pan

Multi-view spectral clustering, which aims at yielding an agreement or consensus data objects grouping across multi-views with their graph laplacian matrices, is a fundamental clustering problem. Among the existing methods, Low-Rank Representation (LRR) based method is quite superior in terms of its effectiveness, intuitiveness and robustness to noise corruptions. However, it aggressively tries to learn a common low-dimensional subspace for multi-view data, while inattentively ignoring the local manifold structure in each view, which is critically important to the spectral clustering; worse still, the low-rank minimization is enforced to achieve the data correlation consensus among all views, failing to flexibly preserve the local manifold structure for each view. In this paper, 1) we propose a multi-graph laplacian regularized LRR with each graph laplacian corresponding to one view to characterize its local manifold structure. 2) Instead of directly enforcing the low-rank minimization among all views for correlation consensus, we separately impose low-rank constraint on each view, coupled with a mutual structural consensus constraint, where it is able to not only well preserve the local manifold structure but also serve as a constraint for that from other views, which iteratively makes the views more agreeable. Extensive experiments on real-world multi-view data sets demonstrate its superiority.

IJCAI Conference 2016 Conference Paper

Tri-Party Deep Network Representation

  • Shirui Pan
  • Jia Wu
  • Xingquan Zhu
  • Chengqi Zhang
  • Yang Wang

Information network mining often requires examination of linkage relationships between nodes for analysis. Recently, network representation has emerged to represent each node in a vector format, embedding network structure, so off-the-shelf machine learning methods can be directly applied for analysis. To date, existing methods only focus on one aspect of node information and cannot leverage node labels. In this paper, we propose TriDNR, a tri-party deep network representation model, using information from three parties: node structure, node content, and node labels (if available) to jointly learn optimal node representation. TriDNR is based on our new coupled deep natural language module, whose learning is enforced at three levels: (1) at the network structure level, TriDNR exploits inter-node relationship by maximizing the probability of observing surrounding nodes given a node in random walks; (2) at the node content level, TriDNR captures node-word correlation by maximizing the co-occurrence of word sequence given a node; and (3) at the node label level, TriDNR models label-word correspondence by maximizing the probability of word sequence given a class label. The tri-party information is jointly fed into the neural network model to mutually enhance each other to learn optimal representation, and results in up to 79% classification accuracy gain, compared to state-of-the-art methods.
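
The structure-level objective above relies on random walks to define which nodes count as "surrounding" a given node. The sketch below shows one plausible way to generate walks and extract (node, context) pairs; the uniform neighbour sampling, the window size and both function names are assumptions made for illustration, not TriDNR's actual procedure.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Uniform random walk of up to `length` nodes over an adjacency dict."""
    walk = [start]
    while len(walk) < length:
        neighbors = adjacency[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk


def context_pairs(walk, window):
    """Pair each node with the nodes at most `window` steps away in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


adjacency = {0: [1], 1: [0, 2], 2: [1]}
walk = random_walk(adjacency, start=0, length=5, rng=random.Random(0))
pairs = context_pairs(walk, window=1)
print(walk)
print(pairs[:4])
```

A skip-gram style model would then be trained to assign high probability to each context node given its center node, which is the sense in which inter-node relationships are captured.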

IJCAI Conference 2015 Conference Paper

Multi-Graph-View Learning for Complicated Object Classification

  • Jia Wu
  • Shirui Pan
  • Xingquan Zhu
  • Zhihua Cai
  • Chengqi Zhang

In this paper, we propose to represent and classify complicated objects. In order to represent the objects, we propose a multi-graph-view model which uses graphs constructed from multiple graph-views to represent an object. In addition, a bag-based multi-graph model is further used to relax labeling by only requiring one label for a bag of graphs, which represents one object. In order to learn classification models, we propose a multi-graph-view bag learning algorithm (MGVBL), which aims to explore subgraph features from multiple graph-views for learning. By enabling a joint regularization across multiple graph-views, and enforcing labeling constraints at the bag and graph levels, MGVBL is able to discover the most effective subgraph features across all graph-views for learning. Experiments on real-world learning tasks demonstrate the performance of MGVBL for complicated object classification.

IJCAI Conference 2013 Conference Paper

Graph Classification with Imbalanced Class Distributions and Noise

  • Shirui Pan
  • Xingquan Zhu

Recent years have witnessed an increasing number of applications involving data with structural dependency and graph representations. In these applications, the class distribution is commonly imbalanced, with minority samples making up only a small portion of the population. Such imbalanced class distributions pose significant challenges to learning algorithms. The problem is further complicated by the presence of noise or outliers in the graph data. In this paper, we propose an imbalanced graph boosting algorithm, igBoost, that progressively selects informative subgraph patterns from imbalanced graph data for learning. To handle class imbalance, we take class distributions into consideration to assign different weight values to graphs. The distance of each graph to its class center is also considered to adjust the weight to reduce the impact of noisy graph data. The weight values are integrated into the iterative subgraph feature selection and margin learning process to achieve maximum benefits. Experiments on real-world graph data with different degrees of class imbalance and noise demonstrate the algorithm's performance.
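
The weighting idea described above (class distribution plus distance to the class center) can be sketched as follows. This is an illustration under stated assumptions, not igBoost's actual formula: graphs are assumed to be already mapped to numeric feature vectors, and both the inverse-frequency class weight and the `1 / (1 + distance)` damping are hypothetical choices.

```python
import math

def instance_weights(features, labels):
    """Normalized weights: rarer classes weigh more, far-from-center points less."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1

    # Class centers as per-class feature means.
    dims = len(features[0])
    centers = {y: [0.0] * dims for y in counts}
    for x, y in zip(features, labels):
        for k in range(dims):
            centers[y][k] += x[k] / counts[y]

    weights = []
    for x, y in zip(features, labels):
        dist = math.sqrt(sum((x[k] - centers[y][k]) ** 2 for k in range(dims)))
        class_weight = len(labels) / (len(counts) * counts[y])  # inverse frequency
        weights.append(class_weight / (1.0 + dist))             # assumed damping

    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: five majority samples (one outlier) and two minority samples.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (5, 5), (5, 6)]
labels = [-1, -1, -1, -1, -1, +1, +1]
weights = instance_weights(features, labels)
```

On this toy data the minority samples receive the largest weights, and the majority-class outlier at (10, 10) receives the smallest weight within its class, which mirrors the two goals stated in the abstract.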