Arrow Research search

Author name cluster

Hao Peng

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

75 papers
2 author rows

Possible papers

75

AAAI Conference 2026 Conference Paper

Do Not Merge My Model! Safeguarding Open-Source LLMs Against Unauthorized Model Merging

  • Qinfeng Li
  • Miao Pan
  • Jintao Chen
  • Fu Teng
  • Zhiqiang Shen
  • Ge Su
  • Hao Peng
  • Xuhong Zhang

Model merging has emerged as an efficient technique for expanding large language models (LLMs) by integrating specialized expert models. However, it also introduces a new threat: model merging stealing, where free-riders exploit models through unauthorized model merging. Unfortunately, existing defense mechanisms fail to provide effective protection. Specifically, we identify three critical protection properties that existing methods fail to simultaneously satisfy: (1) proactively preventing unauthorized merging; (2) ensuring compatibility with general open-source settings; (3) achieving high security with negligible performance loss. To address the above issues, we propose MergeBarrier, a plug-and-play defense that proactively prevents unauthorized merging. The core design of MergeBarrier is to disrupt the Linear Mode Connectivity (LMC) between the protected model and its homologous counterparts, thereby eliminating the low-loss path required for effective model merging. Extensive experiments show that MergeBarrier effectively prevents model merging stealing with negligible accuracy loss.
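
MergeBarrier itself is not reproduced here, but the linear mode connectivity (LMC) it disrupts is easy to probe: scan the loss along the straight line between two homologous checkpoints. A minimal sketch, assuming the caller supplies the two models, a factory for the shared architecture, a loss function, and an evaluation batch (all names illustrative):

```python
import torch

@torch.no_grad()
def interpolation_losses(model_a, model_b, make_model, loss_fn, batch, steps=11):
    """Probe linear mode connectivity (LMC): evaluate the loss at points on
    the straight line between two homologous checkpoints. A flat, low curve
    means a low-loss merge path exists; a pronounced barrier (what a defense
    like MergeBarrier aims to create) means naive weight merging fails."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for i in range(steps):
        alpha = i / (steps - 1)
        merged_sd = {
            k: (1 - alpha) * v + alpha * sd_b[k] if torch.is_floating_point(v)
            else v  # leave integer buffers (e.g., BatchNorm counters) as-is
            for k, v in sd_a.items()
        }
        probe = make_model()  # fresh instance of the shared architecture
        probe.load_state_dict(merged_sd)
        probe.eval()
        losses.append(loss_fn(probe, batch).item())
    return losses
```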

AAAI Conference 2026 Conference Paper

Hyperbolic Continuous Structural Entropy for Hierarchical Clustering

  • Guangjie Zeng
  • Hao Peng
  • Angsheng Li
  • Li Sun
  • Chunyang Liu
  • Shengze Li
  • Yicheng Pan
  • Philip S. Yu

Hierarchical clustering is a fundamental machine-learning technique for grouping data points into dendrograms. However, existing hierarchical clustering methods encounter two primary challenges: 1) Most methods specify dendrograms without a global objective. 2) Graph-based methods often neglect the significance of graph structure, optimizing objectives on complete or static predefined graphs. In this work, we propose Hyperbolic Continuous Structural Entropy neural networks, namely HypCSE, for structure-enhanced continuous hierarchical clustering. Our key idea is to map data points into hyperbolic space and minimize the relaxed continuous structural entropy (SE) on structure-enhanced graphs. Specifically, we encode graph vertices in hyperbolic space using hyperbolic graph neural networks and minimize approximate SE defined on graph embeddings. To make the SE objective differentiable for optimization, we reformulate it into a function using the lowest common ancestor (LCA) on trees and then relax it into continuous SE (CSE) via an analogy between hyperbolic graph embeddings and partitioning trees. To ensure a graph structure that effectively captures the hierarchy of data points for CSE calculation, we employ a graph structure learning (GSL) strategy that updates the graph structure during training. Extensive experiments on seven datasets demonstrate the superior performance of HypCSE.

AAAI Conference 2026 Conference Paper

Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection

  • Li Sun
  • Lanxu Yang
  • Jiayu Tian
  • Bowen Fang
  • Xiaoyan Yu
  • Junda Ye
  • Peng Tang
  • Hao Peng

Detecting Out-of-Distribution (OOD) graphs, i.e., graphs drawn from a different distribution than the training data, is a critical task for ensuring the safety and reliability of Graph Neural Networks. The main challenge in unsupervised graph-level OOD detection lies in its common reliance on purely in-distribution (ID) data. This ID-only training paradigm leads to an incomplete characterization of the feature space, resulting in decision boundaries that lack the robustness needed to effectively separate ID from OOD samples. While incorporating synthesized outliers into the training process is a promising direction, existing generation methods are limited by their dependence on pre-defined, non-adaptive sampling heuristics (e.g., distance- or density-based). Such fixed strategies lack the flexibility to systematically explore the most informative OOD regions for refining decision boundaries. To overcome this limitation, we propose a novel Policy-Guided Outlier Synthesis (PGOS) framework that replaces static heuristics with a learned, adaptive exploration policy. PGOS trains a reinforcement learning agent to autonomously navigate low-density regions within a structured latent space, sampling representations that are maximally effective for regularizing the OOD decision boundary. These sampled points are then decoded into high-quality pseudo-OOD graphs to enhance the detector's robustness. Extensive experiments demonstrate the strong performance of our method, which achieves state-of-the-art results on multiple graph OOD and anomaly detection benchmarks.

TMLR Journal 2026 Journal Article

Prioritizing Image-Related Tokens Enhances Vision-Language Pre-Training

  • Yangyi Chen
  • Hao Peng
  • Tong Zhang
  • Heng Ji

In standard large vision-language models (LVLMs) pre-training, the model typically maximizes the joint probability of the caption conditioned on the image via next-token prediction (NTP); however, since only a small subset of caption tokens directly relates to the visual content, this naive NTP unintentionally fits the model to noise and increases the risk of hallucination. We present PRIOR, a simple vision-language pre-training approach that addresses this issue by prioritizing image-related tokens through differential weighting in the NTP loss, drawing from the importance sampling framework. PRIOR introduces a reference model, a text-only large language model (LLM) trained on the captions without image inputs, and uses it to weight each token in LVLM training based on the token's probability under that reference model. Intuitively, tokens that are directly related to the visual inputs are harder to predict without the image and thus receive lower probabilities from the text-only reference LLM. During training, we implement a token-specific re-weighting term based on the importance scores to adjust each token's loss. We implement PRIOR in two distinct settings: LVLMs with visual encoders and LVLMs without visual encoders. We observe 19% and 8% average relative improvement, respectively, on several vision-language benchmarks compared to NTP. In addition, PRIOR exhibits superior scaling properties, as demonstrated by significantly higher scaling coefficients, indicating greater potential for performance gains compared to NTP given increasing compute and data. The code is available at https://github.com/Yangyi-Chen/PRIOR.
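
A minimal sketch of the re-weighted next-token-prediction loss described above. The particular weighting function below is an assumption for illustration; PRIOR's exact importance-sampling weights are defined in the paper.

```python
import torch
import torch.nn.functional as F

def prior_style_loss(lvlm_logits, target_ids, ref_token_logprobs):
    """Re-weighted NTP loss: tokens the text-only reference model finds easy
    (high probability without the image) are down-weighted, so image-related
    tokens dominate the objective.

    lvlm_logits:        (seq, vocab) logits from the LVLM
    target_ids:         (seq,) gold caption token ids
    ref_token_logprobs: (seq,) log p_ref(token) from the text-only LLM
    """
    per_token_nll = F.cross_entropy(lvlm_logits, target_ids, reduction="none")
    # One plausible weighting: w_i = 1 - p_ref(token_i), normalized over the
    # sequence. (Assumed form; the paper's weights may differ.)
    weights = 1.0 - ref_token_logprobs.exp()
    weights = weights / weights.sum().clamp_min(1e-8)
    return (weights * per_token_nll).sum()
```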

TMLR Journal 2026 Journal Article

Process Reward Models That Think

  • Muhammad Khalifa
  • Rishabh Agarwal
  • Lajanugen Logeswaran
  • Jaekyeom Kim
  • Hao Peng
  • Moontae Lee
  • Honglak Lee
  • Lu Wang

Step-by-step verifiers—also known as process reward models (PRMs)—are a key ingredient for test-time scaling, but training them requires expensive step-level supervision. This work aims to build data-efficient PRMs as verbalized step-wise reward models that verify every step in the solution by generating a verification chain-of-thought (CoT). We propose ThinkPRM, a long CoT verifier fine-tuned on orders of magnitude fewer process labels than those required by discriminative PRMs. Our approach capitalizes on the inherent reasoning abilities of long CoT models, and outperforms LLM-as-a-Judge and discriminative verifiers—using only 1% of the process labels in PRM800K—across several challenging benchmarks. Specifically, ThinkPRM beats the baselines on ProcessBench, MATH-500, and AIME ’24 under best-of-N selection and reward-guided search. In an out-of-domain evaluation over subsets of GPQA-Diamond and LiveCodeBench, our PRM surpasses discriminative verifiers trained with the full PRM800K by 8% and 4.5%, respectively. Lastly, under the same token budget, ThinkPRM scales up verification compute more effectively compared to LLM-as-a-Judge, outperforming it by 7.2% on a subset of ProcessBench. This work highlights the value of generative, long CoT PRMs that can scale test-time compute for verification while requiring minimal supervision for training.
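
As a rough picture of how such a verifier plugs into best-of-N selection, the sketch below aggregates per-step correctness probabilities into a solution-level score; `verify_step` is a hypothetical stand-in for a call to the long-CoT verifier.

```python
import math

def score_solution(problem, steps, verify_step):
    """Aggregate per-step correctness probabilities from a step-wise verifier
    into one score: the log-probability that every step is correct (treating
    step judgments as independent)."""
    return sum(
        math.log(max(verify_step(problem, steps[: i + 1]), 1e-9))
        for i in range(len(steps))
    )

def best_of_n(problem, candidates, verify_step):
    """Pick the candidate solution (a list of reasoning steps) whose steps
    the verifier judges most likely to all be correct."""
    return max(candidates, key=lambda s: score_solution(problem, s, verify_step))

# Example with a trivial stand-in verifier that distrusts steps mentioning "5".
demo = best_of_n(
    "2+2?", [["4"], ["2+2=5", "so 5"]],
    verify_step=lambda q, prefix: 0.9 if "5" not in prefix[-1] else 0.1,
)
print(demo)  # ['4']
```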

AAAI Conference 2026 Conference Paper

RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation

  • Qinfeng Li
  • Miao Pan
  • Ke Xiong
  • Ge Su
  • Zhiqiang Shen
  • Yan Liu
  • Sun Bing
  • Hao Peng

Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths—progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining contrastive reindexing for inter-class isolation and constrained cascade generation for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering the first comprehensive defense against knowledge base extraction attacks.

AAAI Conference 2026 Conference Paper

RPGen: Robust and Differentially Private Synthetic Image Generation

  • Zihao Wang
  • Hao Peng
  • Wei Dong
  • Yuecen Wei
  • Li Sun
  • Zhengtao Yu

Differentially private (DP) image synthesis enables the generation of realistic images while bounding privacy leakage, facilitating secure data sharing across organizations. However, the Gaussian noise injected during DP training, such as via DP-SGD, often severely degrades synthesis quality by disrupting model convergence. To address this, we introduce RPGen, a novel framework that enhances diffusion models' parameter robustness to mitigate DP noise effects without compromising privacy guarantees. At its core, RPGen employs adversarial model perturbation (AMP) during public pre-training to build resilience against perturbations, but we identify and tackle the critical issue of robustness transferability across domains. RPGen achieves this through a three-step process: (1) A pre-trained classifier infers labels for private images, aggregated into a class distribution noised with the Gaussian mechanism for DP, and public samples are selected to match this privatized distribution for domain alignment; (2) The diffusion model is pre-trained on this curated subset with adversarial model perturbation to foster robustness; (3) The model undergoes fine-tuning on private data using DP-SGD. This synergy of robustness augmentation and transferability optimization yields high-fidelity synthesis. Extensive evaluations on ImageNet for pre-training, with CelebA and CIFAR-10 for synthesis, show RPGen outperforming state-of-the-art baselines across privacy budgets ε ∈ {1, 5, 10}. On average, it achieves 20.18% lower FID and 5.45% higher classification accuracy. Ablations confirm the efficacy of domain curation and modest perturbations, establishing RPGen as a new benchmark for privacy-utility trade-offs in image generation.
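
A toy version of step (1) above, assuming histogram sensitivity 1 and leaving the (ε, δ) calibration of the noise scale σ to a privacy accountant:

```python
import numpy as np

def privatize_class_distribution(private_labels, num_classes, sigma, rng=None):
    """Gaussian mechanism on the private label histogram. Adding or removing
    one private image moves the count vector by at most 1 in L2 norm, so noise
    of scale sigma is calibrated against sensitivity 1 (the (epsilon, delta)
    accounting that picks sigma is omitted here)."""
    rng = rng or np.random.default_rng(0)
    counts = np.bincount(private_labels, minlength=num_classes).astype(float)
    noisy = np.clip(counts + rng.normal(0.0, sigma, num_classes), 0.0, None)
    return noisy / max(noisy.sum(), 1e-12)

def select_public_subset(public_labels, target_dist, subset_size, rng=None):
    """Sample public indices so the subset's label mix tracks the privatized
    distribution, aligning the pre-training domain with the private one."""
    rng = rng or np.random.default_rng(0)
    probs = target_dist[public_labels]
    return rng.choice(len(public_labels), subset_size, replace=False,
                      p=probs / probs.sum())

private_labels = np.array([0, 0, 1, 2, 2, 2])
public_labels = np.random.default_rng(1).integers(0, 3, size=100)
dist = privatize_class_distribution(private_labels, num_classes=3, sigma=1.0)
idx = select_public_subset(public_labels, dist, subset_size=20)
```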

AAAI Conference 2026 Conference Paper

Structural Entropy Guided Incremental Learning for Open-World Multimodal Social Event Detection

  • Zhiwei Yang
  • Haimei Qin
  • Xiaoyan Yu
  • Hao Peng
  • Lei Jiang
  • Li Sun
  • Zhiqin Yang

With the explosive growth of multimodal data streams on social media, the timely detection of emerging social events has become increasingly important. As a result, Multimodal Social Event Detection in open-world settings is receiving growing attention. However, most existing methods face two major limitations: (1) They overlook the dynamic nature of open-world social media data and fail to design dedicated incremental learning frameworks. (2) They ignore the impact of noise in streaming data, leading to performance degradation over long-term detection. To overcome these limitations, we propose SeInEvent (Structural Entropy guided Incremental learning for open-world multimodal social Event detection). Our innovations are as follows: First, considering data dynamics, we design a self-supervised alternating incremental contrastive learning mechanism. Through knowledge distillation, historical event clusters are reviewed and consolidated, and contrastive learning is combined to absorb knowledge of unknown events, ultimately achieving incremental learning without labels. Second, addressing the impact of noise, we propose a Pointwise Structural Entropy-based noise filter, which quantifies each sample's informational contribution to the event clustering structure. It enables automatic removal of noisy data and supports robust long-term detection. Extensive experiments on two public datasets demonstrate that SeInEvent achieves superior performance.

IJCAI Conference 2025 Conference Paper

A Survey of Structural Entropy: Theory, Methods, and Applications

  • Dingli Su
  • Hao Peng
  • Yicheng Pan
  • Angsheng Li

Classical information theory, a cornerstone of artificial intelligence, is fundamentally limited by its local perspective, often analyzing pairwise interactions while ignoring the larger, hierarchical architecture of complex systems. Structural entropy (SE) presents a paradigm shift, extending Shannon entropy to quantify information on a global scale and measure the uncertainty embedded in a system's organizational hierarchy. Although its applications have broadened significantly from its origins in community detection across diverse AI domains, a systematic synthesis of its theory, computational methods, and applications is currently lacking. This survey provides a comprehensive overview of SE to fill this critical void in the literature. We offer a detailed examination of its theoretical foundations, computational frameworks, and key learning paradigms, with a focus on its integration with graph learning and reinforcement learning. Through an exploration of its diverse applications, we highlight the power of SE to advance graph-based analysis and modeling. Finally, we discuss key challenges and future research opportunities for incorporating SE principles into the development of more interpretable and theoretically grounded AI systems.

NeurIPS Conference 2025 Conference Paper

AgentIF: Benchmarking Large Language Models' Instruction Following Ability in Agentic Scenarios

  • Yunjia Qi
  • Hao Peng
  • Xiaozhi Wang
  • Amy Xin
  • Youfeng Liu
  • Bin Xu
  • Lei Hou
  • Juanzi Li

Large Language Models (LLMs) have demonstrated advanced capabilities in real-world agentic applications. Growing research efforts aim to develop LLM-based agents to address practical demands, introducing a new challenge: agentic scenarios often involve lengthy instructions with complex constraints, such as extended system prompts and detailed tool specifications. While adherence to such instructions is crucial for agentic applications, whether LLMs can reliably follow them remains underexplored. In this paper, we introduce AgentIF, the first benchmark for systematically evaluating LLM instruction following ability in agentic scenarios. AgentIF features three key characteristics: (1) Realistic, constructed from 50 real-world agentic applications. (2) Long, averaging 1,723 words with a maximum of 15,630 words. (3) Complex, averaging 11.9 constraints per instruction, covering diverse constraint types, such as tool specifications and condition constraints. To construct AgentIF, we collect 707 human-annotated instructions across 50 agentic tasks from industrial application agents and open-source agentic systems. For each instruction, we annotate the associated constraints and corresponding evaluation metrics, including code-based evaluation, LLM-based evaluation, and hybrid code-LLM evaluation. We use AgentIF to systematically evaluate existing advanced LLMs. We observe that current models generally perform poorly, especially in handling complex constraint structures and tool specifications. We further conduct error analysis and analytical experiments on instruction length and meta constraints, providing some findings about the failure modes of existing LLMs. We have released the code and data to facilitate future research.
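
The code-based evaluation mentioned above can be pictured as small deterministic checkers run over a model response; the constraints below are invented for illustration and are not AgentIF's actual metrics.

```python
import re

def check_constraints(response: str) -> dict:
    """Toy code-based constraint checks of the kind a benchmark like AgentIF
    automates (the concrete constraints here are made up for illustration)."""
    return {
        "calls_tool_with_json_args": bool(
            re.search(r"<tool_call>\s*\{.*\}\s*</tool_call>", response, re.S)
        ),
        "under_200_words": len(response.split()) <= 200,
        "no_first_person": not re.search(r"\b(I|me|my)\b", response),
    }

results = check_constraints('<tool_call>{"name": "search"}</tool_call> Done.')
print(sum(results.values()) / len(results))  # fraction of constraints satisfied
```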

NeurIPS Conference 2025 Conference Paper

CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

  • Qinfeng Li
  • Tianyue Luo
  • Xuhong Zhang
  • Yangfan Xie
  • Zhiqiang Shen
  • Lijun Zhang
  • Yier Jin
  • Hao Peng

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computation- and communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.

JMLR Journal 2025 Journal Article

Hierarchical Decision Making Based on Structural Information Principles

  • Xianghua Zeng
  • Hao Peng
  • Dingli Su
  • Angsheng Li

Hierarchical Reinforcement Learning (HRL) is a promising approach for managing task complexity across multiple levels of abstraction and accelerating long-horizon agent exploration. However, the effectiveness of hierarchical policies heavily depends on prior knowledge and manual assumptions about skill definitions and task decomposition. In this paper, we propose a novel Structural Information principles-based framework, namely SIDM, for hierarchical Decision Making in both single-agent and multi-agent scenarios. Central to our work is the utilization of structural information embedded in the decision-making process to adaptively and dynamically discover and learn hierarchical policies through environmental abstractions. Specifically, we present an abstraction mechanism that processes historical state-action trajectories to construct abstract representations of states and actions. We define and optimize directed structural entropy—a metric quantifying the uncertainty in transition dynamics between abstract states—to discover skills that capture key transition patterns in RL environments. Building on these findings, we develop a skill-based learning method for single-agent scenarios and a role-based collaboration method for multi-agent scenarios, both of which can flexibly integrate various underlying algorithms for enhanced performance. Extensive evaluations on challenging benchmarks demonstrate that our framework significantly and consistently outperforms state-of-the-art baselines, improving the effectiveness, efficiency, and stability of policy learning by up to 32.70%, 64.86%, and 88.26%, respectively, as measured by average rewards, convergence timesteps, and standard deviations.

AAAI Conference 2025 Conference Paper

IOP: An Idempotent-Like Optimization Method on the Pareto Front of Hypernetwork

  • Hui Wang
  • Renyu Yang
  • Jie Sun
  • Hao Peng
  • Xudong Mou
  • Tianyu Wo
  • Xudong Liu

Pareto Front Learning (PFL) has been one of the effective means to resolve multi-objective optimization problems through exploring all optimal solutions to learn the entire Pareto front. Pareto Hypernetwork (PHN) is a new promising way to generate the sequence of Pareto-optimal solutions that can be further used as potential solutions to constitute the Pareto front. However, existing PHN-based approaches suffer from two performance issues. First, they take as input a human-crafted preference vector or chunk embedding rather than the input data samples, and are thus vulnerable to data distribution shifts. Second, they cannot optimize all potential solutions when forming the Pareto front, as they merely optimize the loss for a single input at each optimization round. To improve the quality of the Pareto front, we propose IOP, a novel Idempotent-like Optimization method to learn the entire Pareto front accurately and enhance Hypernetwork's adaptability to distribution shifts. In particular, IOP performs idempotent-like optimization by exploiting manifold space mapping, so that the target networks generated by the optimized Hypernetwork can effectively handle samples with distributions similar to the input samples, without pre-defined human-crafted inputs. IOP maximizes the Hypervolume indicator that is composed of all potential solutions at a higher level. Experimental results demonstrate that IOP outperforms the state-of-the-art methods by 4.7% on average in producing the Pareto front and has a 10.5% improvement in adaptability.

IJCAI Conference 2025 Conference Paper

Low-Light Video Enhancement via Spatial-Temporal Consistent Decomposition

  • Xiaogang Xu
  • Kun Zhou
  • Tao Hu
  • Jiafei Wu
  • Ruixing Wang
  • Hao Peng
  • Bei Yu

Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. In this paper, we present an innovative video decomposition strategy that incorporates view-independent and view-dependent components to enhance the performance of LLVE. We leverage dynamic cross-frame correspondences for the view-independent term (which primarily captures intrinsic appearance) and impose a scene-level continuity constraint on the view-dependent term (which mainly describes the shading condition) to achieve consistent and satisfactory decomposition results. To further ensure consistent decomposition, we introduce a dual-structure enhancement network featuring a cross-frame interaction mechanism. By supervising different frames simultaneously, this network encourages them to exhibit matching decomposition features. This mechanism can seamlessly integrate with encoder-decoder single-frame networks, incurring minimal additional parameter costs. Extensive experiments are conducted on widely recognized LLVE benchmarks, covering diverse scenarios. Our framework consistently outperforms existing methods, establishing a new SOTA performance.

AAAI Conference 2025 Conference Paper

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

  • Li Sun
  • Ziheng Zhang
  • Zixi Wang
  • Yujie Wang
  • Qiqi Wan
  • Hao Li
  • Hao Peng
  • Philip S. Yu

Dynamic interacting system modeling is important for understanding and simulating real world systems, e.g., meteorology and the spread of COVID. The system is typically described as a graph, where multiple objects dynamically interact with each other and evolve over time. In recent years, graph Ordinary Differential Equations (ODEs) have received increasing research attention. While achieving encouraging results, existing solutions prioritize the traditional Euclidean space, and neglect the intrinsic geometry of the system and physics laws, e.g., the principle of entropy increasing. The aforementioned limitations motivate us to rethink the system dynamics from a fresh perspective of Riemannian geometry, and pose a more realistic problem of physics-informed dynamic system modeling, considering the underlying geometry and physics law for the first time. In this paper, we present a novel physics-informed Riemannian graph ODE for a wide range of entropy-increasing dynamic systems (termed Pioneer). In particular, we formulate a differential system on the Riemannian manifold, where a manifold-valued graph ODE is governed by the proposed constrained Ricci flow, and a manifold preserving Gyro-transform aware of system geometry. Theoretically, we prove that entropy is non-decreasing under our formulation, in keeping with the physics laws. Empirical results show the superiority of Pioneer on real datasets.

AAAI Conference 2025 Conference Paper

Prompt-based Unifying Inference Attack on Graph Neural Networks

  • Yuecen Wei
  • Xingcheng Fu
  • Lingyun Liu
  • Qingyun Sun
  • Hao Peng
  • Chunming Hu

Graph neural networks (GNNs) provide important prospective insights in applications such as social behavior analysis and financial risk analysis based on their powerful learning capabilities on graph data. Nevertheless, GNNs' predictive performance relies on the quality of task-specific node labels, so it is common practice to improve the model's generalization ability in the downstream execution of decision-making tasks through pre-training. Graph prompting is a prudent choice but risky without taking measures to prevent data leakage. In other words, in high-risk decision scenarios, prompt learning can infer private information by accessing model parameters trained on private data (publishing model parameters in pre-training, i.e., without directly leaking the raw data, is a tacitly accepted trend). However, myriad graph inference attacks necessitate tailored module design and processing to enhance inference capabilities due to variations in supervision signals. In this paper, we propose a novel Prompt-based unifying Inference Attack framework on GNNs, named ProIA. Specifically, ProIA retains the crucial topological information of the graph during pre-training, enhancing the background knowledge of the inference attack model. It then utilizes a unified prompt and introduces additional disentanglement factors in downstream attacks to adapt to task-relevant knowledge. Finally, extensive experiments show that ProIA enhances attack capabilities and demonstrates remarkable adaptability to various inference attacks.

NeurIPS Conference 2025 Conference Paper

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

  • Sagnik Mukherjee
  • Lifan Yuan
  • Dilek Hakkani-Tur
  • Hao Peng

Reinforcement learning (RL) yields substantial improvements in large language models' (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from updating only a small subnetwork comprising just 5%-30% of the parameters, with the rest effectively unchanged. We refer to this phenomenon as parameter update sparsity induced by RL. It is observed across all 7 widely-used RL algorithms (e.g., PPO, GRPO, DPO) and all 10 LLMs from different families in our experiments. This sparsity is intrinsic and occurs without any explicit sparsity-promoting regularizations or architectural constraints. Finetuning the subnetwork alone recovers the test accuracy, and, remarkably, produces a model nearly identical to the one obtained via full finetuning. The subnetworks from different random seeds, training data, and even RL algorithms show substantially greater overlap than expected by chance. Our analysis suggests that this sparsity is not due to updating only a subset of layers; instead, nearly all parameter matrices receive similarly sparse updates. Moreover, the updates to almost all parameter matrices are nearly full-rank, suggesting RL updates a small subset of parameters that nevertheless span almost the full subspaces that the parameter matrices can represent. We conjecture that this update sparsity can be primarily attributed to training on data that is near the policy distribution; techniques that encourage the policy to remain close to the pretrained model, such as KL regularization and gradient clipping, have limited impact.
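
The reported update sparsity can be measured directly from two checkpoints; a minimal sketch (with tol=0, it counts exactly unchanged entries):

```python
import torch

@torch.no_grad()
def update_sparsity(model_before, model_after, tol=0.0):
    """Fraction of parameters left (numerically) unchanged by finetuning,
    mirroring the paper's observation that 70%-95% of weights do not move
    under RL. tol > 0 tolerates tiny numerical drift."""
    unchanged, total = 0, 0
    before = dict(model_before.named_parameters())
    for name, p_after in model_after.named_parameters():
        diff = (p_after - before[name]).abs()
        unchanged += (diff <= tol).sum().item()
        total += p_after.numel()
    return unchanged / total
```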

IJCAI Conference 2025 Conference Paper

Scalable Multi-Stage Influence Function for Large Language Models via Eigenvalue-Corrected Kronecker-Factored Parameterization

  • Yuntai Bao
  • Xuhong Zhang
  • Tianyu Du
  • Xinkui Zhao
  • Jiang Zong
  • Hao Peng
  • Jianwei Yin

Pre-trained large language models (LLMs) are commonly fine-tuned to adapt to downstream tasks. Since the majority of knowledge is acquired during pre-training, attributing the predictions of fine-tuned LLMs to their pre-training data may provide valuable insights. Influence functions have been proposed as a means to explain model predictions based on training data. However, existing approaches often fail to compute "multi-stage" influence and lack scalability to billion-scale LLMs. In this paper, we propose multi-stage influence functions to attribute the downstream predictions of fine-tuned LLMs to pre-training data under the full-parameter fine-tuning paradigm. To enhance the efficiency and practicality of our multi-stage influence function, we leverage Eigenvalue-corrected Kronecker-Factored (EK-FAC) parameterization for efficient approximation. Empirical results validate the superior scalability of EK-FAC approximation and the effectiveness of our multi-stage influence function. Additionally, case studies on a real-world LLM, dolly-v2-3b, demonstrate its interpretive power, with exemplars illustrating insights provided by multi-stage influence estimates.
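
For context, the classical single-stage influence function that the multi-stage variant extends, and whose inverse-Hessian term EK-FAC approximates blockwise:

```latex
% Influence of up-weighting a training point $z$ on the test loss
% (Koh & Liang). $\hat{\theta}$ are the trained parameters and
% $H_{\hat{\theta}} = \nabla_\theta^2 \tfrac{1}{n}\sum_i L(z_i, \hat{\theta})$
% is the empirical Hessian of the training loss.
\[
  \mathcal{I}(z, z_{\text{test}})
  = - \nabla_\theta L(z_{\text{test}}, \hat{\theta})^{\top}
      H_{\hat{\theta}}^{-1}
      \nabla_\theta L(z, \hat{\theta})
\]
```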

IJCAI Conference 2025 Conference Paper

SetKE: Knowledge Editing for Knowledge Elements Overlap

  • Yifan Wei
  • Xiaoyan Yu
  • Ran Song
  • Hao Peng
  • Angsheng Li

Large Language Models (LLMs) excel in tasks such as retrieval and question answering but require updates to incorporate new knowledge and reduce inaccuracies and hallucinations. Traditional updating methods, like fine-tuning and incremental learning, face challenges such as overfitting and high computational costs. Knowledge Editing (KE) provides a promising alternative but often overlooks the Knowledge Element Overlap (KEO) phenomenon, where multiple triplets share common elements, leading to editing conflicts. We identify the prevalence of KEO in existing KE datasets and show its significant impact on current KE methods, causing performance degradation in handling such triplets. To address this, we propose a new formulation, Knowledge Set Editing (KSE), and introduce SetKE, a method that edits sets of triplets simultaneously. Experimental results demonstrate that SetKE outperforms existing methods in KEO scenarios on mainstream LLMs. Additionally, we introduce EditSet, a dataset containing KEO triplets, providing a comprehensive benchmark.

IJCAI Conference 2025 Conference Paper

STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation

  • Yiming Wang
  • Hao Peng
  • Senzhang Wang
  • Haohua Du
  • Chunyang Liu
  • Jia Wu
  • Guanlin Wu

Traffic data imputation is fundamentally important to support various applications in intelligent transportation systems such as traffic flow prediction. However, existing time-to-space sequential methods often fail to effectively extract features in block-wise missing data scenarios. Meanwhile, the static graph structure for spatial feature propagation significantly constrains the model's flexibility in handling the distribution shift issue for the nonstationary traffic data. To address these issues, this paper proposes a Spatio-Temporal Attention Mixture of experts network named STAMImputer for traffic data imputation. Specifically, we introduce a Mixture of Experts (MoE) framework to capture latent spatio-temporal features and their influence weights, effectively imputing block missing. A novel Low-rank guided Sampling Graph ATtention (LrSGAT) mechanism is designed to dynamically balance the local and global correlations across road networks. The sampled attention vectors are utilized to generate dynamic graphs that capture real-time spatial correlations. Extensive experiments are conducted on four traffic datasets for evaluation. The results show that STAMImputer achieves significant performance improvements compared with existing SOTA approaches. Our codes are available at https://github.com/RingBDStack/STAMImupter.

AAAI Conference 2025 Conference Paper

Structural Entropy Guided Probabilistic Coding

  • Xiang Huang
  • Hao Peng
  • Li Sun
  • Hui Lin
  • Chunyang Liu
  • Jiang Cao
  • Philip S. Yu

Probabilistic embeddings have several advantages over deterministic embeddings as they map each data point to a distribution, which better describes the uncertainty and complexity of data. Many works focus on adjusting the distribution constraint under the Information Bottleneck (IB) principle to enhance representation learning. However, these proposed regularization terms only consider the constraint of each latent variable, omitting the structural information between latent variables. In this paper, we propose a novel structural entropy-guided probabilistic coding model, named SEPC. Specifically, we incorporate the relationship between latent variables into the optimization by proposing a structural entropy regularization loss. Besides, as traditional structural information theory is not well-suited for regression tasks, we propose a probabilistic encoding tree, transferring regression tasks to classification tasks while diminishing the influence of the transformation. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC compared to other state-of-the-art models in terms of effectiveness, generalization capability, and robustness to label noise.

NeurIPS Conference 2025 Conference Paper

Structural Information-based Hierarchical Diffusion for Offline Reinforcement Learning

  • Xianghua Zeng
  • Hao Peng
  • Yicheng Pan
  • Angsheng Li
  • Guanlin Wu

Diffusion-based generative methods have shown promising potential for modeling trajectories from offline reinforcement learning (RL) datasets, and hierarchical diffusion has been introduced to mitigate variance accumulation and computational challenges in long-horizon planning tasks. However, existing approaches typically assume a fixed two-layer diffusion hierarchy with a single predefined temporal scale, which limits adaptability to diverse downstream tasks and reduces flexibility in decision making. In this work, we propose SIHD, a novel Structural Information-based Hierarchical Diffusion framework for effective and stable offline policy learning in long-horizon environments with sparse rewards. Specifically, we analyze structural information embedded in offline trajectories to construct the diffusion hierarchy adaptively, enabling flexible trajectory modeling across multiple temporal scales. Rather than relying on reward predictions from localized sub-trajectories, we quantify the structural information gain of each state community and use it as a conditioning signal within the corresponding diffusion layer. To reduce overreliance on offline datasets, we introduce a structural entropy regularizer that encourages exploration of underrepresented states while avoiding extrapolation errors from distributional shifts. Extensive evaluations show that SIHD significantly outperforms state-of-the-art baselines in decision-making performance and demonstrates superior generalization across diverse scenarios.

IJCAI Conference 2025 Conference Paper

T-T: Table Transformer for Tagging-based Aspect Sentiment Triplet Extraction

  • Kun Peng
  • Chaodong Tong
  • Cong Cao
  • Hao Peng
  • Qian Li
  • Guanlin Wu
  • Lei Jiang
  • Yanbing Liu

Aspect sentiment triplet extraction (ASTE) aims to extract triplets composed of aspect terms, opinion terms, and sentiment polarities from given sentences. The table tagging method is a popular approach to addressing this task, which encodes a sentence into a 2-dimensional table, allowing for the tagging of relations between any two words. Previous efforts have focused on designing various downstream relation learning modules to better capture interactions between tokens in the table, revealing that a stronger capability in relation capture can lead to greater improvements in the model. Motivated by this, we attempt to directly utilize transformer layers as downstream relation learning modules. Due to the powerful semantic modeling capability of transformers, it is foreseeable that this will lead to excellent improvement. However, owing to the quadratic relation between the length of the table and the length of the input sentence sequence, using transformers directly faces two challenges: overly long table sequences and unfair local attention interaction. To address these challenges, we propose a novel Table-Transformer (T-T) for the tagging-based ASTE method. Specifically, we introduce a stripe attention mechanism with a loop-shift strategy to tackle these challenges. The former modifies the global attention mechanism to only attend to a 2-dimensional local attention window, while the latter facilitates interaction between different attention windows. Extensive and comprehensive experiments demonstrate that the T-T, as a downstream relation learning module, achieves state-of-the-art performance with lower computational costs.

NeurIPS Conference 2025 Conference Paper

The Best Instruction-Tuning Data are Those That Fit

  • Dylan Zhang
  • Qirun Dai
  • Hao Peng

High-quality supervised finetuning (SFT) data are essential for unlocking pretrained LLMs' capabilities. Typically, instructions are paired with responses from various sources—by human annotators or other LMs—which are often out of the distribution of the target model to be finetuned. At scale, this mismatch can lead to diminishing returns and even hurt model performance and robustness. We hypothesize that SFT is most effective when the data is aligned with the model's pretrained distribution, and propose GRAPE, a novel SFT framework that tailors supervision to the target model. For each instruction, it gathers responses from various sources and selects the one that aligns most closely with the model's pretrained distribution, as measured by the normalized probability. Standard SFT is then performed on these selected responses. We first evaluate GRAPE in a controlled experiment, sampling multiple responses per question in the UltraInteract dataset from diverse models. We finetune using GRAPE-selected data on LMs from different families, including LLaMA-3.1-8B, Mistral-7B, and Qwen2.5-7B. GRAPE significantly outperforms strong baselines—including distilling from the strongest model—with absolute gains up to 13.8% averaged across benchmarks, and outperforms a 3× larger data baseline with improvements up to 17.3%. GRAPE's benefits generalize to off-the-shelf SFT data. When used to subsample from the post-training data of Tulu3 and Olmo-2, GRAPE surpasses strong baselines trained on 4.5× more data by 6.1%, and outperforms state-of-the-art selection methods by 3.9% on average. Notably, with only 1/3 the data and half the training epochs, GRAPE enables LLaMA-3.1-8B to exceed Tulu3-SFT performance by 3.5%. Our findings highlight that aligning supervision with the pretrained distribution provides a simple yet powerful strategy to improve both the efficiency and effectiveness of SFT.
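
A sketch of the selection rule under its plain reading: score each candidate response by its length-normalized log-probability under the target model and keep the argmax. The model name is a placeholder, and the prompt-length split assumes tokenization is consistent across the concatenation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

@torch.no_grad()
def normalized_logprob(model, tokenizer, prompt, response):
    """Mean per-token log-probability of `response` given `prompt` under the
    target model -- the alignment score a GRAPE-style selection maximizes."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    logits = model(full_ids).logits
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)  # position t predicts t+1
    targets = full_ids[0, 1:]
    token_lp = logprobs[torch.arange(targets.numel()), targets]
    return token_lp[prompt_len - 1:].mean().item()        # response tokens only

model_name = "Qwen/Qwen2.5-7B"  # placeholder target model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
prompt = "Explain KL regularization in one sentence.\n"
candidates = [
    "KL regularization penalizes divergence from a reference policy.",
    "It is a trick.",
]
best = max(candidates, key=lambda r: normalized_logprob(model, tokenizer, prompt, r))
```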

NeurIPS Conference 2025 Conference Paper

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

  • Shivam Agarwal
  • Zimin Zhang
  • Lifan Yuan
  • Jiawei Han
  • Hao Peng

Entropy minimization (EM) trains the model to concentrate even more probability mass on its most confident outputs. We show that this simple objective alone, without any labeled data, can substantially improve large language models’ (LLMs) performance on challenging math, physics, and coding tasks. We explore three approaches: (1) EM-FT minimizes token-level entropy similarly to instruction finetuning, but on unlabeled outputs drawn from the model; (2) EM-RL: reinforcement learning with negative entropy as the only reward to maximize; (3) EM-INF: inference-time logit adjustment to reduce entropy without any training data or parameter updates. On Qwen-7B, EM-RL, without any labeled data, achieves comparable or better performance than strong RL baselines such as GRPO and RLOO that are trained on 60K labeled examples. Furthermore, EM-INF enables Qwen-32B to match or exceed the performance of proprietary models like GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro on the challenging SciCode benchmark, while being 3x more efficient than self-consistency and sequential refinement. Our findings reveal that many pretrained LLMs possess previously underappreciated reasoning capabilities that can be effectively elicited through entropy minimization alone, without any labeled data or even any parameter updates.
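
All three variants revolve around the entropy of the model's own next-token distributions; a minimal sketch of an EM-FT-style objective (the exact loss in the paper may differ):

```python
import torch
import torch.nn.functional as F

def token_entropy(logits):
    """Shannon entropy of the next-token distribution at every position.
    logits: (batch, seq, vocab)."""
    logp = F.log_softmax(logits, dim=-1)
    return -(logp.exp() * logp).sum(dim=-1)  # (batch, seq)

def em_ft_loss(logits, mask):
    """EM-FT-style objective: average entropy over the model's own sampled
    output tokens (mask is a float tensor selecting generated positions).
    Minimizing it sharpens the model's confidence without any labels."""
    ent = token_entropy(logits)
    return (ent * mask).sum() / mask.sum().clamp_min(1)
```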

AAAI Conference 2025 Conference Paper

Towards Effective, Efficient and Unsupervised Social Event Detection in the Hyperbolic Space

  • Xiaoyan Yu
  • Yifan Wei
  • Shuaishuai Zhou
  • Zhiwei Yang
  • Li Sun
  • Hao Peng
  • Liehuang Zhu
  • Philip S. Yu

The vast, complex, and dynamic nature of social message data has posed challenges to social event detection (SED). Despite considerable effort, these challenges persist, often resulting in inadequately expressive message representations (ineffective) and prolonged learning durations (inefficient). In response to the challenges, this work introduces an unsupervised framework, HyperSED (Hyperbolic SED). Specifically, the proposed framework first models social messages into semantic-based message anchors, and then leverages the structure of the anchor graph and the expressiveness of the hyperbolic space to acquire structure- and geometry-aware anchor representations. Finally, HyperSED builds the partitioning tree of the anchor message graph by incorporating differentiable structural information as the reflection of the detected events. Extensive experiments on public datasets demonstrate HyperSED's competitive performance, along with a substantial improvement in efficiency compared to the current state-of-the-art unsupervised paradigm. Statistically, HyperSED boosts incremental SED by an average of 2%, 2%, and 25% in NMI, AMI, and ARI, respectively; enhancing efficiency by up to 37.41 times and at least 12.10 times, illustrating the advancement of the proposed framework.

TMLR Journal 2024 Journal Article

A Single Transformer for Scalable Vision-Language Modeling

  • Yangyi Chen
  • Xingyao Wang
  • Hao Peng
  • Heng Ji

We present SOLO, a single transformer for Scalable visiOn-Language mOdeling. Current large vision-language models (LVLMs) such as LLaVA mostly employ heterogeneous architectures that connect pre-trained visual encoders with large language models (LLMs) to facilitate visual recognition and complex reasoning. Although achieving remarkable performance with relatively lightweight training, we identify four primary scalability limitations: (1) The visual capacity is constrained by pre-trained visual encoders, which are typically an order of magnitude smaller than LLMs. (2) The heterogeneous architecture complicates the use of established hardware and software infrastructure. (3) Study of scaling laws on such an architecture must consider three separate components — visual encoder, connector, and LLMs, which complicates the analysis. (4) The use of existing visual encoders typically requires following a pre-defined specification of image input pre-processing, for example, by reshaping inputs to fixed-resolution square images. This inflexibility can create bottlenecks and impede scalability. A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs; however, its limited adoption in the modern context likely stems from the absence of reliable training recipes that balance both modalities and ensure stable training for billion-scale models. In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM with the single Transformer architecture using moderate academic resources (8x A100 80GB GPUs). The training recipe involves initializing from LLMs, sequential pre-training on ImageNet and web-scale data, and instruction fine-tuning on our curated high-quality datasets. On extensive evaluation, SOLO demonstrates performance comparable to LLaVA-v1.5-7B, particularly excelling in visual mathematical reasoning.

AAAI Conference 2024 Conference Paper

Adversarial Socialbots Modeling Based on Structural Information Principles

  • Xianghua Zeng
  • Hao Peng
  • Angsheng Li

The importance of effective detection is underscored by the fact that socialbots imitate human behavior to propagate misinformation, leading to an ongoing competition between socialbots and detectors. Despite the rapid advancement of reactive detectors, the exploration of adversarial socialbot modeling remains incomplete, significantly hindering the development of proactive detectors. To address this issue, we propose a mathematical Structural Information principles-based Adversarial Socialbots Modeling framework, namely SIASM, to enable more accurate and effective modeling of adversarial behaviors. First, a heterogeneous graph is presented to integrate various users and rich activities in the original social network and measure its dynamic uncertainty as structural entropy. By minimizing the high-dimensional structural entropy, a hierarchical community structure of the social network is generated and referred to as the optimal encoding tree. Second, a novel method is designed to quantify influence by utilizing the assigned structural entropy, which helps reduce the computational cost of SIASM by filtering out uninfluential users. Besides, a new conditional structural entropy is defined between the socialbot and other users to guide the follower selection for network influence maximization. Extensive and comparative experiments on both homogeneous and heterogeneous social networks demonstrate that, compared with state-of-the-art baselines, the proposed SIASM framework yields substantial performance improvements in terms of network influence (up to 16.32%) and sustainable stealthiness (up to 16.29%) when evaluated against a robust detector with 90% accuracy.

AAAI Conference 2024 Conference Paper

AMD: Autoregressive Motion Diffusion

  • Bo Han
  • Hao Peng
  • Minjing Dong
  • Yi Ren
  • Yixuan Shen
  • Chang Xu

Human motion generation aims to produce plausible human motion sequences according to various conditional inputs, such as text or audio. Despite the feasibility of existing methods in generating motion based on short prompts and simple motion patterns, they encounter difficulties when dealing with long prompts or complex motions. The challenges are two-fold: 1) the scarcity of human motion-captured data for long prompts and complex motions. 2) the high diversity of human motions in the temporal domain and the substantial divergence of distributions from conditional modalities, leading to a many-to-many mapping problem when generating motion with complex and long texts. In this work, we address these gaps by 1) constructing the first dataset pairing long textual descriptions with complex 3D motions (HumanLong3D), and 2) proposing an autoregressive motion diffusion model (AMD). Specifically, AMD integrates the text prompt at the current timestep with the text prompt and action sequences at the previous timestep as conditional information to predict the current action sequences in an iterative manner. Furthermore, we present its generalization for X-to-Motion with “No Modality Left Behind”, enabling for the first time the generation of high-definition and high-fidelity human motions based on user-defined modality input.

IJCAI Conference 2024 Conference Paper

Dynamicity-aware Social Bot Detection with Dynamic Graph Transformers

  • Buyun He
  • Yingguang Yang
  • Qi Wu
  • Hao Liu
  • Renyu Yang
  • Hao Peng
  • Xiang Wang
  • Yong Liao

Detecting social bots has evolved into a pivotal yet intricate task, aimed at combating the dissemination of misinformation and preserving the authenticity of online interactions. While earlier graph-based approaches, which leverage the topological structure of social networks, yielded notable outcomes, they overlooked the inherent dynamicity of social networks: in reality, they largely depict the social network as a static graph and rely solely on its most recent state. Due to the absence of dynamicity modeling, such approaches are vulnerable to evasion, particularly when advanced social bots interact with other users to camouflage identities and escape detection. To tackle these challenges, we propose BotDGT, a novel framework that not only considers the topological structure, but also effectively incorporates the dynamic nature of social networks. Specifically, we characterize a social network as a dynamic graph. A structural module is employed to acquire topological information from each historical snapshot. Additionally, a temporal module is proposed to integrate historical context and model the evolving behavior patterns exhibited by social bots and legitimate users. Experimental results demonstrate the superiority of BotDGT against leading methods that neglected the dynamic nature of social networks in terms of accuracy, recall, and F1-score.

NeurIPS Conference 2024 Conference Paper

Effective Exploration Based on the Structural Information Principles

  • Xianghua Zeng
  • Hao Peng
  • Angsheng Li

Traditional information theory provides a valuable foundation for Reinforcement Learning (RL), particularly through representation learning and entropy maximization for agent exploration. However, existing methods primarily concentrate on modeling the uncertainty associated with RL's random variables, neglecting the inherent structure within the state and action spaces. In this paper, we propose a novel Structural Information principles-based Effective Exploration framework, namely SI2E. Structural mutual information between two variables is defined to address the single-variable limitation in structural information, and an innovative embedding principle is presented to capture dynamics-relevant state-action representations. The SI2E analyzes value differences in the agent's policy between state-action pairs and minimizes structural entropy to derive the hierarchical state-action structure, referred to as the encoding tree. Under this tree structure, value-conditional structural entropy is defined and maximized to design an intrinsic reward mechanism that avoids redundant transitions and promotes enhanced coverage in the state-action space. Theoretical connections are established between SI2E and classical information-theoretic methodologies, highlighting our framework's rationality and advantage. Comprehensive evaluations in the MiniGrid, MetaWorld, and DeepMind Control Suite benchmarks demonstrate that SI2E significantly outperforms state-of-the-art exploration baselines regarding final performance and sample efficiency, with maximum improvements of 37.63% and 60.25%, respectively.

NeurIPS Conference 2024 Conference Paper

GC-Bench: An Open and Unified Benchmark for Graph Condensation

  • Qingyun Sun
  • Ziying Chen
  • Beining Yang
  • Cheng Ji
  • Xingcheng Fu
  • Sheng Zhou
  • Hao Peng
  • Jianxin Li

Graph condensation (GC) has recently garnered considerable attention due to its ability to reduce large-scale graph datasets while preserving their essential properties. The core concept of GC is to create a smaller, more manageable graph that retains the characteristics of the original graph. Despite the proliferation of graph condensation methods developed in recent years, there is no comprehensive evaluation and in-depth analysis, which creates a great obstacle to understanding the progress in this field. To fill this gap, we develop a comprehensive Graph Condensation Benchmark (GC-Bench) to analyze the performance of graph condensation in different scenarios systematically. Specifically, GC-Bench systematically investigates the characteristics of graph condensation in terms of the following dimensions: effectiveness, transferability, and complexity. We comprehensively evaluate 12 state-of-the-art graph condensation algorithms in node-level and graph-level tasks and analyze their performance on 12 diverse graph datasets. Further, we have developed an easy-to-use library for training and evaluating different GC methods to facilitate reproducible research. The GC-Bench library is available at https://github.com/RingBDStack/GC-Bench.

AAAI Conference 2024 Conference Paper

Hierarchical and Incremental Structural Entropy Minimization for Unsupervised Social Event Detection

  • Yuwei Cao
  • Hao Peng
  • Zhengtao Yu
  • Philip S. Yu

As a trending approach for social event detection, graph neural network (GNN)-based methods enable a fusion of natural language semantics and the complex social network structural information, thus showing SOTA performance. However, GNN-based methods can miss useful message correlations. Moreover, they require manual labeling for training and predetermining the number of events for prediction. In this work, we address social event detection via graph structural entropy (SE) minimization. While keeping the merits of the GNN-based methods, the proposed framework, HISEvent, constructs more informative message graphs, is unsupervised, and does not require the number of events given a priori. Specifically, we incrementally explore the graph neighborhoods using 1-dimensional (1D) SE minimization to supplement the existing message graph with edges between semantically related messages. We then detect events from the message graph by hierarchically minimizing 2-dimensional (2D) SE. Our proposed 1D and 2D SE minimization algorithms are customized for social event detection and effectively tackle the efficiency problem of the existing SE minimization algorithms. Extensive experiments show that HISEvent consistently outperforms GNN-based methods and achieves the new SOTA for social event detection under both closed- and open-set settings while being efficient and robust.
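
The 2D structural entropy that HISEvent hierarchically minimizes can be computed for any fixed partition; a small reference implementation of the standard formula, using networkx for illustration:

```python
import math
import networkx as nx

def two_dim_se(G: nx.Graph, communities):
    """Two-dimensional structural entropy of G under a vertex partition
    (Li & Pan). communities: iterable of vertex sets covering G."""
    two_m = 2 * G.number_of_edges()
    H = 0.0
    for C in communities:
        C = set(C)
        vol = sum(G.degree(v) for v in C)                          # V_alpha
        cut = sum(1 for u, v in G.edges if (u in C) != (v in C))   # g_alpha
        if vol == 0:
            continue
        H -= (cut / two_m) * math.log2(vol / two_m)  # module-level term
        for v in C:                                   # vertex-level terms
            d = G.degree(v)
            if d > 0:
                H -= (d / two_m) * math.log2(d / vol)
    return H

G = nx.karate_club_graph()
parts = [{n for n in G if G.nodes[n]["club"] == "Mr. Hi"},
         {n for n in G if G.nodes[n]["club"] != "Mr. Hi"}]
print(two_dim_se(G, parts))
```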

AIJ Journal 2024 Journal Article

Incremental measurement of structural entropy for dynamic graphs

  • Runze Yang
  • Hao Peng
  • Chunyang Liu
  • Angsheng Li

Structural entropy is a metric that measures the amount of information embedded in graph structure data under a strategy of hierarchical abstracting. To measure the structural entropy of a dynamic graph, we need to decode the optimal encoding tree corresponding to the best community partitioning for each snapshot. However, the current methods do not support dynamic encoding tree updating and incremental structural entropy computation. To address this issue, we propose Incre-2dSE, a novel incremental measurement framework that dynamically adjusts the community partitioning and efficiently computes the updated structural entropy for each updated graph. Specifically, Incre-2dSE includes incremental algorithms based on two dynamic adjustment strategies for two-dimensional encoding trees, i.e., the naive adjustment strategy and the node-shifting adjustment strategy, which support theoretical analysis of updated structural entropy and incrementally optimize community partitioning towards a lower structural entropy. We conduct extensive experiments on 3 artificial datasets generated by Hawkes Process and 3 real-world datasets. Experimental results confirm that our incremental algorithms effectively capture the dynamic evolution of the communities, reduce time consumption, and provide great interpretability.

AAAI Conference 2024 Conference Paper

Motif-Aware Riemannian Graph Neural Network with Generative-Contrastive Learning

  • Li Sun
  • Zhenhao Huang
  • Zixi Wang
  • Feiyang Wang
  • Hao Peng
  • Philip S. Yu

Graphs are typical non-Euclidean data of complex structures. In recent years, Riemannian graph representation learning has emerged as an exciting alternative to Euclidean ones. However, Riemannian methods are still in an early stage: most of them present a single curvature (radius) regardless of structural complexity, suffer from numerical instability due to the exponential/logarithmic map, and lack the ability to capture motif regularity. In light of the issues above, we propose the problem of Motif-aware Riemannian Graph Representation Learning, seeking a numerically stable encoder to capture motif regularity in a diverse-curvature manifold without labels. To this end, we present a novel Motif-aware Riemannian model with Generative-Contrastive learning (MotifRGC), which conducts a min-max game on the Riemannian manifold in a self-supervised manner. First, we propose a new type of Riemannian GCN (D-GCN), in which we construct a diverse-curvature manifold by a product layer with the diversified factor, and replace the exponential/logarithmic map by a stable kernel layer. Second, we introduce a motif-aware Riemannian generative-contrastive learning to capture motif regularity in the constructed manifold and learn motif-aware node representation without external labels. Empirical results show the superiority of MotifRGC.

AAAI Conference 2024 Conference Paper

Poincaré Differential Privacy for Hierarchy-Aware Graph Embedding

  • Yuecen Wei
  • Haonan Yuan
  • Xingcheng Fu
  • Qingyun Sun
  • Hao Peng
  • Xianxian Li
  • Chunming Hu

Hierarchy is an important and commonly observed topological property in real-world graphs that indicates the relationships between supervisors and subordinates or the organizational behavior of human groups. As hierarchy is introduced as a new inductive bias into the Graph Neural Networks (GNNs) in various tasks, it implies latent topological relations for attackers to improve their inference attack performance, leading to serious privacy leakage issues. In addition, existing privacy-preserving frameworks suffer from reduced protection ability in hierarchical propagation due to the deficiency of adaptive upper-bound estimation of the hierarchical perturbation boundary. It is of great urgency to effectively leverage the hierarchical property of data while satisfying privacy guarantees. To solve the problem, we propose the Poincaré Differential Privacy framework, named PoinDP, to protect the hierarchy-aware graph embedding based on hyperbolic geometry. Specifically, PoinDP first learns the hierarchy weights for each entity based on the Poincaré model in hyperbolic space. Then, the Personalized Hierarchy-aware Sensitivity is designed to measure the sensitivity of the hierarchical structure and adaptively allocate the privacy protection strength. Besides, the Hyperbolic Gaussian Mechanism (HGM) is proposed to extend the Gaussian mechanism in Euclidean space to hyperbolic space to realize random perturbations that satisfy differential privacy under the hyperbolic space metric. Extensive experiment results on five real-world datasets demonstrate the proposed PoinDP’s advantages of effective privacy protection while maintaining good performance on the node classification task.
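
The paper specifies HGM exactly; a common tangent-space construction, sketched here as an assumption rather than the authors' mechanism, adds Gaussian noise in the tangent space at the origin of the Poincaré ball and maps back with the exponential map:

    import numpy as np

    def exp_map_origin(v, c=1.0):
        """Exponential map at the origin of the Poincare ball with
        curvature -c: exp_0(v) = tanh(sqrt(c)||v||) * v / (sqrt(c)||v||)."""
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            return np.zeros_like(v)
        return np.tanh(np.sqrt(c) * norm) * v / (np.sqrt(c) * norm)

    def log_map_origin(x, c=1.0):
        """Logarithmic map at the origin (inverse of exp_map_origin)."""
        norm = np.linalg.norm(x)
        if norm < 1e-12:
            return np.zeros_like(x)
        return np.arctanh(np.sqrt(c) * norm) * x / (np.sqrt(c) * norm)

    def hyperbolic_gaussian_perturb(x, sigma, c=1.0, rng=None):
        """Add Gaussian noise of scale sigma in the tangent space at the
        origin and map back to the ball; in a DP setting, sigma would be
        calibrated to the (hierarchy-aware) sensitivity."""
        rng = rng or np.random.default_rng()
        v = log_map_origin(x, c)
        return exp_map_origin(v + rng.normal(0.0, sigma, v.shape), c)

    x = exp_map_origin(np.array([0.3, -0.1]))
    print(hyperbolic_gaussian_perturb(x, sigma=0.05))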

JMLR Journal 2024 Journal Article

PyGOD: A Python Library for Graph Outlier Detection

  • Kay Liu
  • Yingtong Dou
  • Xueying Ding
  • Xiyang Hu
  • Ruitong Zhang
  • Hao Peng
  • Lichao Sun
  • Philip S. Yu

PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the different detectors implemented so that users can easily customize each detector for their purposes. To ease the construction of detection workflows, PyGOD offers numerous commonly used utility functions. To scale computation to large graphs, PyGOD supports functionalities for deep models such as sampling and mini-batch processing. PyGOD uses best practices in fostering code reliability and maintainability, including unit testing, continuous integration, and code coverage. To facilitate accessibility, PyGOD is released under a BSD 2-Clause license at https://pygod.org and at the Python Package Index (PyPI).
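
A minimal usage sketch follows, written against the PyGOD 1.x API (the detector choice, the epoch parameter, and the decision_score_ attribute may differ in other releases; consult the documentation):

    # A toy end-to-end run with one reconstruction-based detector.
    import torch
    from torch_geometric.data import Data
    from pygod.detector import DOMINANT

    # A small attributed graph: 6 nodes, a few undirected edges.
    edge_index = torch.tensor([[0, 1, 1, 2, 3, 4],
                               [1, 0, 2, 1, 4, 3]])
    data = Data(x=torch.randn(6, 8), edge_index=edge_index)

    detector = DOMINANT(epoch=20)
    detector.fit(data)
    labels = detector.predict(data)    # 0 = inlier, 1 = outlier
    scores = detector.decision_score_  # raw outlier scores from fit
    print(labels, scores)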

AAAI Conference 2024 Conference Paper

ReGCL: Rethinking Message Passing in Graph Contrastive Learning

  • Cheng Ji
  • Zixuan Huang
  • Qingyun Sun
  • Hao Peng
  • Xingcheng Fu
  • Qian Li
  • Jianxin Li

Graph contrastive learning (GCL) has demonstrated remarkable efficacy in graph representation learning. However, previous studies have overlooked the inherent conflict that arises when employing graph neural networks (GNNs) as encoders for node-level contrastive learning. This conflict pertains to the partial incongruity between the feature aggregation mechanism of graph neural networks and the embedding distinction characteristic of contrastive learning. Theoretically, to investigate the location and extent of the conflict, we analyze the participation of message-passing from the gradient perspective of the InfoNCE loss. Different from contrastive learning in other domains, the conflict in GCL arises because, under message passing, certain samples contribute simultaneously to the gradients of both positive and negative pairs, which correspond to opposite optimization directions. To further address the conflict issue, we propose a practical framework called ReGCL, which utilizes theoretical findings of GCL gradients to effectively improve graph contrastive learning. Specifically, two gradient-based strategies are devised in terms of both message passing and loss function to mitigate the conflict. Firstly, a gradient-guided structure learning method is proposed in order to acquire a structure that is adapted to contrastive learning principles. Secondly, a gradient-weighted InfoNCE loss function is designed to reduce the impact of false negative samples with high probabilities, specifically from the standpoint of the graph encoder. Extensive experiments demonstrate the superiority of the proposed method in comparison to state-of-the-art baselines across various node classification benchmarks.
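
For reference, the loss under analysis is the standard node-level InfoNCE; a bare-bones cross-view version (a simplification of the full objective, which usually also includes intra-view negatives) looks like this:

    import torch
    import torch.nn.functional as F

    def infonce(z1, z2, tau=0.5):
        """Plain cross-view InfoNCE over two node embedding views.
        For anchor i, z2[i] is the positive and all other nodes are
        negatives, so once message passing ties embeddings together,
        one sample can enter positive and negative gradient terms at
        once (the conflict ReGCL analyzes)."""
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        sim = torch.exp(z1 @ z2.t() / tau)  # cross-view similarities
        pos = sim.diag()
        return (-torch.log(pos / sim.sum(dim=1))).mean()

    z1 = torch.randn(16, 32, requires_grad=True)
    z2 = torch.randn(16, 32)
    print(infonce(z1, z2))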

NeurIPS Conference 2024 Conference Paper

SciCode: A Research Coding Benchmark Curated by Scientists

  • Minyang Tian
  • Luyu Gao
  • Shizhuo D. Zhang
  • Xinan Chen
  • Cunwei Fan
  • Xuefei Guo
  • Roland Haas
  • Pan Ji

Since language models (LMs) now outperform average humans on many challenging tasks, it is becoming increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this by examining LM capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields, including mathematics, physics, chemistry, biology, and materials science, we create a scientist-curated coding benchmark, SciCode. The problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. OpenAI o1-preview, the best-performing model among those tested, can solve only 7.7% of the problems in the most realistic setting. We believe that SciCode both demonstrates contemporary LMs' progress towards realizing helpful scientific assistants and sheds light on the building and evaluation of scientific AI in the future.

NeurIPS Conference 2024 Conference Paper

Spiking Graph Neural Network on Riemannian Manifolds

  • Li Sun
  • Zhenhao Huang
  • Qiqi Wan
  • Hao Peng
  • Philip S. Yu

Graph neural networks (GNNs) have become the dominant solution for learning on graphs, the typical non-Euclidean structures. Conventional GNNs, constructed with the Artificial Neural Network (ANN), have achieved impressive performance at the cost of high computation and energy consumption. In parallel, spiking GNNs with brain-like spiking neurons are drawing increasing research attention owing to their energy efficiency. So far, existing spiking GNNs consider graphs in Euclidean space, ignoring the structural geometry, and suffer from the high latency issue due to Back-Propagation-Through-Time (BPTT) with the surrogate gradient. In light of the aforementioned issues, we are devoted to exploring spiking GNN on Riemannian manifolds, and present a Manifold-valued Spiking GNN (MSG). In particular, we design a new spiking neuron on geodesically complete manifolds with the diffeomorphism, so that BPTT regarding the spikes is replaced by the proposed differentiation via manifold. Theoretically, we show that MSG approximates a solver of the manifold ordinary differential equation. Extensive experiments on common graphs show that the proposed MSG achieves superior performance to previous spiking GNNs and superior energy efficiency to conventional GNNs.

IJCAI Conference 2023 Conference Paper

CONGREGATE: Contrastive Graph Clustering in Curvature Spaces

  • Li Sun
  • Feiyang Wang
  • Junda Ye
  • Hao Peng
  • Philip S. Yu

Graph clustering is a longstanding research topic, and has achieved remarkable success with the deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts the deep graph clustering but usually struggles in either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from a geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to the graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space where deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters by an augmentation-free reweighted contrastive approach where we pay more attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms the state-of-the-art competitors.

AAAI Conference 2023 Conference Paper

Effective and Stable Role-Based Multi-Agent Collaboration by Structural Information Principles

  • Xianghua Zeng
  • Hao Peng
  • Angsheng Li

Role-based learning is a promising approach to improving the performance of Multi-Agent Reinforcement Learning (MARL). Nevertheless, without manual assistance, current role-based methods cannot guarantee stably discovering a set of roles to effectively decompose a complex task, as they assume either a predefined role structure or practical experience for selecting hyperparameters. In this article, we propose a mathematical Structural Information principles-based Role Discovery method, namely SIRD, and then present a SIRD-based MARL framework, namely SR-MARL, for multi-agent collaboration. The SIRD transforms role discovery into a hierarchical action space clustering. Specifically, the SIRD consists of structuralization, sparsification, and optimization modules, where an optimal encoding tree is generated to perform abstracting to discover roles. The SIRD is agnostic to specific MARL algorithms and can be flexibly integrated with various value function factorization approaches. Empirical evaluations on the StarCraft II micromanagement benchmark demonstrate that, compared with state-of-the-art MARL algorithms, the SR-MARL framework improves the average test win rate by 0.17%, 6.08%, and 3.24%, and reduces the deviation by 16.67%, 30.80%, and 66.30%, under easy, hard, and super hard scenarios.

NeurIPS Conference 2023 Conference Paper

Environment-Aware Dynamic Graph Learning for Out-of-Distribution Generalization

  • Haonan Yuan
  • Qingyun Sun
  • Xingcheng Fu
  • Ziwei Zhang
  • Cheng Ji
  • Hao Peng
  • Jianxin Li

Dynamic graph neural networks (DGNNs) are increasingly pervasive in exploiting spatio-temporal patterns on dynamic graphs. However, existing works fail to generalize under distribution shifts, which are common in real-world scenarios. As the generation of dynamic graphs is heavily influenced by latent environments, investigating their impacts on the out-of-distribution (OOD) generalization is critical. However, it remains unexplored with the following two major challenges: (1) How to properly model and infer the complex environments on dynamic graphs with distribution shifts? (2) How to discover invariant patterns given inferred spatio-temporal environments? To solve these challenges, we propose a novel Environment-Aware dynamic Graph LEarning (EAGLE) framework for OOD generalization by modeling complex coupled environments and exploiting spatio-temporal invariant patterns. Specifically, we first design the environment-aware EA-DGNN to model environments by multi-channel environments disentangling. Then, we propose an environment instantiation mechanism for environment diversification with inferred distributions. Finally, we discriminate spatio-temporal invariant patterns for out-of-distribution prediction by the invariant pattern recognition mechanism and perform fine-grained causal interventions node-wise with a mixture of instantiated environment samples. Experiments on real-world and synthetic dynamic graph datasets demonstrate the superiority of our method against state-of-the-art baselines under distribution shifts. To the best of our knowledge, we are the first to study OOD generalization on dynamic graphs from the environment learning perspective.

NeurIPS Conference 2023 Conference Paper

FedFed: Feature Distillation against Data Heterogeneity in Federated Learning

  • Zhiqin Yang
  • Yonggang Zhang
  • Yu Zheng
  • Xinmei Tian
  • Hao Peng
  • Tongliang Liu
  • Bo Han

Federated learning (FL) typically faces data heterogeneity, i.e., distribution shifting among clients. Sharing clients' information has shown great potential in mitigating data heterogeneity, yet incurs a dilemma between preserving privacy and promoting model performance. To alleviate the dilemma, we raise a fundamental question: Is it possible to share partial features in the data to tackle data heterogeneity? In this work, we give an affirmative answer to this question by proposing a novel approach called Federated Feature distillation (FedFed). Specifically, FedFed partitions data into performance-sensitive features (i.e., greatly contributing to model performance) and performance-robust features (i.e., contributing only marginally to model performance). The performance-sensitive features are globally shared to mitigate data heterogeneity, while the performance-robust features are kept locally. FedFed enables clients to train models over local and shared data. Comprehensive experiments demonstrate the efficacy of FedFed in promoting model performance.

IJCAI Conference 2023 Conference Paper

Hierarchical State Abstraction based on Structural Information Principles

  • Xianghua Zeng
  • Hao Peng
  • Angsheng Li
  • Chunyang Liu
  • Lifang He
  • Philip S. Yu

State abstraction optimizes decision-making by ignoring irrelevant environmental information in reinforcement learning with rich observations. Nevertheless, recent approaches focus on adequate representational capacity at the cost of essential information loss, affecting their performance on challenging tasks. In this article, we propose a novel mathematical Structural Information principles-based State Abstraction framework, namely SISA, from the information-theoretic perspective. Specifically, an unsupervised, adaptive hierarchical state clustering method without requiring manual assistance is presented, and meanwhile, an optimal encoding tree is generated. On each non-root tree node, a new aggregation function and conditional structural entropy are designed to achieve hierarchical state abstraction and compensate for sampling-induced essential information loss in state abstraction. Empirical evaluations on a visual gridworld domain and six continuous control benchmarks demonstrate that, compared with five SOTA state abstraction approaches, SISA significantly improves mean episode reward and sample efficiency by up to 18.98 and 44.44%, respectively. Besides, we experimentally show that SISA is a general framework that can be flexibly integrated with different representation-learning objectives to improve their performance further.

AAAI Conference 2023 Conference Paper

Self-Organization Preserved Graph Structure Learning with Principle of Relevant Information

  • Qingyun Sun
  • Jianxin Li
  • Beining Yang
  • Xingcheng Fu
  • Hao Peng
  • Philip S. Yu

Most Graph Neural Networks follow the message-passing paradigm, assuming the observed structure depicts the ground-truth node relationships. However, this fundamental assumption cannot always be satisfied, as real-world graphs are often incomplete, noisy, or redundant. How to reveal the inherent graph structure in a unified way remains under-explored. We propose PRI-GSL, a Graph Structure Learning framework guided by the Principle of Relevant Information, providing a simple and unified framework for identifying the self-organization and revealing the hidden structure. PRI-GSL learns a structure that contains the most relevant yet least redundant information quantified by von Neumann entropy and Quantum Jensen-Shannon divergence. PRI-GSL incorporates the evolution of quantum continuous walks with graph wavelets to encode node structural roles, showing in which way the nodes interplay and self-organize with the graph structure. Extensive experiments demonstrate the superior effectiveness and robustness of PRI-GSL.
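
Of the two information measures, the von Neumann entropy is straightforward to compute; the sketch below treats the scaled graph Laplacian as a density matrix (an illustrative baseline computation, not PRI-GSL's trainable pipeline):

    import numpy as np
    import networkx as nx

    def von_neumann_entropy(G):
        """Von Neumann entropy of a graph: treat rho = L / trace(L)
        as a density matrix and return -trace(rho log2 rho). The
        Quantum Jensen-Shannon divergence between two graphs is then
        S((rho + sigma) / 2) - (S(rho) + S(sigma)) / 2."""
        L = nx.laplacian_matrix(G).toarray().astype(float)
        rho = L / np.trace(L)
        eigvals = np.linalg.eigvalsh(rho)
        eigvals = eigvals[eigvals > 1e-12]  # convention: 0 * log 0 = 0
        return float(-np.sum(eigvals * np.log2(eigvals)))

    print(von_neumann_entropy(nx.karate_club_graph()))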

AAAI Conference 2023 Conference Paper

Self-Supervised Continual Graph Learning in Adaptive Riemannian Spaces

  • Li Sun
  • Junda Ye
  • Hao Peng
  • Feiyang Wang
  • Philip S. Yu

Continual graph learning routinely finds its role in a variety of real-world applications where graph data with different tasks arrive sequentially. Despite the success of prior work, great challenges remain. On the one hand, existing methods work with the zero-curvature Euclidean space, and largely ignore the fact that curvature varies over the coming graph sequence. On the other hand, continual learners in the literature rely on abundant labels, but labeling graphs in practice is particularly hard, especially for the continuously emerging graphs on-the-fly. To address the aforementioned challenges, we propose to explore a challenging yet practical problem, the self-supervised continual graph learning in adaptive Riemannian spaces. In this paper, we propose a novel self-supervised Riemannian Graph Continual Learner (RieGrace). In RieGrace, we first design an Adaptive Riemannian GCN (AdaRGCN), a unified GCN coupled with a neural curvature adapter, so that Riemannian space is shaped by the learnt curvature adaptive to each graph. Then, we present a Label-free Lorentz Distillation approach, in which we create teacher-student AdaRGCN for the graph sequence. The student successively performs intra-distillation from itself and inter-distillation from the teacher so as to consolidate knowledge without catastrophic forgetting. In particular, we propose a theoretically grounded Generalized Lorentz Projection for the contrastive distillation in Riemannian space. Extensive experiments on the benchmark datasets show the superiority of RieGrace, and additionally, we investigate how curvature changes over the graph sequence.

ECAI Conference 2023 Conference Paper

Stock Movement Prediction via Attention-Aware Multi-Order Relation Graph Neural Network

  • Hao Peng
  • Jie Yang

Stock Movement Prediction (SMP) is a challenging task that aims at predicting the future stock price trend of companies in the stock market. Recent advances mainly apply the Graph Convolutional Network (GCN) to learn connections among companies for SMP. However, these methods usually ignore the semantics of the specific relations (e.g., investment and share) between two entities (i.e., companies and persons) on the market knowledge graph. Meanwhile, considering the long-chain cross-shareholding structures among entities, it is difficult for GCN to obtain high-order neighbor information over long distances. To address these two problems, we present an Attention-aware Multi-order Relation GCN for SMP (AMRGCN-SMP). Specifically, an attention-aware multi-channel aggregation manner achieves the weighted fusion of nodes across multiple semantic channels. Moreover, the dynamic update of the adjacent tensor can fuse the multi-order relation representations and bring more abundant long-chain connections. The experiments on the CSI100E and CSI300E datasets demonstrate that the proposed method achieves state-of-the-art performance compared with the recent advances.

AAAI Conference 2022 Conference Paper

A Self-Supervised Mixed-Curvature Graph Neural Network

  • Li Sun
  • Zhongbao Zhang
  • Junda Ye
  • Hao Peng
  • Jiawei Zhang
  • Sen Su
  • Philip S Yu

Graph representation learning has received increasing attention in recent years. Most of the existing methods ignore the complexity of graph structures and restrict graphs to a single constant-curvature representation space, which is only suitable for particular kinds of graph structure. Additionally, these methods follow the supervised or semi-supervised learning paradigm, which notably limits their deployment on unlabeled graphs in real applications. To address these limitations, we make the first attempt to study self-supervised graph representation learning in mixed-curvature spaces. In this paper, we present a novel Self-Supervised Mixed-Curvature Graph Neural Network (SELFMGNN). To capture the complex graph structures, we construct a mixed-curvature space via the Cartesian product of multiple Riemannian component spaces, and design hierarchical attention mechanisms for learning and fusing graph representations across these component spaces. To enable the self-supervised learning, we propose a novel dual contrastive approach. The constructed mixed-curvature space actually provides multiple Riemannian views for contrastive learning. We introduce a Riemannian projector to reveal these views, and utilize a well-designed Riemannian discriminator for the single-view and cross-view contrastive learning within and across the Riemannian views. Finally, extensive experiments show that SELFMGNN captures the complex graph structures and outperforms state-of-the-art baselines.

TIST Journal 2022 Journal Article

A Survey on Text Classification: From Traditional to Deep Learning

  • Qian Li
  • Hao Peng
  • Jianxin Li
  • Congying Xia
  • Renyu Yang
  • Lichao Sun
  • Philip S. Yu
  • Lifang He

Text classification is the most fundamental and essential task in natural language processing. The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing the state-of-the-art approaches from 1961 to 2021, covering approaches from traditional models to deep learning. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification. We then discuss each of these categories in detail, dealing with both the technical developments and benchmark datasets that support tests of predictions. A comprehensive comparison between different techniques, as well as the pros and cons of various evaluation metrics, is also provided in this survey. Finally, we conclude by summarizing key implications, future research directions, and the challenges facing the research area.

NeurIPS Conference 2022 Conference Paper

BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

  • Kay Liu
  • Yingtong Dou
  • Yue Zhao
  • Xueying Ding
  • Xiyang Hu
  • Ruitong Zhang
  • Kaize Ding
  • Canyu Chen

Detecting which nodes in graphs are outliers is a relatively new machine learning task with numerous applications. Despite the proliferation of algorithms developed in recent years for this task, there has been no standard comprehensive setting for performance evaluation. Consequently, it has been difficult to understand which methods work well and when under a broad range of settings. To bridge this gap, we present—to the best of our knowledge—the first comprehensive benchmark for unsupervised outlier node detection on static attributed graphs called BOND, with the following highlights. (1) We benchmark the outlier detection performance of 14 methods ranging from classical matrix factorization to the latest graph neural networks. (2) Using nine real datasets, our benchmark assesses how the different detection methods respond to two major types of synthetic outliers and separately to “organic” (real non-synthetic) outliers. (3) Using an existing random graph generation technique, we produce a family of synthetically generated datasets of different graph sizes that enable us to compare the running time and memory usage of the different outlier detection algorithms. Based on our experimental results, we discuss the pros and cons of existing graph outlier detection algorithms, and we highlight opportunities for future research. Importantly, our code is freely available and meant to be easily extendable: https://github.com/pygod-team/pygod/tree/main/benchmark
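
The two synthetic outlier types can be injected with the companion PyGOD toolkit; a sketch under the PyGOD 1.x generator API (function signatures may vary across releases) follows:

    # Inject contextual and structural outliers into a toy graph.
    import torch
    from torch_geometric.datasets import KarateClub
    from pygod.generator import gen_contextual_outlier, gen_structural_outlier

    data = KarateClub()[0]
    # Contextual outliers: n nodes get features swapped with distant nodes
    # (k controls the candidate pool).
    data, y_ctx = gen_contextual_outlier(data, n=4, k=10, seed=0)
    # Structural outliers: m dense cliques of n nodes each are planted.
    data, y_str = gen_structural_outlier(data, m=2, n=4, seed=0)
    y = torch.logical_or(y_ctx, y_str).long()  # combined outlier labels
    print(y.sum().item(), "outliers injected")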

NeurIPS Conference 2022 Conference Paper

Dual-discriminative Graph Neural Network for Imbalanced Graph-level Anomaly Detection

  • Ge Zhang
  • Zhenyu Yang
  • Jia Wu
  • Jian Yang
  • Shan Xue
  • Hao Peng
  • Jianlin Su
  • Chuan Zhou

Graph-level anomaly detection aims to distinguish anomalous graphs in a graph dataset from normal graphs. Anomalous graphs represent very few but essential patterns in the real world. The anomalous property of a graph may be attributable to anomalous attributes of particular nodes or anomalous substructures that refer to a subset of nodes and edges in the graph. In addition, due to the imbalanced nature of the anomaly detection problem, anomalous information will be diluted by the overwhelming quantity of normal graphs. Various anomaly notions in the attributes and/or substructures and the imbalanced nature together make detecting anomalous graphs a non-trivial task. In this paper, we propose a graph neural network for graph-level anomaly detection, namely iGAD. Specifically, an anomalous graph attribute-aware graph convolution and an anomalous graph substructure-aware deep Random Walk Kernel (deep RWK) are welded into a graph neural network to achieve the dual-discriminative ability on anomalous attributes and substructures. Deep RWK in iGAD makes up for the deficiency of graph convolution in distinguishing structural information caused by the simple neighborhood aggregation mechanism. Further, we propose a Point Mutual Information (PMI)-based loss function to target the problems caused by imbalanced distributions. The PMI-based loss function enables iGAD to capture essential correlation between input graphs and their anomalous/normal properties. We evaluate iGAD on four real-world graph datasets. Extensive experiments demonstrate the superiority of iGAD on the graph-level anomaly detection task.

TIST Journal 2022 Journal Article

Federated Multi-view Learning for Private Medical Data Integration and Analysis

  • Sicong Che
  • Zhaoming Kong
  • Hao Peng
  • Lichao Sun
  • Alex Leow
  • Yong Chen
  • Lifang He

Along with the rapid expansion of information technology and digitalization of health data, there is increasing concern about maintaining data privacy while garnering the benefits in the medical field. Two critical challenges are identified: First, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage. Second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that requires reconciliation. In this article, we present a generic Federated Multi-view Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-view Learning (V-FedMV) and Horizontal Federated Multi-view Learning (H-FedMV). We experimented with real-world keyboard data collected from the BiAffect study. Our results demonstrated that the proposed approach can make full use of multi-view data in a privacy-preserving way, and both V-FedMV and H-FedMV perform better than their single-view and pairwise counterparts. Besides, the framework can be easily adapted to deal with multi-view sequential data. We have developed a sequential model (S-FedMV) that takes sequences of multi-view data as input and demonstrated it experimentally. To the best of our knowledge, this framework is the first to consider both vertical and horizontal diversification in the multi-view setting, as well as their sequential federated learning.

TIST Journal 2022 Journal Article

Federated Social Recommendation with Graph Neural Network

  • Zhiwei Liu
  • Liangwei Yang
  • Ziwei Fan
  • Hao Peng
  • Philip S. Yu

Recommender systems are flourishing nowadays, designed to predict users’ potential interests in items by learning embeddings. Recent developments of the Graph Neural Networks (GNNs) also provide recommender systems (RSs) with powerful backbones to learn embeddings from a user-item graph. However, only leveraging the user-item interactions suffers from the cold-start issue due to the difficulty in data collection. Hence, current endeavors propose fusing social information with user-item interactions to alleviate it, which is the social recommendation problem. Existing work employs GNNs to aggregate both social links and user-item interactions simultaneously. However, they all require centralized storage of the social links and item interactions of users, which leads to privacy concerns. Additionally, according to strict privacy protection under the General Data Protection Regulation, centralized data storage may not be feasible in the future, urging a decentralized framework for social recommendation. As a result, we design a federated learning recommender system for the social recommendation task, which is rather challenging because of its heterogeneity, personalization, and privacy protection requirements. To this end, we devise a novel framework, Federated Social recommendation with Graph neural network (FeSoG). Firstly, FeSoG adopts relational attention and aggregation to handle heterogeneity. Secondly, FeSoG infers user embeddings using local data to retain personalization. Last but not least, the proposed model employs pseudo-labeling techniques with item sampling to protect privacy and enhance training. Extensive experiments on three real-world datasets justify the effectiveness of FeSoG in completing social recommendation and privacy protection. To the best of our knowledge, this is the first work to propose a federated learning framework for social recommendation.

AAAI Conference 2022 Conference Paper

Graph Structure Learning with Variational Information Bottleneck

  • Qingyun Sun
  • Jianxin Li
  • Hao Peng
  • Jia Wu
  • Xingcheng Fu
  • Cheng Ji
  • Philip S Yu

Graph Neural Networks (GNNs) have shown promising results on a broad spectrum of applications. Most empirical studies of GNNs directly take the observed graph as input, assuming the observed structure perfectly depicts the accurate and complete relations between nodes. However, graphs in the real world are inevitably noisy or incomplete, which could even degrade the quality of graph representations. In this work, we propose a novel Variational Information Bottleneck guided Graph Structure Learning framework, namely VIB-GSL, from the perspective of information theory. VIB-GSL is the first attempt to advance the Information Bottleneck (IB) principle for graph structure learning, providing a more elegant and universal framework for mining underlying task-relevant relations. VIB-GSL learns an informative and compressive graph structure to distill the actionable information for specific downstream tasks. VIB-GSL deduces a variational approximation for irregular graph data to form a tractable IB objective function, which facilitates training stability. Extensive experimental results demonstrate the superior effectiveness and robustness of VIB-GSL.
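
The general shape of a variational IB objective, of the kind VIB-GSL instantiates for graph structure, can be sketched as a task term plus a beta-weighted KL regularizer; the reparameterized sampler and loss below are generic, not the paper's exact formulation:

    import torch
    import torch.nn.functional as F

    def vib_loss(mu, logvar, logits, labels, beta=1e-3):
        """Generic variational IB objective: predict Y from the
        bottleneck Z (task term) while compressing away irrelevant
        information via beta * KL(q(Z|X) || N(0, I))."""
        task = F.cross_entropy(logits, labels)
        kl = -0.5 * torch.mean(
            torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))
        return task + beta * kl

    def reparameterize(mu, logvar):
        """Sample Z = mu + sigma * eps so gradients flow through."""
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    mu, logvar = torch.randn(8, 16), torch.zeros(8, 16)
    z = reparameterize(mu, logvar)
    logits, labels = torch.randn(8, 3), torch.randint(0, 3, (8,))
    print(vib_loss(mu, logvar, logits, labels))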

AAAI Conference 2021 Conference Paper

Adversarial Directed Graph Embedding

  • Shijie Zhu
  • Jianxin Li
  • Hao Peng
  • Senzhang Wang
  • Lifang He

Node representation learning for directed graphs is critically important to facilitate many graph mining tasks. To capture the directed edges between nodes, existing methods mostly learn two embedding vectors for each node, a source vector and a target vector. However, these methods learn the source and target vectors separately. For nodes with very low indegree or outdegree, the corresponding target vector or source vector cannot be effectively learned. In this paper, we propose a novel Directed Graph embedding framework based on Generative Adversarial Network, called DGGAN. The main idea is to use adversarial mechanisms to deploy a discriminator and two generators that jointly learn each node’s source and target vectors. For a given node, the two generators are trained to generate its fake target and source neighbor nodes from the same underlying distribution, and the discriminator aims to distinguish whether a neighbor node is real or fake. The two generators are formulated into a unified framework and could mutually reinforce each other to learn more robust source and target vectors. Extensive experiments show that DGGAN consistently and significantly outperforms existing state-of-the-art methods across multiple graph mining tasks on directed graphs.

IJCAI Conference 2021 Conference Paper

Graph Entropy Guided Node Embedding Dimension Selection for Graph Neural Networks

  • Gongxu Luo
  • Jianxin Li
  • Hao Peng
  • Carl Yang
  • Lichao Sun
  • Philip S. Yu
  • Lifang He

Graph representation learning has achieved great success in many areas, including e-commerce, chemistry, biology, etc. However, the fundamental problem of choosing the appropriate dimension of node embedding for a given graph still remains unsolved. The commonly used strategies for Node Embedding Dimension Selection (NEDS) based on grid search or empirical knowledge suffer from heavy computation and poor model performance. In this paper, we revisit NEDS from the perspective of the minimum entropy principle. Subsequently, we propose a novel Minimum Graph Entropy (MinGE) algorithm for NEDS with graph data. To be specific, MinGE considers both feature entropy and structure entropy on graphs, which are carefully designed according to the characteristics of the rich information in them. The feature entropy, which assumes the embeddings of adjacent nodes to be more similar, connects node features and link topology on graphs. The structure entropy takes the normalized degree as the basic unit to further measure the higher-order structure of graphs. Based on them, we design MinGE to directly calculate the ideal node embedding dimension for any graph. Finally, comprehensive experiments with popular Graph Neural Networks (GNNs) on benchmark datasets demonstrate the effectiveness and generalizability of our proposed MinGE.

AAAI Conference 2021 Conference Paper

Hyperbolic Variational Graph Neural Network for Modeling Dynamic Graphs

  • Li Sun
  • Zhongbao Zhang
  • Jiawei Zhang
  • Feiyang Wang
  • Hao Peng
  • Sen Su
  • Philip S. Yu

Learning representations for graphs plays a critical role in a wide spectrum of downstream applications. In this paper, we summarize the limitations of prior works along three dimensions: representation space, modeling dynamics, and modeling uncertainty. To bridge this gap, we propose to learn dynamic graph representations in hyperbolic space, for the first time, which aims to infer stochastic node representations. Working with hyperbolic space, we present a novel Hyperbolic Variational Graph Neural Network, referred to as HVGNN. In particular, to model the dynamics, we introduce a Temporal GNN (TGNN) based on a theoretically grounded time encoding approach. To model the uncertainty, we devise a hyperbolic graph variational autoencoder built upon the proposed TGNN to generate stochastic node representations of hyperbolic normal distributions. Furthermore, we introduce a reparameterisable sampling algorithm for the hyperbolic normal distribution to enable the gradient-based learning of HVGNN. Extensive experiments show that HVGNN outperforms state-of-the-art baselines on real-world datasets.

AAAI Conference 2021 Conference Paper

KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning

  • Ye Liu
  • Yao Wan
  • Lifang He
  • Hao Peng
  • Philip S. Yu

Generative commonsense reasoning, which aims to empower machines to generate sentences with the capacity of reasoning over a set of concepts, is a critical bottleneck for text generation. Even the state-of-the-art pre-trained language generation models struggle at this task and often produce implausible and anomalous sentences. One reason is that they rarely consider incorporating the knowledge graph, which can provide rich relational information among the commonsense concepts. To promote the ability of commonsense reasoning for text generation, we propose a novel knowledge graph-augmented pre-trained language generation model, KG-BART, which encompasses the complex relations of concepts through the knowledge graph and produces more logical and natural sentences as output. Moreover, KG-BART can leverage the graph attention to aggregate the rich concept semantics that enhances the model generalization on unseen concept sets. Experiments on the benchmark CommonGen dataset verify the effectiveness of our proposed approach by comparing with several strong pre-trained language generation models; in particular, KG-BART outperforms BART by 5.80 and 4.60 in terms of BLEU-3 and BLEU-4. Moreover, we also show that the context generated by our model can work as background scenarios to benefit downstream commonsense QA tasks.

TIST Journal 2021 Journal Article

POLLA: Enhancing the Local Structure Awareness in Long Sequence Spatial-temporal Modeling

  • Haoyi Zhou
  • Hao Peng
  • Jieqi Peng
  • Shuai Zhang
  • Jianxin Li

The spatial-temporal modeling on long sequences is of great importance in many real-world applications. Recent studies have shown the potential of applying the self-attention mechanism to improve capturing the complex spatial-temporal dependencies. However, the lack of underlying structure information weakens its general performance on long-sequence spatial-temporal problems. To overcome this limitation, we propose a novel method, named the Proximity-aware Long Sequence Learning (POLLA) framework, and apply it to the spatial-temporal forecasting task. The model substitutes the canonical self-attention by leveraging proximity-aware attention, which enhances local structure clues in building long-range dependencies with a linear approximation of attention scores. The relief adjacency matrix technique can utilize the historical global graph information for consistent proximity learning. Meanwhile, the reduced decoder allows for fast inference in a non-autoregressive manner. Extensive experiments are conducted on five large-scale datasets, which demonstrate that our method achieves state-of-the-art performance and validates the effectiveness brought by local structure information.
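
The linear approximation underlying such attention replaces softmax(QK^T)V with phi(Q)(phi(K)^T V) for a positive feature map phi, dropping the cost from quadratic to linear in sequence length; the sketch below shows the generic trick (elu+1 feature map assumed), not POLLA's proximity-aware variant:

    import torch

    def linear_attention(q, k, v, eps=1e-6):
        """Kernelized attention in O(N): attention scores are never
        materialized; instead a (d x d) summary phi(K)^T V is built
        once and reused for every query position."""
        phi = lambda t: torch.nn.functional.elu(t) + 1.0
        q, k = phi(q), phi(k)
        kv = k.transpose(-2, -1) @ v  # (d x d) summary, O(N d^2)
        normal = q @ k.sum(dim=-2, keepdim=True).transpose(-2, -1) + eps
        return (q @ kv) / normal

    q = torch.randn(1, 1024, 64)  # (batch, length, dim)
    print(linear_attention(q, torch.randn(1, 1024, 64),
                           torch.randn(1, 1024, 64)).shape)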

JAIR Journal 2021 Journal Article

RWNE: A Scalable Random-Walk based Network Embedding Framework with Personalized Higher-order Proximity Preserved

  • Jianxin Li
  • Cheng Ji
  • Hao Peng
  • Yu He
  • Yangqiu Song
  • Xinmiao Zhang
  • Fanzhang Peng

Higher-order proximity preserved network embedding has attracted increasing attention. In particular, due to the superior scalability, random-walk-based network embedding has also been well developed, which could efficiently explore higher-order neighborhoods via multi-hop random walks. However, despite the success of current random-walk-based methods, most of them are usually not expressive enough to preserve the personalized higher-order proximity and lack a straightforward objective to theoretically articulate what and how network proximity is preserved. In this paper, to address the above issues, we present a general scalable random-walk-based network embedding framework, in which random walk is explicitly incorporated into a sound objective designed theoretically to preserve arbitrary higher-order proximity. Further, we introduce the random walk with restart process into the framework to naturally and effectively achieve personalized-weighted preservation of proximities of different orders. We conduct extensive experiments on several real-world networks and demonstrate that our proposed method consistently and substantially outperforms the state-of-the-art network embedding methods.
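
The random walk with restart at the framework's core has a simple fixed point; the sketch below computes the personalized proximity vector by power iteration (the toy adjacency matrix is illustrative):

    import numpy as np

    def rwr(adj, seed, restart=0.15, iters=100):
        """Random walk with restart: the stationary visiting
        distribution of a walker that, at each step, teleports back
        to the seed node with probability `restart`, yielding a
        personalized weighting of proximities of different orders."""
        n = adj.shape[0]
        P = adj / adj.sum(axis=1, keepdims=True)  # row-stochastic
        e = np.zeros(n)
        e[seed] = 1.0
        r = e.copy()
        for _ in range(iters):  # power iteration to the fixed point
            r = restart * e + (1 - restart) * P.T @ r
        return r

    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 0],
                  [1, 1, 0, 1],
                  [0, 0, 1, 0]], dtype=float)
    print(rwr(A, seed=0).round(3))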

IJCAI Conference 2021 Conference Paper

TextGTL: Graph-based Transductive Learning for Semi-supervised Text Classification via Structure-Sensitive Interpolation

  • Chen Li
  • Xutan Peng
  • Hao Peng
  • Jianxin Li
  • Lihong Wang

Compared with traditional sequential learning models, graph-based neural networks exhibit excellent properties when encoding text, such as the capacity of capturing global and local information simultaneously. Especially in the semi-supervised scenario, propagating information along the edges can effectively alleviate the sparsity of labeled data. In this paper, beyond the existing architecture of heterogeneous word-document graphs, for the first time, we investigate how to construct lightweight non-heterogeneous graphs based on different linguistic information to better serve free text representation learning. Then, a novel semi-supervised framework for text classification that refines graph topology under theoretical guidance and shares information across different text graphs, namely Text-oriented Graph-based Transductive Learning (TextGTL), is proposed. TextGTL also performs attribute space interpolation based on dense substructure in graphs to predict low-entropy labels with high-quality feature nodes for data augmentation. To verify the effectiveness of TextGTL, we conduct extensive experiments on various benchmark datasets, observing significant performance gains over conventional heterogeneous graphs. In addition, we also design ablation studies to dive deep into the validity of components in TextGTL.

NeurIPS Conference 2020 Conference Paper

Kalman Filtering Attention for User Behavior Modeling in CTR Prediction

  • Hu Liu
  • Jing Lu
  • Xiwei Zhao
  • Sulong Xu
  • Hao Peng
  • Yutong Liu
  • Zehua Zhang
  • Jian Li

Click-through rate (CTR) prediction is one of the fundamental tasks for e-commerce search engines. As search becomes more personalized, it is necessary to capture the user interest from rich behavior data. Existing user behavior modeling algorithms develop different attention mechanisms to emphasize query-relevant behaviors and suppress irrelevant ones. Despite being extensively studied, these attentions still suffer from two limitations. First, conventional attentions mostly limit the attention field only to a single user's behaviors, which is not suitable in e-commerce where users often hunt for new demands that are irrelevant to any historical behaviors. Second, these attentions are usually biased towards frequent behaviors, which is unreasonable since high frequency does not necessarily indicate great importance. To tackle the two limitations, we propose a novel attention mechanism, termed Kalman Filtering Attention (KFAtt), that treats the weighted pooling in attention as a maximum a posteriori (MAP) estimation. By incorporating a prior, KFAtt resorts to global statistics when few user behaviors are relevant. Moreover, a frequency capping mechanism is incorporated to correct the bias towards frequent behaviors. Offline experiments on both benchmark datasets and a 10-billion-scale real production dataset, together with an online A/B test, show that KFAtt outperforms all compared state-of-the-art methods. KFAtt has been deployed in the ranking system of JD.com, one of the largest B2C e-commerce websites in China, serving the main traffic of hundreds of millions of active users.
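
Reading weighted attention pooling as MAP estimation amounts to shrinking the behavior-weighted average towards a global prior; a schematic version (with prior_strength standing in for the prior precision, not the paper's exact estimator) is:

    import torch

    def kf_style_pooling(scores, values, prior_mean, prior_strength=1.0):
        """Attention pooling read as MAP estimation: behaviors act as
        noisy observations of the user intent, and the global prior
        dominates when few behaviors carry relevance mass."""
        w = torch.exp(scores)  # unnormalized relevance weights
        num = (w.unsqueeze(-1) * values).sum(dim=0) \
              + prior_strength * prior_mean
        den = w.sum() + prior_strength
        return num / den

    values = torch.randn(5, 16)      # 5 historical behaviors
    scores = torch.randn(5)          # query-behavior relevance
    prior = values.mean(dim=0)       # stand-in for a global statistic
    print(kf_style_pooling(scores, values, prior).shape)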

AAAI Conference 2020 Conference Paper

Motif-Matching Based Subgraph-Level Attentional Convolutional Network for Graph Classification

  • Hao Peng
  • Jianxin Li
  • Qiran Gong
  • Yuanxin Ning
  • Senzhang Wang
  • Lifang He

Graph classification is critically important to many real-world applications that are associated with graph data such as chemical drug analysis and social network mining. Traditional methods usually require feature engineering to extract the graph features that can help discriminate the graphs of different classes. Although deep learning-based graph embedding approaches have recently been proposed to automatically learn graph features, they mostly use a few vertex arrangements extracted from the graph for feature learning, which may lose some structural information. In this work, we present a novel motif-based attentional graph convolution neural network for graph classification, which can learn more discriminative and richer graph features. Specifically, a motif-matching guided subgraph normalization method is developed to better preserve the spatial information. A novel subgraph-level self-attention network is also proposed to capture the different impacts or weights of different subgraphs. Experimental results on both bioinformatics and social network datasets show that the proposed models significantly improve graph classification performance over both traditional graph kernel methods and recent deep learning approaches.

IJCAI Conference 2019 Conference Paper

Aspect-Based Sentiment Classification with Attentive Neural Turing Machines

  • Qianren Mao
  • Jianxin Li
  • Senzhang Wang
  • Yuanning Zhang
  • Hao Peng
  • Min He
  • Lihong Wang

Aspect-based sentiment classification aims to identify the sentiment polarity expressed towards a given opinion target in a sentence. The sentiment polarity of the target is not only highly determined by the sentiment semantic context but also correlated with the concerned opinion target. Existing works cannot effectively capture and store the inter-dependence between the opinion target and its context. To solve this issue, we propose a novel model, Attentive Neural Turing Machines (ANTM). Via interactive read-write operations between an external memory storage and a recurrent controller, ANTM can learn the dependable correlation of the opinion target to its context and concentrate on crucial sentiment information. Specifically, ANTM separates the information of storage and computation, which extends the capabilities of the controller to learn and store sequential features. The read and write operations enable ANTM to adaptively keep track of the interactive attention history between memory content and controller state. Moreover, we append target entity embeddings into both input and output of the controller in order to augment the integration of target information. We evaluate our model on the SemEval 2014 dataset, which contains reviews from the Laptop and Restaurant domains, and on a Twitter review dataset. Experimental results verify that our model achieves state-of-the-art performance on aspect-based sentiment classification.

IJCAI Conference 2019 Conference Paper

Fine-grained Event Categorization with Heterogeneous Graph Convolutional Networks

  • Hao Peng
  • Jianxin Li
  • Qiran Gong
  • Yangqiu Song
  • Yuanxin Ning
  • Kunfeng Lai
  • Philip S. Yu

Events happening in the real world and in real time can be planned and organized occasions involving multiple people and objects. Social media platforms publish a lot of text messages containing public events with comprehensive topics. However, mining social events is challenging due to the heterogeneous event elements in texts and the explicit and implicit social network structures. In this paper, we design an event meta-schema to characterize the semantic relatedness of social events, build an event-based heterogeneous information network (HIN) integrating information from an external knowledge base, and propose a novel Pairwise Popularity Graph Convolutional Network (PP-GCN) based fine-grained social event categorization model. We propose a Knowledgeable meta-paths Instances based social Event Similarity (KIES) between events and build a weighted adjacency matrix as input to the PP-GCN model. Comprehensive experiments on real data collections are conducted to compare various social event detection and clustering tasks. Experimental results demonstrate that our proposed framework outperforms other alternative social event categorization techniques.

IJCAI Conference 2018 Conference Paper

Time-evolving Text Classification with Deep Neural Networks

  • Yu He
  • Jianxin Li
  • Yangqiu Song
  • Mutian He
  • Hao Peng

Traditional text classification algorithms are based on the assumption that data are independent and identically distributed. However, in most non-stationary scenarios, data may change smoothly due to long-term evolution and short-term fluctuation, which raises new challenges to traditional methods. In this paper, we present the first attempt to explore evolutionary neural network models for time-evolving text classification. We first introduce a simple way to extend arbitrary neural networks to evolutionary learning by using a temporal smoothness framework, and then propose a diachronic propagation framework to incorporate the historical impact into currently learned features through diachronic connections. Experiments on real-world news data demonstrate that our approaches greatly and consistently outperform traditional neural network models in both accuracy and stability.

ICML Conference 2017 Conference Paper

Asynchronous Distributed Variational Gaussian Process for Regression

  • Hao Peng
  • Shandian Zhe
  • Xiao Zhang 0017
  • Yuan (Alan) Qi

Gaussian processes (GPs) are powerful non-parametric function estimators. However, their applications are largely limited by the expensive computational cost of the inference procedures. Existing stochastic or distributed synchronous variational inference methods, although they have alleviated this issue by scaling up GPs to millions of samples, are still far from satisfactory for real-world large applications, where data sizes are often orders of magnitude larger, say, billions. To solve this problem, we propose ADVGP, the first Asynchronous Distributed Variational Gaussian Process inference for regression, on the recent large-scale machine learning platform Parameter Server. ADVGP uses a novel, flexible variational framework based on a weight space augmentation, and implements highly efficient, asynchronous proximal gradient optimization. While maintaining comparable or better predictive performance, ADVGP greatly improves upon the efficiency of the existing variational methods. With ADVGP, we effortlessly scale up GP regression to a real-world application with billions of samples and demonstrate superior prediction accuracy over the popular linear models.

AAAI Conference 2017 Conference Paper

Incrementally Learning the Hierarchical Softmax Function for Neural Language Models

  • Hao Peng
  • Jianxin Li
  • Yangqiu Song
  • Yaopeng Liu

Neural network language models (NNLMs) have attracted a lot of attention recently. In this paper, we present a training method that can incrementally train the hierarchical softmax function for NNLMs. We split the cost function to model old and update corpora separately, and factorize the objective function for the hierarchical softmax. Then we provide a new stochastic gradient based method to update all the word vectors and parameters, by comparing the old tree generated based on the old corpus and the new tree generated based on the combined (old and update) corpus. Theoretical analysis shows that the mean square error of the parameter vectors can be bounded by a function of the number of changed words related to the parameter node. Experimental results show that incremental training can save a lot of time. The smaller the update corpus is, the faster the update training process is, with a speedup of up to 30 times achieved. We also use both word similarity/relatedness tasks and a dependency parsing task as our benchmarks to evaluate the correctness of the updated word vectors.
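
Recall that the hierarchical softmax scores a word as a product of binary decisions along its root-to-leaf path, which is why an update only touches the O(log V) inner nodes on changed paths; a toy scoring function follows (tree layout and sign convention assumed for illustration):

    import numpy as np

    def hs_probability(path, signs, theta, h):
        """Hierarchical softmax: p(w | h) is the product of binary
        decisions along the word's root-to-leaf path,
        p(w | h) = prod_j sigmoid(sign_j * theta[path_j] . h)."""
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        return float(np.prod([sigmoid(s * theta[n] @ h)
                              for n, s in zip(path, signs)]))

    dim, inner_nodes = 8, 7  # toy binary tree over an 8-word vocab
    theta = np.random.randn(inner_nodes, dim)  # one vector per inner node
    h = np.random.randn(dim)                   # context representation
    # Word at the leaf reached by going left, right, left from the root:
    print(hs_probability(path=[0, 1, 3], signs=[+1, -1, +1],
                         theta=theta, h=h))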

ICML Conference 2016 Conference Paper

A Convolutional Attention Network for Extreme Summarization of Source Code

  • Miltiadis Allamanis
  • Hao Peng
  • Charles Sutton

Attention mechanisms in neural networks have proved useful for problems in which the input and output do not have fixed dimension. Often there exist features that are locally translation invariant and would be valuable for directing the model’s attention, but previous attentional architectures are not constructed to learn such features specifically. We introduce an attentional neural network that employs convolution on the input tokens to detect local time-invariant and long-range topical attention features in a context-dependent way. We apply this architecture to the problem of extreme summarization of source code snippets into short, descriptive function name-like summaries. Using those features, the model sequentially generates a summary by marginalizing over two attention mechanisms: one that predicts the next summary token based on the attention weights of the input tokens and another that is able to copy a code token as-is directly into the summary. We demonstrate our convolutional attention neural network’s performance on 10 popular Java projects showing that it achieves better performance compared to previous attentional mechanisms.

IJCAI Conference 2015 Conference Paper

EigenGP: Gaussian Process Models with Adaptive Eigenfunctions

  • Hao Peng
  • Yuan Qi

Gaussian processes (GPs) provide a nonparametric representation of functions. However, classical GP inference suffers from high computational cost for big data. In this paper, we propose a new Bayesian approach, EigenGP, that learns both basis dictionary elements—eigenfunctions of a GP prior—and prior precisions in a sparse finite model. It is well known that, among all orthogonal basis functions, eigenfunctions can provide the most compact representation. Unlike other sparse Bayesian finite models where the basis function has a fixed form, our eigenfunctions live in a reproducing kernel Hilbert space as a finite linear combination of kernel functions. We learn the dictionary elements—eigenfunctions—and the prior precisions over these elements as well as all the other hyperparameters from data by maximizing the model marginal likelihood. We explore computational linear algebra to simplify the gradient computation significantly. Our experimental results demonstrate improved predictive performance of EigenGP over alternative sparse GP methods as well as relevance vector machines.
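
A fixed-kernel stand-in for such eigenfunction bases is the Nystrom construction, which expresses approximate kernel eigenfunctions as finite linear combinations of kernel functions at inducing points (EigenGP additionally learns the kernel hyperparameters and prior precisions; this sketch does not):

    import numpy as np

    def nystrom_eigenfunctions(X, Z, lengthscale=1.0):
        """Nystrom feature map Phi = K_xz U diag(lam)^(-1/2) from the
        eigendecomposition of K_zz, so that Phi Phi^T approximates
        K_xx; each column is an approximate eigenfunction
        phi_i(x) = sum_j U[j, i] k(x, z_j) / sqrt(lam_i)."""
        def rbf(A, B):
            d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
            return np.exp(-0.5 * d2 / lengthscale ** 2)
        Kzz = rbf(Z, Z)
        lam, U = np.linalg.eigh(Kzz)     # ascending eigenpairs
        lam, U = lam[::-1], U[:, ::-1]   # reorder to descending
        return rbf(X, Z) @ (U / np.sqrt(np.maximum(lam, 1e-12)))

    X, Z = np.random.randn(100, 2), np.random.randn(10, 2)
    Phi = nystrom_eigenfunctions(X, Z)   # (100, 10) basis features
    print(Phi.shape)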