Arrow Research search

Author name cluster

Wenxuan Tu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

26 papers
2 author rows

Possible papers

26

AAAI Conference 2026 Conference Paper

Anchor-Driven Nyström for Deep Graph-Level Clustering

  • Jiaxin Wang
  • Wenxuan Tu
  • Lingren Wang
  • Jieren Cheng
  • Yue Yang

Graph-level clustering (GLC), which aims to group entire graphs according to their structural and attribute-based similarities, represents a fundamental yet challenging task in various practical applications. Existing GLC methods primarily fall into two main paradigms: 1) deep graph clustering approaches based on Graph Neural Networks (GNNs), and 2) kernel-based methods that utilize predefined kernels to perform fine-grained structural comparison for clustering. However, GNN-based methods typically learn graph-level representations by aggregating node embeddings through pooling operations, which inevitably leads to substantial information loss and suboptimal clustering performance. In contrast, kernel methods, despite their theoretical expressiveness, suffer from prohibitive computational costs that hinder their scalability to large-scale settings. To solve these issues, we propose a novel graph learning framework named Anchor-driven Nyström for Deep Graph-Level Clustering (ANGC), which computes graph similarity via kernel methods while retaining the scalability of GNNs. Specifically, we first employ GNNs to encode individual graphs into sets of node embeddings. Rather than relying on pooling operations, we compute graph similarities in a kernel space constructed from these embeddings. To enhance both scalability and representational power, we introduce learnable graph Nyström anchors, which support end-to-end optimization and significantly accelerate kernel computations. To further improve the discriminative capability of these anchors, we propose the concept of anchor response discrepancy, that is, the variation in a given anchor’s responses across different samples. By maximizing this discrepancy, the anchors are encouraged to strengthen inter-graph distinctions for better clustering. Extensive experiments demonstrate the effectiveness and superiority of ANGC over existing state-of-the-art methods.
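The learnable Nyström anchors described in this abstract build on the standard Nyström low-rank kernel approximation. A minimal generic sketch, assuming an RBF kernel and fixed randomly chosen anchors (unlike the paper's learnable, end-to-end optimized anchors, which are not reproduced here):

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    # Pairwise RBF kernel between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_kernel(x, anchors, gamma=0.5):
    # Nystrom approximation: K ~= C W^+ C^T, where
    # C = k(x, anchors) and W = k(anchors, anchors).
    c = rbf(x, anchors, gamma)          # (n, m)
    w = rbf(anchors, anchors, gamma)    # (m, m)
    return c @ np.linalg.pinv(w) @ c.T  # (n, n) approximation

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))
anchors = x[rng.choice(50, 10, replace=False)]  # m = 10 anchors
k_approx = nystrom_kernel(x, anchors)
k_exact = rbf(x, x)
err = np.abs(k_approx - k_exact).mean()
```

With m anchors, only O(nm) kernel evaluations are needed instead of the O(n^2) entries of the full kernel matrix, which is the kind of acceleration the abstract refers to.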

AAAI Conference 2026 Conference Paper

Causally-Aware Attribute Completion for Incomplete Federated Graph Clustering

  • Jingxin Liu
  • Wenxuan Tu
  • Haotian Wang
  • Renda Han
  • Haoyi Li
  • Junlong Wu
  • Xiangyan Tang

Node-level federated graph clustering allows multiple unlabeled subgraph holders to collaboratively train on node-level tasks without sharing private information. Existing methods usually assume that the node attributes are complete and have achieved promising progress. However, in Federated Graph Learning (FGL) scenarios, this assumption is overly strict due to failures in data collection devices. Consequently, most existing FGL frameworks struggle to extract useful features from attribute-incomplete graphs for clustering, yet the issue remains underexplored. To bridge this gap, we propose a causally-aware attribute completion framework for Incomplete Federated Graph Clustering (IFedGC), which constructs a reliable global causal structure that incorporates clustering-friendly information to guide attribute completion for each subgraph. Specifically, in the attribute completion step, we first construct the causal structure to extract the causal relationships between initialized features, and then upload them to the server. Subsequently, we integrate multiple uploaded causal structures into a global causal one to achieve cross-client attribute completion. Moreover, to support reliable clustering, we first collect the high-confidence cluster centroids from each subgraph using a Graph Neural Network (GNN) model and subsequently aggregate these centroids on the server. The above two steps are seamlessly integrated into a unified FGL framework to obtain a clustering-oriented causal structure, which is sent back to the client to promote high-quality attribute completion for better clustering. Extensive results on five benchmark datasets demonstrate the effectiveness and superiority of IFedGC against its competitors.

AAAI Conference 2026 Conference Paper

Cross-View Progressive Feature Filtering for Multi-View Graph Clustering in Remote Sensing

  • Bowen Liu
  • Xin Peng
  • Wenxuan Tu
  • Chengyao Wei
  • Xiangyan Tang
  • Jieren Cheng
  • Miao Yu

Multi-view clustering of remote sensing data plays a vital role in Earth observation analysis. Recently, deep graph clustering methods based on contrastive learning have significantly improved feature representation capabilities. However, most existing approaches treat all views equally, neglecting the inherent uniqueness and heterogeneity across views, which often results in two major issues: 1) discriminative features from clustering-friendly views are underexplored; and 2) redundant or noisy information from less informative views can degrade the shared representation. To address these challenges, we propose a novel multi-view graph clustering framework termed CF-MVGC for remote sensing data, which dynamically preserves discriminative features and suppresses redundancy by assessing view affinity. Specifically, we employ a dual-stage representation learning strategy to extract both view-specific discriminative features and cross-view consistent representations. To further exploit and adaptively integrate complementary information across views, we design a progressive feature filtering module that dynamically evaluates view affinity using two novel metrics, i.e., the view fidelity index (VFI) and the view criticality index (VCI). Based on these assessments, the module adaptively modulates feature update and reset signals, reinforcing informative views while suppressing noisy or redundant ones. Views with high affinity receive strengthened update signals to retain valuable features, while those with low affinity are subjected to enhanced reset operations to eliminate noise and redundancy. The resulting high-quality, discriminative representations lead to improved clustering performance, establishing a positive feedback loop. Experimental results on four benchmark datasets demonstrate the effectiveness and superiority of CF-MVGC against its competitors.

AAAI Conference 2026 Conference Paper

Federated Graph-level Clustering Network with Attribute Inference

  • Renda Han
  • Junlong Wu
  • Wenxuan Tu
  • Jingxin Liu
  • Haotian Wang
  • Jieren Cheng

With the rise of vertical segmentation in real-world data, federated graph-level clustering has gained significant attention in recent years. However, the inherent missing attributes in graph datasets held by certain clients lead to suboptimal local parameter updates and misaligned global parameter consensus. This results in knowledge shifts during negotiation that ultimately impair overall clustering performance, an issue that remains largely underexplored in current research. To bridge this gap, we propose a novel deep learning network called Federated Graph-level Clustering Network with Attribute Inference (FedAI), which utilizes high-confidence prior knowledge from each domain and multi-party collaborative optimization to achieve efficient inference of unknown features. Specifically, on the client, high-confidence graph samples are projected into a latent space. We then extract and upload irreversible path digest information and attribute-oriented inference signals from them. On the server, we first identify affinity relationships hierarchically via an improved graph kernel method. We then infer the features of clients lacking node attributes through a prior structure-guided recovery operator, facilitating inter-client knowledge transfer for better clustering. Experimental results on 15 cross-dataset and cross-domain non-IID graph datasets demonstrate that FedAI consistently outperforms existing methods.

AAAI Conference 2026 Conference Paper

FedPKDA: Personalized Federated Learning with Privacy-Preserving Knowledge Dynamic Alignment

  • Moxuan Zeng
  • Wenxuan Tu
  • Yuanyi Chen
  • Yiying Wang
  • Miao Yu
  • Xiangyan Tang
  • Jieren Cheng

Personalized Federated Learning (PFL), which aims to customize models for each client while preserving data privacy, has become an important research topic in addressing the challenges of data heterogeneity. Existing studies usually enhance the localization of global parameters by injecting local information into the globally shared model. However, these methods focus excessively on the personalized characteristics of individual clients and fail to fully exploit distinctive information across clients, limiting the ability of local models to represent unseen samples well. To address this issue, we propose a novel personalized Federated Privacy-preserving Knowledge Dynamic Alignment (FedPKDA) framework, which ensures data privacy during both the collection of client-side key information and its incorporation into federated model training. Specifically, to ensure data privacy during the cross-client information collection phase, we first conduct feature clipping and add Laplacian noise to the local prototypes extracted from each client. Further, we compute the centroid of the uploaded local prototypes in a latent space and leverage Mahalanobis distance to guide the generation of global prototypes, thereby preserving the semantic contributions from participating clients. Moreover, to boost the personalization of the local model, we dynamically align representations learned by the shared model with both a set of local prototypes and privacy-preserving global prototypes, facilitating effective cross-client knowledge sharing under heterogeneous settings while preserving client-specific characteristics. Extensive experiments on benchmark datasets have verified the superiority of FedPKDA against its competitors.
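The feature clipping plus Laplacian noise step mentioned in this abstract is an instance of the standard Laplace mechanism from differential privacy. A minimal generic sketch, assuming L2 clipping and a single privacy budget `eps` (the function name and noise calibration are illustrative assumptions, not the paper's exact scheme):

```python
import numpy as np

def privatize_prototype(proto, clip=1.0, eps=1.0, rng=None):
    # Clip the prototype to a bounded L2 norm, then add Laplace noise
    # whose scale is the clipping bound (sensitivity) over the budget eps.
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(proto)
    if norm > clip:
        proto = proto * (clip / norm)
    noise = rng.laplace(loc=0.0, scale=clip / eps, size=proto.shape)
    return proto + noise

proto = np.full(5, 10.0)                 # raw local prototype (toy values)
private = privatize_prototype(proto, clip=1.0, eps=1e6,
                              rng=np.random.default_rng(0))
```

Only the noisy, clipped prototype would leave the client; the raw features never do, which is the privacy property the abstract relies on.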

AAAI Conference 2026 Conference Paper

Personalized Federated Graph-Level Clustering Network

  • Jingxin Liu
  • Wenxuan Tu
  • Renda Han
  • Junlong Wu
  • Haotian Wang
  • Guohui Liu
  • Xiangyan Tang
  • Yue Yang

In the federated clustering task, structural heterogeneity across clients inevitably impedes effective multi-source information sharing. To solve this issue, Personalized Federated Learning (PFL) has emerged as a potentially effective solution for image and text clustering. Unlike Euclidean data, graph-structured data exhibits diverse and fragile local patterns, which widely exist in real-world scenarios. Multi-graph data analysis in the federated learning setting is challenging and important, yet remains underexplored. This motivates us to propose a novel PERsonalized Federated graph-lEvel Clustering neTwork (PERFECT), which generates a specialized aggregation strategy for each client by uploading key model parameters and representative samples without sharing private information. Specifically, for each client, we first reconstruct privacy-preserving representative samples in a min-max optimization manner and then upload these samples to the server for subsequent personalized parameter aggregation. On the server, we first extract graph-level embeddings from the uploaded data, and then estimate affinities among multiple learned embeddings to formulate a personalized aggregation strategy for each client. Subsequently, to help each local model better identify the cluster boundaries, we utilize clustering-wise gradients to update the key components of the personalized model parameters received from the server. Extensive experimental results have demonstrated the effectiveness and superiority of PERFECT over its competitors.

AAAI Conference 2025 Conference Paper

Federated Graph-Level Clustering Network

  • Jingxin Liu
  • Jieren Cheng
  • Renda Han
  • Wenxuan Tu
  • Jiaxin Wang
  • Xin Peng

Federated graph learning (FGL), which excels in analyzing non-IID graphs as well as protecting data privacy, has recently emerged as a hot topic. Existing FGL methods usually train the client model using labeled data and then collaboratively learn a global model without sharing their local graph data. However, in real-world scenarios, the lack of data annotations impedes the negotiation of multi-source information at the server, leading to sub-optimal feedback to the clients. To address this issue, we propose a novel unsupervised learning framework called Federated Graph-level Clustering Network (FedGCN), which collects the topology-oriented features of non-IID graphs from clients to generate global consensus representations through multi-source clustering structure sharing. Specifically, in the client, we first preserve the prototype features of each cluster from the structure-oriented embedding through clustering and then upload the learned multiple prototypes that are hard to be reconstructed into the raw graph data. In the server, we generate consensus prototypes from multiple condensed structure-oriented signals through Gaussian estimation, which are subsequently transferred to each client to promote the great encoding capacity of the local model for better clustering. Extensive experiments across multiple non-IID graph datasets have demonstrated the effectiveness and superiority of FedGCN against its competitors.

ICML Conference 2025 Conference Paper

Federated Node-Level Clustering Network with Cross-Subgraph Link Mending

  • Jingxin Liu 0006
  • Renda Han
  • Wenxuan Tu
  • Haotian Wang
  • Junlong Wu
  • Jieren Cheng

Subgraphs of a complete graph are usually distributed across multiple devices and can only be accessed locally because the raw data cannot be directly shared. However, existing node-level federated graph learning suffers from at least one of the following issues: 1) heavily relying on labeled graph samples that are difficult to obtain in real-world applications, and 2) partitioning a complete graph into several subgraphs inevitably causes missing links, leading to sub-optimal sample representations. To solve these issues, we propose a novel Federated Node-level Clustering Network (FedNCN), which mends the destroyed cross-subgraph links using clustering prior knowledge. Specifically, within each client, we first design an MLP-based projector to implicitly preserve key clustering properties of a subgraph in a denoising learning-like manner, and then upload the resultant clustering signals that are hard to reconstruct for subsequent cross-subgraph link restoration. On the server, we maximize the potential affinity between subgraphs stemming from clustering signals by graph similarity estimation and minimize redundant links via the N-Cut criterion. Moreover, we employ a GNN-based generator to learn consensus prototypes from this mended graph, enabling the MLP-GNN joint-optimized learner to enhance data privacy during data transmission and further promote the local model for better clustering. Extensive experiments demonstrate the superiority of FedNCN.

NeurIPS Conference 2025 Conference Paper

FedIGL: Federated Invariant Graph Learning for Non-IID Graphs

  • Lingren Wang
  • Wenxuan Tu
  • Jiaxin Wang
  • Xiong Wang
  • Jieren Cheng
  • Jingxin Liu

Federated Graph Learning (FGL) effectively facilitates cross-domain graph model training by enabling decentralized learning across multiple domains, while ensuring data privacy through local data storage and communication of model updates instead of raw data. Existing approaches usually assume shared generic knowledge (e.g., prototypes, spectral features) via aggregating local structures statistically to alleviate structural heterogeneity. However, imposing overly strict assumptions about the presumed correlation between structural features and the global objective often fails to generalize to local tasks, leading to suboptimal performance. To tackle this issue, we propose a Federated Invariant Graph Learning (FedIGL) framework based on invariant learning, which effectively disrupts spurious correlations and further mines the invariant factors across different distributions. Specifically, a server-side global model is trained to capture client-agnostic subgraph patterns shared across clients, whereas client-side models specialize in client-specific subgraph patterns. Subsequently, without compromising privacy, we propose a novel Bi-Gradient Regularization strategy that introduces gradient constraints to guide the model in identifying client-agnostic and client-specific subgraph patterns for better graph representations. Extensive experiments on graph-level clustering and classification tasks demonstrate the superiority of FedIGL against its competitors.

NeurIPS Conference 2025 Conference Paper

Hierarchical Shortest-Path Graph Kernel Network

  • Jiaxin Wang
  • Wenxuan Tu
  • Jieren Cheng

Graph kernels have emerged as a fundamental and widely adopted technique in graph machine learning. However, most existing graph kernel methods rely on fixed graph similarity estimation that cannot be directly optimized for task-specific objectives, leading to sub-optimal performance. To address this limitation, we propose a kernel-based learning framework called the Hierarchical Shortest-Path Graph Kernel Network (HSP-GKN), which seamlessly integrates graph similarity estimation with downstream tasks within a unified optimization framework. Specifically, we design a hierarchical shortest-path graph kernel that efficiently preserves both the semantic and structural information of a given graph by transforming it into hierarchical features used for subsequent neural network learning. Building upon this kernel, we develop a novel end-to-end learning framework that matches hierarchical graph features with learnable hidden graph features to produce a similarity vector. This similarity vector subsequently serves as the graph embedding for end-to-end training, enabling the neural network to learn task-specific representations. Extensive experimental results demonstrate the effectiveness and superiority of the designed kernel and its corresponding learning framework compared to current competitors.
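For context, a plain (non-hierarchical) shortest-path graph kernel of the kind this paper builds on can be sketched as follows: each graph is summarized by a histogram of pairwise shortest-path lengths, and similarity is the inner product of histograms. This fixed variant omits the paper's hierarchy and learnable hidden graphs and is illustrative only:

```python
from collections import Counter

def sp_histogram(adj):
    # All-pairs shortest-path lengths via BFS from every node,
    # collected into a Counter keyed by path length (ordered pairs).
    hist = Counter()
    for src in adj:
        dist = {src: 0}
        frontier = [src]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
            frontier = nxt
        for node, d in dist.items():
            if d > 0:
                hist[d] += 1
    return hist

def sp_kernel(adj_a, adj_b):
    # Shortest-path kernel: inner product of the two length histograms.
    ha, hb = sp_histogram(adj_a), sp_histogram(adj_b)
    return sum(ha[k] * hb[k] for k in ha.keys() & hb.keys())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path3 = {0: [1], 1: [0, 2], 2: [1]}
k_tt = sp_kernel(triangle, triangle)  # self-similarity of the triangle
k_tp = sp_kernel(triangle, path3)     # triangle vs. 3-node path
```

Because such a kernel is fixed, the similarity cannot adapt to a downstream objective, which is exactly the limitation the abstract says HSP-GKN addresses with end-to-end training.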

ICML Conference 2025 Conference Paper

Scalable Attribute-Missing Graph Clustering via Neighborhood Differentiation

  • Yaowen Hu
  • Wenxuan Tu
  • Yue Liu 0008
  • Xinhang Wan
  • Junyi Yan
  • Taichun Zhou
  • Xinwang Liu 0002

Deep graph clustering (DGC), which aims to unsupervisedly separate the nodes in an attribute graph into different clusters, has seen substantial potential in various industrial scenarios like community detection and recommendation. However, real-world attribute graphs, e.g., social network interactions, are usually large-scale and attribute-missing. To solve these two problems, we propose a novel DGC method termed Complementary Multi-View Neighborhood Differentiation (CMV-ND), which preprocesses graph structural information into multiple views in a complete but non-redundant manner. First, to ensure completeness of the structural information, we propose a recursive neighborhood search that recursively explores the local structure of the graph by completely expanding node neighborhoods across different hop distances. Second, to eliminate the redundancy between neighborhoods at different hops, we introduce a neighborhood differential strategy that ensures no overlapping nodes between the differential hop representations. Then, we construct K+1 complementary views from the K differential hop representations and the features of the target node. Last, we apply existing multi-view clustering or DGC methods to the views. Experimental results on six widely used graph datasets demonstrate that CMV-ND significantly improves the performance of various methods.
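The hop-wise neighborhood differentiation described in this abstract amounts to collecting BFS layers: nodes at shortest-path distance exactly 1, 2, ..., K from a target node, which are disjoint by construction. A minimal sketch (the adjacency-list representation and function name are assumptions for illustration, not the paper's code):

```python
def differential_hops(adj, target, k):
    # BFS from `target`, returning k disjoint sets: nodes whose
    # shortest-path distance is exactly 1, 2, ..., k.
    seen = {target}
    frontier = [target]
    layers = []
    for _ in range(k):
        nxt = set()
        for u in frontier:
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        layers.append(nxt)
        frontier = list(nxt)
    return layers

# Toy graph with edges 0-1, 0-2, 1-3, 2-3, 3-4.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
layers = differential_hops(adj, 0, 3)
```

Each layer would then feed one of the K+1 views (the target node's own features supply the remaining view) for the multi-view clustering step.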

AAAI Conference 2025 Conference Paper

Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data

  • Renxiang Guan
  • Wenxuan Tu
  • Siwei Wang
  • Jiyuan Liu
  • Dayu Hu
  • Chang Tang
  • Yu Feng
  • Junhong Li

Multi-view clustering (MVC) for remote sensing data is a critical and challenging task in Earth observation. Although recent advances in graph neural network (GNN)-based MVC have shown remarkable success, the most prevalent approaches have two major limitations: 1) heavily relying on a predefined yet fixed graph, which limits clustering performance because the large number of indistinguishable background samples in remote sensing data introduces noisy information and increases structural heterogeneity; and 2) ignoring the effect of confusing samples on cluster structure compactness, which leads to fluffy cluster structures and decreases feature discriminability. To address these issues, we propose a Structure-Adaptive Multi-View Graph Clustering method named SAMVGC for remote sensing data, which boosts structure homogeneity and cluster compactness by adaptively learning the graph and cluster structures, respectively. Concretely, we use the geometric structure within the feature embedding space to refine adjacency matrices, which are dynamically fused with the previous ones to improve the homogeneity and stability of structure information. Additionally, the samples are separated into two categories: the central (intra-cluster center samples) and the confusing (inter-cluster boundary samples). On this basis, we deploy the contrastive learning paradigm on the central samples within views and the consistent learning paradigm on the confusing samples between views, improving cluster compactness and consistency. Finally, we conduct extensive experiments on four benchmarks and achieve promising results, well demonstrating the effectiveness and superiority of the proposed method.

AAAI Conference 2024 Conference Paper

A Non-parametric Graph Clustering Framework for Multi-View Data

  • Shengju Yu
  • Siwei Wang
  • Zhibin Dong
  • Wenxuan Tu
  • Suyuan Liu
  • Zhao Lv
  • Pan Li
  • Miao Wang

Multi-view graph clustering (MVGC) derives encouraging grouping results by seamlessly integrating abundant information inside heterogeneous data, and has captured surging focus recently. Nevertheless, the majority of current MVGC works involve at least one hyper-parameter, which not only requires additional effort for tuning, but also leads to a complicated solving procedure, largely harming the flexibility and scalability of the corresponding algorithms. To this end, in this article we are devoted to getting rid of hyper-parameters, and devise a non-parametric graph clustering (NpGC) framework to more practically partition multi-view data. To be specific, we hold that hyper-parameters play a role in balancing the error term and the regularization term so as to form high-quality clustering representations. Therefore, without the assistance of hyper-parameters, how to acquire high-quality representations becomes the key. Inspired by this, we adopt two types of anchors, view-related and view-unrelated, to concurrently mine exclusive characteristics and common characteristics among views. Then, all anchors' information is gathered together via a consensus bipartite graph. In this way, NpGC extracts both complementary and consistent multi-view features, thereby obtaining superior clustering results. Also, its linear complexity enables it to handle datasets with over 120,000 samples. Numerous experiments reveal NpGC's strong points compared to many classical approaches.

AAAI Conference 2024 Conference Paper

Attribute-Missing Graph Clustering Network

  • Wenxuan Tu
  • Renxiang Guan
  • Sihang Zhou
  • Chuan Ma
  • Xin Peng
  • Zhiping Cai
  • Zhe Liu
  • Jieren Cheng

Deep clustering with attribute-missing graphs, where only a subset of nodes possesses complete attributes while those of others are missing, is an important yet challenging topic in various practical applications. It has become a prevalent learning paradigm in existing studies to perform data imputation first and subsequently conduct clustering using the imputed information. However, these "two-stage" methods disconnect the clustering and imputation processes, preventing the model from effectively learning clustering-friendly graph embedding. Furthermore, they are not tailored for clustering tasks, leading to inferior clustering results. To solve these issues, we propose a novel Attribute-Missing Graph Clustering (AMGC) method to alternately promote clustering and imputation in a unified framework, where we iteratively produce the clustering-enhanced nearest neighbor information to conduct the data imputation process and utilize the imputed information to implicitly refine the clustering distribution through model optimization. Specifically, in the imputation step, we take the learned clustering information as imputation prompts to help each attribute-missing sample gather highly correlated features within its clusters for data completion, such that the intra-class compactness can be improved. Moreover, to support reliable clustering, we maximize inter-class separability by conducting cost-efficient dual non-contrastive learning over the imputed latent features, which in turn promotes greater graph encoding capability for clustering sub-network. Extensive experiments on five datasets have verified the superiority of AMGC against competitors.

ICLR Conference 2024 Conference Paper

Deep Temporal Graph Clustering

  • Meng Liu 0014
  • Yue Liu 0008
  • Ke Liang 0006
  • Wenxuan Tu
  • Siwei Wang 0001
  • Sihang Zhou 0001
  • Xinwang Liu 0002

Deep graph clustering has recently received significant attention due to its ability to enhance the representation learning capabilities of models in unsupervised scenarios. Nevertheless, deep clustering for temporal graphs, which could capture crucial dynamic interaction information, has not been fully explored. As a result, in many clustering-oriented real-world scenarios, temporal graphs can only be processed as static graphs. This not only causes the loss of dynamic information but also triggers huge computational consumption. To solve the problem, we propose a general framework for deep Temporal Graph Clustering called TGC, which introduces deep clustering techniques to suit the interaction sequence-based batch-processing pattern of temporal graphs. In addition, we discuss differences between temporal graph clustering and static graph clustering from several levels. To verify the superiority of the proposed framework TGC, we conduct extensive experiments. The experimental results show that temporal graph clustering enables more flexibility in finding a balance between time and space requirements, and our framework can effectively improve the performance of existing temporal graph learning methods. The code is released at https://github.com/MGitHubL/Deep-Temporal-Graph-Clustering.

AAAI Conference 2024 Conference Paper

Hawkes-Enhanced Spatial-Temporal Hypergraph Contrastive Learning Based on Criminal Correlations

  • Ke Liang
  • Sihang Zhou
  • Meng Liu
  • Yue Liu
  • Wenxuan Tu
  • Yi Zhang
  • Liming Fang
  • Zhe Liu

Crime prediction is a crucial yet challenging task within urban computing, which benefits public safety and resource optimization. Over the years, various models have been proposed, and spatial-temporal hypergraph learning models have recently shown outstanding performances. However, three correlations underlying crime are ignored, thus hindering the performance of previous models. Specifically, there are two spatial correlations and one temporal correlation, i.e., (1) co-occurrence of different types of crimes (type spatial correlation), (2) the closer to the crime center, the more dangerous it is around the neighborhood area (neighbor spatial correlation), and (3) the closer between two timestamps, the more relevant events are (Hawkes temporal correlation). To this end, we propose the Hawkes-enhanced Spatial-Temporal Hypergraph Contrastive Learning framework (HCL), which mines the aforementioned correlations via two specific strategies. Concretely, contrastive learning strategies are designed for the two spatial correlations, and Hawkes process modeling is adopted for the temporal correlation. Extensive experiments demonstrate the promising capacities of HCL from four aspects, i.e., superiority, transferability, effectiveness, and sensitivity.
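The Hawkes temporal correlation named in this abstract, i.e., closer timestamps are more relevant, is conventionally modeled with an exponentially decaying event intensity. A generic univariate sketch (the parameter values `mu`, `alpha`, `beta` are illustrative assumptions, not the paper's):

```python
import math

def hawkes_intensity(event_times, t, mu=0.1, alpha=0.8, beta=1.0):
    # Univariate Hawkes process intensity at time t:
    # lambda(t) = mu + alpha * sum_i exp(-beta * (t - t_i)) over past events,
    # so recent events excite the intensity more than distant ones.
    return mu + alpha * sum(
        math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )

events = [1.0, 2.0, 2.5]
near = hawkes_intensity(events, 3.0)   # just after the burst of events
far = hawkes_intensity(events, 10.0)   # long after; decays toward mu
```

The intensity shortly after a burst of events exceeds the intensity long afterwards, which is the recency effect the abstract exploits.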

AAAI Conference 2024 Conference Paper

MINES: Message Intercommunication for Inductive Relation Reasoning over Neighbor-Enhanced Subgraphs

  • Ke Liang
  • Lingyuan Meng
  • Sihang Zhou
  • Wenxuan Tu
  • Siwei Wang
  • Yue Liu
  • Meng Liu
  • Long Zhao

GraIL and its variants have shown their promising capacities for inductive relation reasoning on knowledge graphs. However, the uni-directional message-passing mechanism hinders such models from exploiting hidden mutual relations between entities in directed graphs. Besides, the enclosing subgraph extraction in most GraIL-based models restricts the model from extracting enough discriminative information for reasoning. Consequently, the expressive ability of these models is limited. To address the problems, we propose a novel GraIL-based framework, termed MINES, by introducing a Message Intercommunication mechanism on the Neighbor-Enhanced Subgraph. Concretely, the message intercommunication mechanism is designed to capture the omitted hidden mutual information. It introduces bi-directed information interactions between connected entities by inserting an undirected/bi-directed GCN layer between uni-directed RGCN layers. Moreover, inspired by the success of involving more neighbors in other graph-based tasks, we extend the neighborhood area beyond the enclosing subgraph to enhance the information collection for inductive relation reasoning. Extensive experiments prove the promising capacity of the proposed MINES from various aspects, especially for the superiority, effectiveness, and transfer ability.

ICML Conference 2024 Conference Paper

Towards Resource-friendly, Extensible and Stable Incomplete Multi-view Clustering

  • Shengju Yu
  • Zhibin Dong
  • Siwei Wang 0001
  • Xinhang Wan
  • Yue Liu 0008
  • Weixuan Liang
  • Pei Zhang 0008
  • Wenxuan Tu

Incomplete multi-view clustering (IMVC) methods typically encounter three drawbacks: (1) intense time and/or space overheads; (2) intractable hyper-parameters; (3) non-zero variance results. With these concerns in mind, we give a simple yet effective IMVC scheme, termed ToRES. Concretely, instead of self-expression affinity, we manage to construct prototype-sample affinity for incomplete data so as to decrease the memory requirements. To eliminate hyper-parameters, besides mining complementary features among views by view-wise prototypes, we also attempt to devise cross-view prototypes to capture consensus features for jointly forming high-quality clustering representation. To avoid the variance, we successfully unify representation learning and clustering operation, and directly optimize the discrete cluster indicators from incomplete data. Then, for the resulting objective function, we provide two equivalent solutions from perspectives of feasible region partitioning and objective transformation. Many results suggest that ToRES exhibits advantages against 20 SOTA algorithms, even in scenarios with a higher ratio of incomplete data.

AAAI Conference 2023 Conference Paper

Cluster-Guided Contrastive Graph Clustering Network

  • Xihong Yang
  • Yue Liu
  • Sihang Zhou
  • Siwei Wang
  • Wenxuan Tu
  • Qun Zheng
  • Xinwang Liu
  • Liming Fang

Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms. The code of CCGC is available at https://github.com/xihongyang1999/CCGC on Github.
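The cross-view cosine-similarity objective described at the end of this abstract can be sketched generically: maximize cosine similarity for positive pairs across the two views and minimize it for negative pairs. The pair selection and loss form below are simplified illustrations, not CCGC's exact objective:

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def contrastive_loss(z1, z2, pos_pairs, neg_pairs):
    # Surrogate objective: reward cross-view similarity of positive
    # pairs, penalize similarity of negative pairs (lower is better).
    pos = sum(cosine(z1[i], z2[j]) for i, j in pos_pairs)
    neg = sum(cosine(z1[i], z2[j]) for i, j in neg_pairs)
    return neg - pos

rng = np.random.default_rng(1)
z1 = rng.normal(size=(4, 8))            # view-1 embeddings
z2 = z1 + 0.01 * rng.normal(size=(4, 8))  # view-2, nearly aligned
loss = contrastive_loss(z1, z2,
                        pos_pairs=[(0, 0), (1, 1)],  # same-cluster pairs
                        neg_pairs=[(0, 1)])          # cluster-center pair
```

In CCGC, the positive pairs come from the same high-confidence cluster across views and the negatives from centers of different high-confidence clusters; here the indices are toy placeholders.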

AAAI Conference 2023 Conference Paper

Hard Sample Aware Network for Contrastive Deep Graph Clustering

  • Yue Liu
  • Xihong Yang
  • Sihang Zhou
  • Xinwang Liu
  • Zhen Wang
  • Ke Liang
  • Wenxuan Tu
  • Liang Li

Contrastive deep graph clustering, which aims to divide nodes into disjoint groups via contrastive mechanisms, is a challenging research topic. Among recent works, hard sample mining-based algorithms have attracted considerable attention for their promising performance. However, we find that existing hard sample mining methods have two problems. 1) In the hardness measurement, important structural information is overlooked in the similarity calculation, degrading the representativeness of the selected hard negative samples. 2) Previous works merely focus on hard negative sample pairs while neglecting hard positive sample pairs. Nevertheless, samples within the same cluster but with low similarity should also be carefully learned. To solve these problems, we propose a novel contrastive deep graph clustering method dubbed Hard Sample Aware Network (HSAN), which introduces a comprehensive similarity measure criterion and a general dynamic sample weighing strategy. Concretely, in our algorithm, the similarities between samples are calculated by considering both the attribute embeddings and the structure embeddings, better revealing sample relationships and assisting hardness measurement. Moreover, under the guidance of carefully collected high-confidence clustering information, our proposed weight modulating function first recognizes the positive and negative samples and then dynamically up-weights the hard sample pairs while down-weighting the easy ones. In this way, our method can mine not only the hard negative samples but also the hard positive samples, further improving the discriminative capability of the learned representations. Extensive experiments and analyses demonstrate the superiority and effectiveness of our proposed method.
The source code of HSAN is shared at https://github.com/yueliu1999/HSAN, and a collection (papers, code, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering.
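The dynamic weighting idea (up-weight hard pairs, down-weight easy ones) can be sketched in a few lines. This is a focal-style illustration, not HSAN's actual modulating function: `pair_weights`, its arguments, and the exponent `gamma` are assumptions, and the combined attribute-plus-structure similarity is taken as an input already in [0, 1].

```python
import numpy as np

def pair_weights(sim, same_cluster, gamma=1.0):
    """Focal-style dynamic weighting in the spirit of HSAN: a pair is 'hard'
    when its measured similarity disagrees with its high-confidence pseudo
    label (a positive pair with low similarity, or a negative pair with
    high similarity)."""
    target = same_cluster.astype(float)  # 1 for positive pairs, 0 for negative
    hardness = np.abs(target - sim)      # disagreement with the pseudo label
    return hardness ** gamma             # up-weight hard pairs, down-weight easy ones
```

A low-similarity positive pair and a high-similarity negative pair both receive large weights, which is exactly the hard-positive and hard-negative mining behavior described above.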

NeurIPS Conference 2022 Conference Paper

Align then Fusion: Generalized Large-scale Multi-view Clustering with Anchor Matching Correspondences

  • Siwei Wang
  • Xinwang Liu
  • Suyuan Liu
  • Jiaqi Jin
  • Wenxuan Tu
  • Xinzhong Zhu
  • En Zhu

Multi-view anchor graph clustering selects representative anchors to avoid computing full pair-wise similarities and thereby reduces the complexity of graph methods. Although widely applied in large-scale applications, existing approaches do not pay sufficient attention to establishing correct correspondences between the anchor sets across views. To be specific, anchor graphs obtained from different views are not aligned column-wise. Such an Anchor-Unaligned Problem (AUP) causes inaccurate graph fusion and degrades the clustering performance. Under multi-view scenarios, generating correct correspondences can be extremely difficult since anchors are not consistent in feature dimensions. To solve this challenging issue, we conduct the first study of a generalized and flexible anchor graph fusion framework, termed Fast Multi-View Anchor-Correspondence Clustering (FMVACC). Specifically, we show how to find anchor correspondences with both feature and structure information, after which anchor graph fusion is performed column-wise. Moreover, we theoretically show the connection between FMVACC and existing multi-view late fusion and partial view-aligned clustering, which further demonstrates our generality. Extensive experiments on seven benchmark datasets demonstrate the effectiveness and efficiency of our proposed method. Moreover, the proposed alignment module also shows significant performance improvement when applied to existing multi-view anchor graph competitors, indicating the importance of anchor alignment. Our code is available at https://github.com/wangsiwei2010/NeurIPS22-FMVACC.
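The Anchor-Unaligned Problem can be made concrete with a toy alignment. This sketch is not FMVACC's algorithm: `align_anchor_graphs` is a hypothetical name, it matches columns by brute force over permutations (feasible only for a handful of anchors), and it uses only the structure information (node-anchor response patterns), whereas the paper combines feature and structure cues.

```python
import numpy as np
from itertools import permutations

def align_anchor_graphs(Z1, Z2):
    """Toy fix for the anchor-unaligned problem: search for the column
    permutation of anchor graph Z2 that maximizes total column-wise
    similarity with Z1, so that the two graphs can be fused column-wise."""
    m = Z1.shape[1]
    best, best_score = None, -np.inf
    for perm in permutations(range(m)):
        score = np.sum(Z1 * Z2[:, list(perm)])  # column-wise inner products
        if score > best_score:
            best, best_score = perm, score
    return Z2[:, list(best)]  # realigned anchor graph
```

If the columns of `Z2` are a shuffled copy of `Z1`, the search recovers the shuffle; the real method replaces this exhaustive search with an efficient correspondence step.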

IJCAI Conference 2022 Conference Paper

Attributed Graph Clustering with Dual Redundancy Reduction

  • Lei Gong
  • Sihang Zhou
  • Wenxuan Tu
  • Xinwang Liu

Attributed graph clustering is a basic yet essential method for graph data exploration. Recent efforts on graph contrastive learning have achieved impressive clustering performance. However, we observe that the commonly adopted InfoMax operation tends to capture redundant information, limiting the downstream clustering performance. To this end, we develop a novel method termed attributed graph clustering with dual redundancy reduction (AGC-DRR) to reduce the information redundancy in both the input space and the latent feature space. Specifically, for input-space redundancy reduction, we introduce an adversarial learning mechanism to adaptively learn a redundant edge-dropping matrix to ensure the diversity of the compared sample pairs. To reduce the redundancy in the latent space, we force the correlation matrix of the cross-augmentation sample embeddings to approximate an identity matrix. Consequently, the learned network is forced to be robust against perturbation while discriminative against different samples. Extensive experiments have demonstrated that AGC-DRR outperforms the state-of-the-art clustering methods on most of our benchmarks. The corresponding code is available at https://github.com/gongleii/AGC-DRR.
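The latent-space redundancy reduction described above (pushing a cross-augmentation correlation matrix toward the identity) can be sketched in the style of a Barlow-Twins loss. This is a minimal illustration under assumed normalization choices; `redundancy_reduction_loss` is a hypothetical name, not the paper's exact objective.

```python
import numpy as np

def redundancy_reduction_loss(z1, z2):
    """Latent-space redundancy reduction sketch: standardize the two
    cross-augmentation embeddings, form their feature correlation matrix,
    and penalize its distance to the identity (diagonal -> 1, off-diagonal
    -> 0, i.e. decorrelated, non-redundant features)."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    n = z1.shape[0]
    c = z1.T @ z2 / n  # cross-view feature correlation matrix
    on_diag = np.sum((np.diag(c) - 1.0) ** 2)           # invariance term
    off_diag = np.sum((c - np.diag(np.diag(c))) ** 2)   # redundancy term
    return on_diag + off_diag
```

Decorrelated features give a near-zero loss, while duplicated (fully redundant) features are penalized through the off-diagonal term.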

AAAI Conference 2022 Conference Paper

Deep Graph Clustering via Dual Correlation Reduction

  • Yue Liu
  • Wenxuan Tu
  • Sihang Zhou
  • Xinwang Liu
  • Linxuan Song
  • Xihong Yang
  • En Zhu

Deep graph clustering, which aims to reveal the underlying graph structure and divide the nodes into different groups, has attracted intensive attention in recent years. However, we observe that, in the process of node encoding, existing methods suffer from representation collapse, which tends to map all data into the same representation. Consequently, the discriminative capability of the node representations is limited, leading to unsatisfactory clustering performance. To address this issue, we propose a novel self-supervised deep graph clustering method termed Dual Correlation Reduction Network (DCRN), which reduces information correlation in a dual manner. Specifically, in our method, we first design a Siamese network to encode samples. Then, by forcing the cross-view sample correlation matrix and the cross-view feature correlation matrix to approximate two identity matrices, respectively, we reduce the information correlation at the dual level, thus improving the discriminative capability of the resulting features. Moreover, to alleviate representation collapse caused by over-smoothing in GCN, we introduce a propagation regularization term that enables the network to gain long-distance information with a shallow network structure. Extensive experimental results on six benchmark datasets demonstrate the effectiveness of the proposed DCRN against the existing state-of-the-art methods. The code of DCRN is available at https://github.com/yueliu1999/DCRN and a collection (papers, code, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering.
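The dual-level identity constraint stated in the abstract can be sketched directly: one identity target for the N x N sample correlation matrix and one for the D x D feature correlation matrix. This is a simplified illustration; `dual_correlation_loss` is a hypothetical name and the normalization choices are assumptions, not the paper's exact formulation.

```python
import numpy as np

def dual_correlation_loss(z1, z2):
    """Dual correlation reduction sketch: push the cross-view *sample*
    correlation matrix (rows L2-normalized) and the cross-view *feature*
    correlation matrix (columns L2-normalized) toward identity matrices.
    A collapsed encoder, which maps everything to the same vector, makes
    both matrices all-ones and is heavily penalized."""
    zn1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    zn2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    s_corr = zn1 @ zn2.T  # N x N cross-view sample correlation
    fn1 = z1 / np.linalg.norm(z1, axis=0, keepdims=True)
    fn2 = z2 / np.linalg.norm(z2, axis=0, keepdims=True)
    f_corr = fn1.T @ fn2  # D x D cross-view feature correlation
    return (np.sum((s_corr - np.eye(z1.shape[0])) ** 2)
            + np.sum((f_corr - np.eye(z1.shape[1])) ** 2))
```

Well-separated embeddings give a near-zero loss, while a collapsed representation (all samples identical) does not, which is the failure mode the method targets.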

IJCAI Conference 2022 Conference Paper

Initializing Then Refining: A Simple Graph Attribute Imputation Network

  • Wenxuan Tu
  • Sihang Zhou
  • Xinwang Liu
  • Yue Liu
  • Zhiping Cai
  • En Zhu
  • Changwang Zhang
  • Jieren Cheng

Representation learning on attribute-missing graphs, whose connection information is complete while the attribute information of some nodes is missing, is an important yet challenging task. To impute the missing attributes, existing methods isolate the learning processes of the attribute and structure information embeddings, and force both resultant representations to align with a common, indiscriminative normal distribution, leading to inaccurate imputation. To tackle these issues, we propose a novel graph-oriented imputation framework called initializing then refining (ITR), where we first employ the structure information for initial imputation and then leverage the observed attribute and structure information to adaptively refine the imputed latent variables. Specifically, we adopt the structure embeddings of attribute-missing samples as the embedding initialization, and then refine these initial values by aggregating the reliable and informative embeddings of attribute-observed samples according to the affinity structure. Notably, in our refining process, the affinity structure is adaptively updated across iterations by calculating the sample-wise correlations upon the recomposed embeddings. Extensive experiments on four benchmark datasets verify the superiority of ITR against state-of-the-art methods.
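The initialize-then-refine loop can be sketched as follows. This is a simplified toy under assumed choices (a row-softmax affinity and a plain re-aggregation update); `refine_imputation` and its arguments are hypothetical names, not the paper's interface.

```python
import numpy as np

def refine_imputation(h_init, h_obs, obs_mask, n_iters=3):
    """Initialize-then-refine sketch: attribute-missing nodes start from
    their structure embeddings (rows of `h_init` where `obs_mask` is False),
    then an affinity over the *current* embeddings is recomputed each
    iteration and used to re-aggregate the attribute-observed embeddings
    `h_obs` into the missing rows."""
    h = h_init.copy()
    for _ in range(n_iters):
        # affinity from every node to the attribute-observed nodes,
        # recomputed from the current (recomposed) embeddings
        logits = h @ h[obs_mask].T
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        aff = np.exp(logits)
        aff /= aff.sum(axis=1, keepdims=True)
        # refine only the attribute-missing rows; observed rows stay fixed
        h[~obs_mask] = aff[~obs_mask] @ h_obs
    return h
```

A missing node whose structure embedding sits near one group of observed nodes is pulled toward that group's attribute embeddings, while observed rows are left untouched.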

AAAI Conference 2021 Conference Paper

Deep Fusion Clustering Network

  • Wenxuan Tu
  • Sihang Zhou
  • Xinwang Liu
  • Xifeng Guo
  • Zhiping Cai
  • En Zhu
  • Jieren Cheng

Deep clustering is a fundamental yet challenging task for data analysis. Recently, we have witnessed a strong trend of combining autoencoders and graph neural networks to exploit structure information for clustering performance enhancement. However, we observe that the existing literature 1) lacks a dynamic fusion mechanism to selectively integrate and refine the information of graph structure and node attributes for consensus representation learning, and 2) fails to extract information from both sides for robust target distribution (i.e., "ground-truth" soft label) generation. To tackle the above issues, we propose a Deep Fusion Clustering Network (DFCN). Specifically, in our network, an interdependency learning-based Structure and Attribute Information Fusion (SAIF) module is proposed to explicitly merge the representations learned by an autoencoder and a graph autoencoder for consensus representation learning. Also, a reliable target distribution generation measure and a triplet self-supervision strategy, which facilitate cross-modality information exploitation, are designed for network training. Extensive experiments on six benchmark datasets have demonstrated that the proposed DFCN consistently outperforms state-of-the-art deep clustering methods. Our code is publicly available at https://github.com/WxTu/DFCN.

ICML Conference 2021 Conference Paper

One Pass Late Fusion Multi-view Clustering

  • Xinwang Liu 0002
  • Li Liu 0002
  • Qing Liao 0001
  • Siwei Wang 0001
  • Yi Zhang 0104
  • Wenxuan Tu
  • Chang Tang
  • Jiyuan Liu 0003

Existing late fusion multi-view clustering (LFMVC) optimally integrates a group of pre-specified base partition matrices to learn a consensus one, which is then taken as the input of the widely used k-means algorithm to generate the cluster labels. As observed, the learning of the consensus partition matrix and the generation of cluster labels are done separately. These two procedures lack the necessary negotiation and cannot best serve each other, which may adversely affect the clustering performance. To address this issue, we propose to unify the aforementioned two learning procedures into a single optimization, in which the consensus partition matrix can better serve the generation of cluster labels, and the latter is able to guide the learning of the former. To optimize the resultant optimization problem, we develop a four-step alternate algorithm with proven convergence. We theoretically analyze the clustering generalization error of the proposed algorithm on unseen data. Comprehensive experiments on multiple benchmark datasets demonstrate the superiority of our algorithm in terms of both clustering accuracy and computational efficiency. It is expected that the simplicity and effectiveness of our algorithm will make it a good option to be considered for practical multi-view clustering applications.
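The contrast between "fuse then cluster" and a unified alternation can be illustrated with a toy loop. This is only a sketch under strong simplifications: `one_pass_late_fusion` is a hypothetical name, and it alternates consensus-based centroid updates with nearest-centroid label assignment, whereas the actual algorithm is a four-step, provably convergent optimization over partition matrices.

```python
import numpy as np

def one_pass_late_fusion(base_partitions, k, n_iters=5):
    """Toy alternation in the spirit of the unified objective: instead of
    fusing base partitions once and running k-means afterwards, the label
    assignment and the consensus-derived centroids are updated in turn so
    each step can guide the other."""
    H = np.mean(base_partitions, axis=0)  # simple initial consensus embedding
    labels = np.argmax(H, axis=1) % k     # crude initial labels
    for _ in range(n_iters):
        # centroid step: centroids of the consensus under the current labels
        C = np.stack([H[labels == c].mean(0) if np.any(labels == c)
                      else H.mean(0) for c in range(k)])
        # label step: reassign each sample to its nearest centroid
        labels = np.argmin(((H[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
    return labels
```

On two roughly agreeing base partitions, the alternation settles on labels consistent with both views; the real method replaces the plain averaging with a learned, provably convergent fusion.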