Arrow Research search

Author name cluster

Zhen Wang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

97 papers
2 author rows

Possible papers

97

AAAI Conference 2026 Conference Paper

Adaptive Theory of Mind for LLM-based Multi-Agent Coordination

  • Chunjiang Mu
  • Ya Zeng
  • Qiaosheng Zhang
  • Kun Shao
  • Chen Chu
  • Hao Guo
  • Danyang Jia
  • Zhen Wang

Theory of Mind (ToM) refers to the ability to reason about others' mental states, and higher-order ToM involves considering that others also possess their own ToM. Equipping large language model (LLM)-driven agents with ToM has long been expected to improve their coordination in multi-agent collaborative tasks. However, we find that misaligned ToM orders—mismatches in the depth of ToM reasoning between agents—can lead to insufficient or excessive reasoning about others, thereby impairing their coordination. To address this issue, we design an adaptive ToM (A-ToM) agent, which can align its ToM order with its partner's. Based on prior interactions, the agent estimates the partner's likely ToM order and leverages this estimation to predict the partner's action, thereby facilitating behavioral coordination. We conduct empirical evaluations on four multi-agent coordination tasks: a repeated matrix game, two grid navigation tasks, and an Overcooked task. The results validate our findings on ToM alignment and demonstrate the effectiveness of our A-ToM agent. Furthermore, we discuss the generalizability of A-ToM to non-LLM-based agents, as well as what would diminish the importance of ToM alignment.

AAAI Conference 2026 Conference Paper

Fine-Grained Generalization via Structuralizing Concept and Feature Space into Commonality, Specificity and Confounding

  • Zhen Wang
  • Jiaojiao Zhao
  • Qilong Wang
  • Yongfeng Dong
  • Wenlong Yu

Fine-Grained Domain Generalization (FGDG) presents greater challenges than conventional domain generalization due to the subtle inter-class differences and relatively pronounced intra-class variations inherent in fine-grained recognition tasks. Under domain shifts, the model becomes overly sensitive to fine-grained cues, leading to the suppression of critical features and a significant drop in performance. Cognitive studies suggest that humans classify objects by leveraging both common and specific attributes, enabling accurate differentiation between fine-grained categories. However, current deep learning models have yet to incorporate this mechanism effectively. Inspired by this mechanism, we propose Concept-Feature Structuralized Generalization (CFSG). This model explicitly disentangles both the concept and feature spaces into three structured components: common, specific, and confounding segments. To mitigate the adverse effects of varying degrees of distribution shift, we introduce an adaptive mechanism that dynamically adjusts the proportions of common, specific, and confounding components. In the final prediction, explicit weights are assigned to each pair of components. Extensive experiments on three single-source benchmark datasets demonstrate that CFSG achieves an average performance improvement of 9.87% over baseline models and outperforms existing state-of-the-art methods by an average of 3.08%. Additionally, explainability analysis validates that CFSG effectively integrates multi-granularity structured knowledge and confirms that feature structuralization facilitates the emergence of concept structuralization.

AAAI Conference 2026 Conference Paper

MoEA-Net: Modality-Incremental Expert Aggregation Network for Retinal Prognostic Prediction

  • Hua Wang
  • Xiaodan Zhang
  • Yanzhao Shi
  • Chengxin Zheng
  • Wanyu Zhang
  • Zhen Wang
  • Jianing Wang
  • Xiaobing Yu

Automated analysis of temporal changes in multimodal retinal images is critical for the prognostic assessment of ophthalmic diseases. Unlike traditional single-timepoint diagnosis, tracking longitudinal changes across multiple imaging modalities introduces significant data bias challenges: (1) Imbalanced modality samples compromise the integration of knowledge within minority modalities; (2) Heterogeneous visual patterns across modalities undermine the perception of disease-relevant biomarkers. To tackle these issues, we propose a Modality-Incremental Expert Aggregation Network (MoEA-Net), which unifies the inter-modal integration and intra-modal perception for enhanced retinal prognostic prediction. Specifically, we employ the large language model (LLM) with incremental LoRA layers for specific modalities to effectively integrate knowledge from imbalanced data. Besides, we introduce a Spatiotemporal-aware Expert (SAE) module to better perceive both the anatomical structures and longitudinal changes within modalities. By progressively combining the SAE module with incremental LoRA, MoEA-Net supports continual knowledge accumulation and improves accurate reasoning. Experimental results show that MoEA-Net achieves state-of-the-art performance on subretinal fluid change and visual recovery classification tasks, validating its effectiveness.

AAAI Conference 2026 Conference Paper

Source-Free Graph Foundation Model Adaptation via Pseudo-Source Reconstruction

  • Liang Yang
  • Hui Ning
  • Jiaming Zhuo
  • Ziyi Ma
  • Chuan Wang
  • Wenning Wu
  • Zhen Wang

Aiming to overcome distribution shift and label sparsity that hinder cross-domain generalization of Graph Neural Networks (GNNs), Unsupervised Graph Domain Adaptation (UGDA) transfers knowledge from a label-rich source to an unlabeled target graph. Yet in practice, strict privacy protocols often withhold the source graph, reducing UGDA to the more constrained Source-Free UGDA (SFUGDA) where only a pre-trained source GNN remains. In this setting, the source GNN serves as a simple, task-specific graph foundation model. Despite recent progress, existing source-free UGDA methods remain hampered by source-knowledge absence: deprived of source graphs, they lose the reference distribution needed to gauge domain shift and must lean on noisy target cues, incurring biased adaptation and catastrophic forgetting. To overcome this drawback, this paper devises Source-Free Graph foundation model Adaptation via pseudo-source Reconstruction (SFGAR), a two-stage SFUGDA framework that first generates pseudo-source graphs to recover the source distribution encoded in a frozen pre-trained GNN, then adversarially aligns these synthetic graphs with the unlabeled target. Theoretical analysis shows that this proxy alignment tightly bounds the target-domain generalization error. Extensive experiments on public benchmarks validate the state-of-the-art performance of SFGAR.

NeurIPS Conference 2025 Conference Paper

A Closer Look at Graph Transformers: Cross-Aggregation and Beyond

  • Jiaming Zhuo
  • Ziyi Ma
  • Yintong Lu
  • Yuwei Liu
  • Kun Fu
  • Di Jin
  • Chuan Wang
  • Wenning Wu

Graph Transformers (GTs), which effectively capture long-range dependencies and structural biases simultaneously, have recently emerged as promising alternatives to traditional Graph Neural Networks (GNNs). To leverage topology information, advanced GT approaches integrate GNN modules or modulate node attributes using positional encodings. Unfortunately, the underlying mechanism driving their effectiveness remains insufficiently understood. In this paper, we revisit these strategies and uncover a shared underlying mechanism—Cross Aggregation—that effectively captures the interaction between graph topology and node attributes. Building on this insight, we propose the Universal Graph Cross-attention Transformer (UGCFormer), a universal GT framework with linear computational complexity. The idea is to interactively learn the representations of graph topology and node attributes through a linearized Dual Cross-attention (DCA) module. In theory, this module can adaptively capture interactions between these two types of graph information, thereby achieving effective aggregation. To alleviate overfitting arising from the dual-channel design, we introduce a consistency constraint that enforces representational alignment. Extensive evaluations on multiple benchmark datasets demonstrate the effectiveness and efficiency of UGCFormer.

AAAI Conference 2025 Conference Paper

A Unified Model of Direct and Indirect Reciprocity in Multichannel Games

  • Juan Shi
  • Zhaoheng Cao
  • Jinzhuo Liu
  • Chu Chen
  • Zhen Wang

Reciprocity plays a crucial role in maintaining cooperation in human societies and AI systems. In this paper, we focus on reciprocity within multichannel games and examine how cooperation evolves in this context. We propose a unified framework that allows us to evaluate the reputations of interdependent actions across multiple channels while simultaneously exploring both direct and indirect reciprocity mechanisms. We identify partner and semi-partner strategies under both forms of reciprocity, with the former leading to full cooperation and the latter resulting in partial cooperation. Through equilibrium analysis, we characterize the conditions under which full cooperation and partial cooperation emerge. Moreover, we show that when players can link multiple interactions, they learn to coordinate their behavior across different games to maximize overall cooperation. Our findings provide new insights into the maintenance of cooperation across various reciprocity mechanisms and interaction patterns.

NeurIPS Conference 2025 Conference Paper

Adaptive Preference Arithmetic: A Personalized Agent with Adaptive Preference Arithmetic for Dynamic Preference Modeling

  • Hongyi Nie
  • Yaqing Wang
  • Mingyang Zhou
  • Feiyang Pan
  • Quanming Yao
  • Zhen Wang

As large language models (LLMs) are increasingly used as personalized user assistants, effectively adapting to users' evolving preferences is critical for delivering high-quality personalized responses. While user preferences are often stable in content, their relative strengths shift over time due to changing goals and contexts. Therefore, modeling these dynamic preference strengths can enable finer-grained personalization. However, current methods face two major challenges: (i) limited user feedback makes it difficult to estimate preference strengths accurately, and (ii) natural language ambiguity limits the controllability of preference-guided generation. To address these issues, we propose AdaPA-Agent, an LLM-agent personalization framework that models dynamic preference strengths via Adaptive Preference Arithmetic. First, instead of requiring additional user feedback, AdaPA-Agent employs an alignment-based strength estimation module to estimate the strength of user preferences from existing user-agent interactions. Then, it guides controllable personalized generation by linearly combining next-token distributions, weighted by the estimated strengths of individual preferences. Experiments on two personalization tasks, conversational recommendation and personalized web interaction, demonstrate that AdaPA-Agent aligns better with users' changing intents, achieving improvements of over 18.9% and 14.2%, respectively, compared to ReAct, a widely used agent framework.
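The preference-arithmetic step described above, linearly combining next-token distributions weighted by estimated preference strengths, can be sketched in a few lines. The distributions, names, and weights below are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

def combine_next_token_distributions(dists, strengths):
    """Linearly combine per-preference next-token distributions.

    dists: list of 1-D probability arrays, one per user preference.
    strengths: non-negative weights estimated for each preference.
    All names here are hypothetical, for illustration only.
    """
    strengths = np.asarray(strengths, dtype=float)
    weights = strengths / strengths.sum()            # normalize strengths to weights
    combined = sum(w * np.asarray(d) for w, d in zip(weights, dists))
    return combined / combined.sum()                 # renormalize to a distribution

# Two toy preferences over a 4-token vocabulary, with strengths in a 3:1 ratio.
p_genre = [0.7, 0.1, 0.1, 0.1]    # e.g. "likes sci-fi"
p_brevity = [0.1, 0.7, 0.1, 0.1]  # e.g. "prefers short answers"
mixed = combine_next_token_distributions([p_genre, p_brevity], [3.0, 1.0])
# mixed = 0.75 * p_genre + 0.25 * p_brevity = [0.55, 0.25, 0.1, 0.1]
```

Raising one preference's estimated strength shifts the combined distribution toward that preference's preferred tokens, which is what makes the generation controllable.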

AAAI Conference 2025 Conference Paper

Advancing Retrosynthesis with Retrieval-Augmented Graph Generation

  • Anjie Qiao
  • Zhen Wang
  • Jiahua Rao
  • Yuedong Yang
  • Zhewei Wei

Diffusion-based molecular graph generative models have achieved significant success in template-free, single-step retrosynthesis prediction. However, these models typically generate reactants from scratch, often overlooking the fact that the scaffold of a product molecule typically remains unchanged during chemical reactions. To leverage this useful observation, we introduce a retrieval-augmented molecular graph generation framework. Our framework comprises three key components: a retrieval component that identifies similar molecules for the given product, an integration component that learns valuable clues from these molecules about which part of the product should remain unchanged, and a base generative model that is prompted by these clues to generate the corresponding reactants. We explore various design choices for critical and under-explored aspects of this framework and instantiate it as the Retrieval-Augmented RetroBridge (RARB). RARB demonstrates state-of-the-art performance on standard benchmarks, achieving a 14.8% relative improvement in top-1 accuracy over its base generative model, highlighting the effectiveness of retrieval augmentation. Additionally, RARB excels in handling out-of-distribution molecules, and its advantages remain significant even with smaller models or fewer denoising steps. These strengths make RARB highly valuable for real-world retrosynthesis applications, where extrapolation to novel molecules and high-throughput prediction are essential.

AAAI Conference 2025 Conference Paper

Backdoor Attack on Propagation-based Rumor Detectors

  • Di Jin
  • Yujun Zhang
  • Bingdao Feng
  • Xiaobao Wang
  • Dongxiao He
  • Zhen Wang

Rumor detection is critical as the spread of misinformation on social media threatens social stability. The propagation structure has garnered attention for its ability to capture discriminative information, such as crowd stance, which has led to the development of enhanced detection methods. However, these detectors are vulnerable to attacks that can manipulate results and evade detection, potentially disrupting public order or influencing public opinion. While adversarial attacks on rumor detectors have been studied, the use of backdoor attacks—an evasive and powerful method—remains unexplored due to the challenges in applying them to propagation trees. In this paper, we introduce the first backdoor attack framework against propagation-based rumor detectors, designed to maintain overall detector performance while enabling targeted attacks on specific rumors. We propose an adaptive discrete trigger generator that injects trigger nodes into critical nodes, creating evasive, transferable attacks. Extensive experiments on three real-world rumor datasets demonstrate that our framework effectively undermines the performance of propagation-based rumor detectors and is transferable across different architectures.

AAAI Conference 2025 Conference Paper

Does GCL Need a Large Number of Negative Samples? Enhancing Graph Contrastive Learning with Effective and Efficient Negative Sampling

  • Yongqi Huang
  • Jitao Zhao
  • Dongxiao He
  • Di Jin
  • Yuxiao Huang
  • Zhen Wang

Graph Contrastive Learning (GCL) aims to learn low-dimensional graph representations in a self-supervised manner, primarily through instance discrimination: positive and negative pairs are mined from graphs, and the similarity of positive pairs is increased while that of negative pairs is decreased. Drawing from the success of Contrastive Learning (CL) in other domains, a consensus has been reached that the effectiveness of GCL depends on a large number of negative pairs. As a result, despite the significant computational overhead, GCL methods typically leverage as many negative node pairs as possible to improve model performance. However, given that nodes within a graph are interconnected, we argue that nodes cannot be treated as independent instances. Therefore, we challenge this consensus: does employing more negative nodes lead to a more effective GCL model? To answer this, we explore the role of negative nodes in the commonly used InfoNCE loss for GCL and observe that: (1) counterintuitively, a large number of negative nodes can actually hinder the model's ability to distinguish between nodes with different semantics; and (2) a smaller number of high-quality, non-topologically coupled negative nodes is sufficient to enhance the discriminability of representations. Based on these findings, we propose a new method called GCL with Effective and Efficient Negative samples (E2Neg), which learns discriminative representations using only a very small set of representative negative samples. E2Neg significantly reduces computational overhead and speeds up model training. We demonstrate the effectiveness and efficiency of E2Neg across multiple datasets compared to other GCL methods.
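The InfoNCE loss at the center of the question above can be illustrated with a toy sketch: one anchor embedding, its augmented positive view, and a negative set of varying size. The embeddings, dimensions, and set sizes are synthetic assumptions, not the paper's setup.

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, tau=0.5):
    """InfoNCE for a single node: pull the positive view close,
    push the sampled negatives away. Illustrative sketch only."""
    def cos(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    pos = np.exp(cos(anchor, positive) / tau)
    neg = sum(np.exp(cos(anchor, n) / tau) for n in negatives)
    return -np.log(pos / (pos + neg))

rng = np.random.default_rng(0)
z = rng.normal(size=16)                    # anchor node embedding
z_pos = z + 0.05 * rng.normal(size=16)     # augmented view of the same node
few_negs = [rng.normal(size=16) for _ in range(5)]       # small negative set
many_negs = few_negs + [rng.normal(size=16) for _ in range(500)]

# Every extra negative adds a positive term to the denominator, so the loss
# (and the gradient pressure) grows with the negative count even when the
# extra negatives carry no new semantic information.
loss_few = info_nce_loss(z, z_pos, few_negs)
loss_many = info_nce_loss(z, z_pos, many_negs)
```

This is the mechanical reason the negative-set size matters in the loss; the paper's contribution is showing that, on interconnected graph nodes, a small representative set can suffice.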

IJCAI Conference 2025 Conference Paper

Exploiting Self-Refining Normal Graph Structures for Robust Defense against Unsupervised Adversarial Attacks

  • Bingdao Feng
  • Di Jin
  • Xiaobao Wang
  • Dongxiao He
  • Jingyi Cao
  • Zhen Wang

Defending against adversarial attacks on graphs has become increasingly important. Graph refinement to enhance the quality and robustness of representation learning is a critical area that requires thorough investigation. We observe that representations learned from attacked graphs are often ineffective for refinement because perturbations cause the endpoints of perturbed edges to become more similar, complicating the defender's ability to distinguish them. To address this challenge, we propose a robust unsupervised graph learning framework that utilizes cleaner graphs to learn effective representations. Specifically, we introduce an anomaly detection model based on contrastive learning to obtain a rough graph that excludes a large number of perturbed structures. Subsequently, we propose the Graph Pollution Degree (GPD), a mutual information-based measure that leverages the encoder's representation capability on the rough graph to assess the trustworthiness of the predicted graph and refine the learned representations. Extensive experiments on four benchmark datasets demonstrate that our method outperforms nine state-of-the-art defense models, effectively defending against adversarial attacks and enhancing node classification performance.

IJCAI Conference 2025 Conference Paper

Good Advisor for Source Localization: Using Large Language Model to Guide the Source Inference Process

  • Dongpeng Hou
  • Wenfei Wei
  • Chao Gao
  • Xianghua Li
  • Zhen Wang

With the rapid development of large AI models, large language models (LLMs) offer a new solution for source localization tasks thanks to their deep linguistic understanding and generation capabilities. However, when LLMs are directly applied to source localization, they struggle to understand complex propagation patterns and network structures, which limits localization accuracy. Meanwhile, the high-dimensional embedding of the textual representation introduces a significant amount of redundant features, which also reduces efficiency in the source localization task to some extent. To solve these problems, this paper proposes a multi-modal fusion framework for rumor source localization, Contrastive Rumor Source Localization via LLM (CRSLL), based on the idea of contrastive learning. Specifically, the framework constructs propagation embeddings by comprehensively capturing both propagation dynamics and user profile features, adopts a contrastive learning approach to enhance the representation ability of comment embeddings of rumor cascades by differentiating them from non-rumor cascade comments, filters out invalid features through a differentiable masking strategy, and fuses comment modality embeddings with propagation embeddings through an attention mechanism to better capture multi-modal data interactions. Notably, the framework uses the LLM as a good "advisor" that provides rich, deep semantic representations, improving the accuracy of rumor source localization. The code is available at https://github.com/cgao-comp/CRSLL.

AAAI Conference 2025 Conference Paper

Graph Contrastive Learning with Joint Spectral Augmentation of Attribute and Topology

  • Liang Yang
  • Zhenna Li
  • Jiaming Zhuo
  • Jing Liu
  • Ziyi Ma
  • Chuan Wang
  • Zhen Wang
  • Xiaochun Cao

As an essential technique for Graph Contrastive Learning (GCL), Graph Augmentation (GA) improves the generalization capability of GCLs by introducing different forms of the same graph. To ensure information integrity, existing GA strategies have been designed to simultaneously process the two types of information available in graphs: node attributes and graph topology. Nonetheless, these strategies tend to augment the two types of graph information separately, ignoring their correlation and resulting in limited representation ability. To overcome this drawback, this paper proposes a novel GCL framework with a Joint spectrAl augMentation, named GCL-JAM. Motivated by the equivalence between the graph learning objective on an attribute graph and the spectral clustering objective on the attribute-interpolated graph, the node attributes are first abstracted as another type of node to harmonize the node attributes and graph topology. The newly constructed graph is then utilized to perform spectral augmentation, capturing the correlation during augmentation. Theoretically, the proposed joint spectral augmentation is proven to perturb more inter-class edges and noisy attributes than separate augmentation methods. Extensive experiments on homophily and heterophily graphs validate the effectiveness and universality of GCL-JAM.

IJCAI Conference 2025 Conference Paper

HyperDet: Source Detection in Hypergraphs via Interactive Relationship Construction and Feature-rich Attention Fusion

  • Le Cheng
  • Peican Zhu
  • Yangming Guo
  • Keke Tang
  • Chao Gao
  • Zhen Wang

Hypergraphs offer superior modeling capabilities for social networks, particularly in capturing group phenomena that extend beyond pairwise interactions in rumor propagation. Existing approaches in rumor source detection predominantly focus on dyadic interactions, which inadequately address the complexity of more intricate relational structures. In this study, we present a novel approach for Source Detection in Hypergraphs (HyperDet) via Interactive Relationship Construction and Feature-rich Attention Fusion. Specifically, our methodology employs an Interactive Relationship Construction module to accurately model both the static topology and dynamic interactions among users, followed by the Feature-rich Attention Fusion module, which autonomously learns node features and discriminates between nodes using a self-attention mechanism, thereby effectively learning node representations under the framework of accurately modeled higher-order relationships. Extensive experimental validation confirms the efficacy of our HyperDet approach, showcasing its superiority relative to current state-of-the-art methods.

AAAI Conference 2025 Conference Paper

Learning Complex Heterogeneous Multimodal Fake News via Social Latent Network Inference

  • Mingxin Li
  • Yuchen Zhang
  • Haowei Xu
  • Xianghua Li
  • Chao Gao
  • Zhen Wang

With the diversification of online social platforms, news dissemination has become increasingly complex, heterogeneous, and multimodal, making the fake news detection task more challenging and crucial. Previous works mainly focus on obtaining social relationships of news via retweets, which limits accurate detection when real cascades are inaccessible. Given the proven value of assessing the spreading influence of events, this paper proposes HML (Complex Heterogeneous Multimodal Fake News Detection via Latent Network Inference). Specifically, an improved social latent network inference strategy is designed to estimate the maximum likelihood of news influences under the same event. Meanwhile, a novel heterogeneous graph is built based on social attributes for multimodal news under different events. Further, to better aggregate the relationships among heterogeneous multimodal features, this paper proposes a self-supervised multimodal content learning strategy to enhance, align, fuse, and compare heterogeneous modal contents. Building on the above, a personalized heterogeneous graph representation learning method is designed to classify fake news. Extensive experiments demonstrate that the proposed method outperforms the SOTA on real social media news datasets.

AAAI Conference 2025 Conference Paper

LiON: Learning Point-Wise Abstaining Penalty for LiDAR Outlier DetectioN Using Diverse Synthetic Data

  • Shaocong Xu
  • Pengfei Li
  • Qianpu Sun
  • Xinyu Liu
  • Yang Li
  • Shihui Guo
  • Zhen Wang
  • Bo Jiang

LiDAR-based semantic scene understanding is an important module in the modern autonomous driving perception stack. However, identifying outlier points in a LiDAR point cloud is challenging as LiDAR point clouds lack semantically-rich information. While former SOTA methods adopt heuristic architectures, we revisit this problem from the perspective of Selective Classification, which introduces a selective function into the standard closed-set classification setup. Our solution is built upon the basic idea of abstaining from choosing any inlier categories but learns a point-wise abstaining penalty with a margin-based loss. Apart from learning paradigms, synthesizing outliers to approximate unlimited real outliers is also critical, so we propose a strong synthesis pipeline that generates outliers originated from various factors: object categories, sampling patterns and sizes. We demonstrate that learning different abstaining penalties, apart from point-wise penalty, for different types of (synthesized) outliers can further improve the performance. We benchmark our method on SemanticKITTI and nuScenes and achieve SOTA results.
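The selective-classification idea above, abstaining from all inlier categories when a per-point abstain score wins, can be illustrated with a toy reject-option scorer. The function names, logits, and fixed penalty values are hypothetical; the paper's method learns the point-wise penalty with a margin-based loss rather than fixing it by hand.

```python
import numpy as np

def abstain_scores(logits, penalty):
    """Append an abstain score to the closed-set logits.
    A point is flagged as an outlier when the abstain score wins.
    Illustrative sketch, not the paper's actual architecture."""
    return np.concatenate([np.asarray(logits, dtype=float), [penalty]])

def predict(logits, penalty):
    scores = abstain_scores(logits, penalty)
    k = int(np.argmax(scores))
    return "outlier" if k == len(logits) else f"class {k}"

# A confident inlier beats a low abstain score; an uncertain point is rejected.
confident = predict([2.0, 0.3, -1.0], penalty=0.5)   # → "class 0"
uncertain = predict([0.2, 0.3, 0.1], penalty=1.5)    # → "outlier"
```

Making the penalty a learned, point-wise quantity (instead of the constants above) is what lets the model calibrate how readily each LiDAR point is rejected.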

NeurIPS Conference 2025 Conference Paper

LithoSim: A Large, Holistic Lithography Simulation Benchmark for AI-Driven Semiconductor Manufacturing

  • Hongquan He
  • Zhen Wang
  • Jingya Wang
  • Tao Wu
  • Xuming He
  • Bei Yu
  • Jingyi Yu
  • Hao GENG

Lithography orchestrates a symphony of light, mask and photochemicals to transfer integrated circuit patterns onto the wafer. Lithography simulation serves as the critical nexus between circuit design and manufacturing, where its speed and accuracy fundamentally govern the optimization quality of downstream resolution enhancement techniques (RET). While machine learning promises to circumvent the computational limitations of the lithography process through data-driven or physics-informed approximations of computational lithography, existing simulators suffer from inadequate lithographic awareness due to insufficient training data capturing essential process variations and mask correction rules. We present LithoSim, the most comprehensive lithography simulation benchmark to date, featuring over 4 million high-resolution input-output pairs with rigorous physical correspondence. The dataset systematically incorporates alterable optical source distributions, metal and via mask topologies with optical proximity correction (OPC) variants, and process windows reflecting fab-realistic variations. By integrating domain-specific metrics spanning AI performance and lithographic fidelity, LithoSim establishes a unified evaluation framework for data-driven and physics-informed computational lithography. The data (https://huggingface.co/datasets/grandiflorum/LithoSim), code (https://dw-hongquan.github.io/LithoSim), and pre-trained models (https://huggingface.co/grandiflorum/LithoSim) are released openly to support the development of hybrid ML-based and high-fidelity lithography simulation for the benefit of semiconductor manufacturing.

NeurIPS Conference 2025 Conference Paper

LoSplit: Loss-Guided Dynamic Split for Training-Time Defense Against Graph Backdoor Attacks

  • Di Jin
  • Yuxiang Zhang
  • Bingdao Feng
  • Xiaobao Wang
  • Dongxiao He
  • Zhen Wang

Graph Neural Networks (GNNs) are vulnerable to backdoor attacks. Existing defenses primarily rely on detecting structural anomalies, distributional outliers, or perturbation-induced prediction instability, and they struggle to handle the more subtle, feature-based attacks that do not introduce obvious topological changes. Our empirical analysis reveals that both structure-based and feature-based attacks not only cause early loss convergence of target nodes but also induce a class-coherent loss drift, where this early convergence gradually spreads to nearby clean nodes, leading to significant distribution overlap. To address this issue, we propose LoSplit, the first training-time defense framework on graphs that leverages this early-stage loss drift to accurately split target nodes. Our method dynamically selects epochs with maximal loss divergence, clusters target nodes via Gaussian Mixture Models (GMM), and applies a Decoupling-Forgetting strategy to break the association between target nodes and malicious labels. Extensive experiments on multiple real-world datasets demonstrate the effectiveness of our approach, significantly reducing attack success rates while maintaining high clean accuracy across diverse backdoor attack strategies. Our code is available at: github.com/zyx924768045/LoSplit.

JBHI Journal 2025 Journal Article

Online Self-Distillation and Self-Modeling for 3D Brain Tumor Segmentation

  • Yan Pang
  • Yunhao Li
  • Teng Huang
  • Jiaming Liang
  • Zhen Wang
  • Changyu Dong
  • Dongyang Kuang
  • Ying Hu

In the specialized domain of brain tumor segmentation, supervised segmentation approaches are hindered by the limited availability of high-quality labeled data, a condition arising from data privacy concerns, significant costs, and ethical issues. In response to this challenge, this paper presents a training framework that adeptly integrates a plug-and-play component, MOD, into current supervised learning models, boosting their efficacy in scenarios with limited data. The MOD consists of an Online Tokenizer and a Dense Predictor, which employs self-distillation and self-modeling on masked patches, promoting swift convergence and efficient representation learning. During the inference phase, the plug-and-play MOD component is excluded, preserving the computational efficiency of the original model without incurring extra processing costs. We substantiated the value of our approach through experiments on leading 3D brain tumor segmentation baselines. Remarkably, models augmented with the MOD consistently showcased superior results, achieving elevated Dice coefficients and HD95 scores on two datasets: BraTS 2021 and MSD 2019 Task-01 Brain Tumor.

NeurIPS Conference 2025 Conference Paper

Reinforced Active Learning for Large-Scale Virtual Screening with Learnable Policy Model

  • Yicong Chen
  • Jiahua Rao
  • Jiancong Xie
  • Dahao Xu
  • Zhen Wang
  • Yuedong Yang

Virtual Screening (VS) is vital for drug discovery but struggles with low hit rates and high computational costs. While Active Learning (AL) has shown promise in improving the efficiency of VS, traditional methods rely on inflexible and handcrafted heuristics, limiting adaptability in complex chemical spaces, particularly in balancing molecular diversity and selection accuracy. To overcome these challenges, we propose GLARE, a reinforced active learning framework that reformulates VS as a Markov Decision Process (MDP). Using Group Relative Policy Optimization (GRPO), GLARE dynamically balances chemical diversity, biological relevance, and computational constraints, eliminating the need for inflexible heuristics. Experiments show GLARE outperforms state-of-the-art AL methods, with a 64.8% average improvement in Enrichment Factors (EF). Additionally, GLARE enhances the performance of VS foundation models like DrugCLIP, achieving up to an 8-fold improvement in EF at 0.5% with as few as 15 active molecules. These results highlight the transformative potential of GLARE for adaptive and efficient drug discovery.

NeurIPS Conference 2025 Conference Paper

scPilot: Large Language Model Reasoning Toward Automated Single-Cell Analysis and Discovery

  • Yiming Gao
  • Zhen Wang
  • Jefferson Chen
  • Mark Antkowiak
  • Mengzhou Hu
  • JungHo Kong
  • Dexter Pratt
  • Jieyuan Liu

We present scPilot, the first systematic framework to practice omics-native reasoning: a large language model (LLM) converses in natural language while directly inspecting single-cell RNA-seq data and on-demand bioinformatics tools. scPilot converts core single-cell analyses, i.e., cell-type annotation, developmental-trajectory reconstruction, and transcription-factor targeting, into step-by-step reasoning problems that the model must solve, justify, and, when needed, revise with new evidence. To measure progress, we release scBench, a suite of 9 expertly curated datasets and graders that faithfully evaluate the omics-native reasoning capability of scPilot with respect to various LLMs. Experiments show that with o1, iterative omics-native reasoning lifts average accuracy by 11% for cell-type annotation, and Gemini 2.5 Pro cuts trajectory graph-edit distance by 30% versus one-shot prompting, while generating transparent reasoning traces that explain marker gene ambiguity and regulatory logic. By grounding LLMs in raw omics data, scPilot enables auditable, interpretable, and diagnostically informative single-cell analyses.

IJCAI Conference 2025 Conference Paper

Single-Node Trigger Backdoor Attacks in Graph-Based Recommendation Systems

  • Runze Li
  • Di Jin
  • Xiaobao Wang
  • Dongxiao He
  • Bingdao Feng
  • Zhen Wang

Graph recommendation systems have been widely studied due to their ability to effectively capture the complex interactions between users and items. However, these systems also exhibit certain vulnerabilities when faced with attacks. The prevailing shilling attack methods typically manipulate recommendation results by injecting a large number of fake nodes and edges. However, such attack strategies face two primary challenges: low stealth and high destructiveness. To address these challenges, this paper proposes a novel graph backdoor attack method that aims to enhance the exposure of target items to the target user in a covert manner, without affecting other unrelated nodes. Specifically, we design a single-node trigger generator, which can effectively expose multiple target items to the target user by inserting only one fake user node. Additionally, we introduce constraint conditions between the target nodes and irrelevant nodes to mitigate the impact of fake nodes on the recommendation system's performance. Experimental results show that the exposure of the target items reaches no less than 50% in 99% of the target users, while the impact on the recommendation system's performance is controlled within approximately 5%.
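The headline numbers (at least 50% exposure for 99% of target users) can be read against a simple measure of how many target items reach a user's top-k recommendation list. A toy illustration (the function and cutoff are ours, not the paper's evaluation code):

```python
def exposure_at_k(ranked_items, target_items, k=50):
    """Fraction of target items appearing in a user's top-k list.

    ranked_items: the recommender's ranked output for one user
    target_items: the items the attacker wants exposed
    """
    topk = set(ranked_items[:k])
    return sum(1 for t in target_items if t in topk) / len(target_items)

# One of the two target items lands in the top 3
print(exposure_at_k(['a', 'b', 'c', 'd'], ['b', 'x'], k=3))  # -> 0.5
```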

IJCAI Conference 2025 Conference Paper

SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs

  • Le Cheng
  • Peican Zhu
  • Yangming Guo
  • Chao Gao
  • Zhen Wang
  • Keke Tang

Source detection on graphs has demonstrated high efficacy in identifying rumor origins. Despite advances in machine learning-based methods, many fail to capture intrinsic dynamics of rumor propagation. In this work, we present SourceDetMamba: A Graph-aware State Space Model for Source Detection in Sequential Hypergraphs, which harnesses the recent success of the state space model Mamba, known for its superior global modeling capabilities and computational efficiency, to address this challenge. Specifically, we first employ hypergraphs to model high-order interactions within social networks. Subsequently, temporal network snapshots generated during the propagation process are sequentially fed in reverse order into Mamba to infer underlying propagation dynamics. Finally, to empower the sequential model to effectively capture propagation patterns while integrating structural information, we propose a novel graph-aware state update mechanism, wherein the state of each node is propagated and refined by both temporal dependencies and topological context. Extensive evaluations on eight datasets demonstrate that SourceDetMamba consistently outperforms state-of-the-art approaches.

AAAI Conference 2025 Conference Paper

Style Nursing with Spatial and Semantic Guidance for Zero-Shot Traffic Scene Style Transfer

  • Zhen Wang
  • Zihang Lin
  • Meng Yuan
  • Yuehu Liu
  • Chi Zhang

Recent advances in text-to-image diffusion models have shown an outstanding ability in zero-shot style transfer. However, existing methods often struggle to balance preserving the semantic content of the input image and faithfully transferring the target style in line with the edit prompt. Especially when applied to complex traffic scenes with diverse objects, layouts, and stylistic variations, current diffusion models tend to exhibit Style Neglection, i.e., failing to generate the required style in the prompt. To address this issue, we propose Style Nursing, which directs the model to focus on style subject tokens in the text prompt and excites their corresponding visual activations. Moreover, we introduce Spatial and Semantic Guidance to guide the preservation of content after editing, which utilizes spatial features from the DDIM sampling process together with attention maps from the semantic reconstruction. To evaluate the performance of zero-shot style transfer methods in traffic scenes, we present STREET-6K, a new benchmark dataset comprising 6000 images showcasing diverse traffic scenes and style transfer variations, accompanied by comprehensive annotations and evaluation metrics. Our approach beats state-of-the-art image translation methods in comprehensive quantitative metrics and human evaluations on traffic scene image synthesis while seamlessly generalizing to various other types of images without training or fine-tuning. Further experiments on detection and segmentation tasks show that fine-tuning perception models on our synthesized images improves Recall and mean Intersection over Union (mIoU) by over 10% and 3% respectively in rarely-seen traffic scenes.

IROS Conference 2025 Conference Paper

TEM³-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving

  • Wenzhuo Liu
  • Yicheng Qiao
  • Zhen Wang
  • Qiannan Guo
  • Zilong Chen
  • Meihua Zhou
  • Xinran Li
  • Letian Wang

Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints limiting comprehensive scene understanding and inefficient architectures impeding real-time deployment. This paper proposes TEM³-Learning (Time-Efficient Multimodal Multi-task Learning), a novel framework that jointly optimizes driver emotion recognition, driver behavior recognition, traffic context recognition, and vehicle behavior recognition through a two-stage architecture. The first component, the mamba-based multi-view temporal-spatial feature extraction subnetwork (MTS-Mamba), introduces a forward-backward temporal scanning mechanism and global-local spatial attention to efficiently extract low-cost temporal-spatial features from multi-view sequential images. The second component, the MTL-based gated multimodal feature integrator (MGMI), employs task-specific multi-gating modules to adaptively highlight the most relevant modality features for each task, effectively alleviating the negative transfer problem in MTL. Evaluated on the AIDE dataset, our proposed model achieves state-of-the-art accuracy across all four tasks, maintaining a lightweight architecture with fewer than 6 million parameters and delivering an impressive 142.32 FPS inference speed. Rigorous ablation studies further validate the effectiveness of the proposed framework and the independent contributions of each module. The code is available on https://github.com/Wenzhuo-Liu/TEM3-Learning.

AAAI Conference 2025 Conference Paper

Thermal-Aware Low-Light Image Enhancement: A Real-World Benchmark and a New Light-Weight Model

  • Zhen Wang
  • Yaozu Wu
  • Dongyuan Li
  • Shiyin Tan
  • Zhishuai Yin

Enhancing images captured under low-light conditions has been a topic of research for several years. Nonetheless, existing image restoration techniques mainly concentrate on reconstructing images from RGB data, often neglecting the possibility of utilizing additional modalities. With the progress in handheld technology, capturing thermal images with mobile devices has become straightforward. Investigating the integration of thermal data into image restoration presents a valuable research opportunity. Therefore, in this paper, we propose a multimodal low-light image enhancement task based on thermal information and establish a dataset named TLIE (Thermal-aware Low-light Image Enhancement), consisting of 1,113 samples. Each sample in our dataset includes a low-light image, a normal-light image, and the corresponding thermal map. Additionally, based on the TLIE dataset, we develop a multimodal approach that simultaneously processes input images and thermal map data to produce the predicted normal-light images. We compare our method with previous unimodal and multimodal state-of-the-art LIE methods, and the experimental results and detailed ablation studies prove the effectiveness of our method.

IJCAI Conference 2025 Conference Paper

Unified Molecule-Text Language Model with Discrete Token Representation

  • Shuhan Guo
  • Yatao Bian
  • Ruibing Wang
  • Nan Yin
  • Zhen Wang
  • Quanming Yao

The remarkable success of Large Language Models (LLMs) across diverse tasks has driven the research community to extend their capabilities to molecular applications. However, most molecular LLMs employ adapter-based architectures that fail to equally integrate molecule and text modalities and lack explicit supervision signals for the molecular modality. To address these issues, we introduce UniMoT, a Unified Molecule-Text LLM adopting a tokenizer-based architecture that expands the vocabulary of LLMs with molecule tokens. Specifically, we introduce a Vector Quantization-driven tokenizer that incorporates a Q-Former to bridge the modality gap between molecule and text. This tokenizer transforms molecular structures into sequences of tokens exhibiting causal dependency, thereby encapsulating both high-level molecular features and textual information. Equipped with this tokenizer, UniMoT unifies molecule and text modalities under a shared token representation and an autoregressive training paradigm. This enables the model to process molecular structures as a distinct linguistic system and generate them in textual form. Through a four-stage training scheme, UniMoT functions as a multi-modal generalist capable of performing both molecule-to-text and text-to-molecule tasks. Extensive experiments demonstrate that UniMoT achieves state-of-the-art performance across a wide range of molecule comprehension and generation tasks.

IJCAI Conference 2025 Conference Paper

Universal Graph Self-Contrastive Learning

  • Liang Yang
  • Yukun Cai
  • Hui Ning
  • Jiaming Zhuo
  • Di Jin
  • Ziyi Ma
  • Yuanfang Guo
  • Chuan Wang

As a pivotal architecture in Self-Supervised Learning (SSL), Graph Contrastive Learning (GCL) has demonstrated substantial application value in scenarios with limited labeled nodes (samples). However, existing GCLs encounter critical issues in graph augmentation and in positive and negative sampling, stemming from the lack of explicit supervision, which collectively restrict their efficiency and universality. On the one hand, the reliance on graph augmentations in existing GCLs can lead to increased training times and memory usage, while potentially compromising the semantic integrity. On the other hand, the difficulty in selecting true positive and negative samples for GCLs limits their universality to both homophilic and heterophilic graphs. To address these drawbacks, this paper introduces a novel GCL framework called GRAph learning via Self-contraSt (GRASS). The core mechanism is node-attribute self-contrast, which specifically involves increasing the feature similarities between nodes and their included attributes while decreasing the similarities between nodes and their non-included attributes. Theoretically, the self-contrast mechanism implicitly ensures accurate node-node contrast by capturing high-hop co-inclusion relationships, thereby enabling GRASS to be universally applicable to graphs with varying degrees of homophily. Evaluations on diverse benchmark datasets demonstrate the universality and efficiency of GRASS. The dataset and code are available at https://github.com/YukunCai/GRASS.
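The node-attribute self-contrast idea described above can be sketched as a logistic loss on node-attribute similarities: pull a node toward attributes it contains, push it away from attributes it does not. This is an illustrative toy, not the GRASS objective itself:

```python
import math

def self_contrast_loss(node_emb, attr_emb, inclusion):
    """Toy node-attribute self-contrast objective (illustrative only).

    inclusion[i][a] == 1 when node i contains attribute a; the loss
    rewards high dot-product similarity for included attributes and
    low similarity for non-included ones.
    """
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    total, count = 0.0, 0
    for i, z in enumerate(node_emb):
        for a, h in enumerate(attr_emb):
            s = sum(zi * hi for zi, hi in zip(z, h))  # similarity score
            p = sigmoid(s)
            y = inclusion[i][a]
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            count += 1
    return total / count

# A node aligned with its included attribute incurs lower loss
# than one aligned with a non-included attribute.
aligned = self_contrast_loss([[1.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]], [[1, 0]])
flipped = self_contrast_loss([[1.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0]], [[0, 1]])
print(aligned < flipped)  # -> True
```

Note this sidesteps augmentations and node-node sampling entirely, which is the efficiency argument the abstract makes.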

IJCAI Conference 2024 Conference Paper

A General Black-box Adversarial Attack on Graph-based Fake News Detectors

  • Peican Zhu
  • Zechen Pan
  • Yang Liu
  • Jiwei Tian
  • Keke Tang
  • Zhen Wang

Graph Neural Network (GNN)-based fake news detectors apply various methods to construct graphs, aiming to learn distinctive news embeddings for classification. Since the construction details are unknown for attackers in a black-box scenario, it is unrealistic to conduct the classical adversarial attacks that require a specific adjacency matrix. In this paper, we propose the first general black-box adversarial attack framework, i.e., General Attack via Fake Social Interaction (GAFSI), against detectors based on different graph structures. Specifically, as sharing is an important social interaction for GNN-based fake news detectors to construct the graph, we simulate sharing behaviors to fool the detectors. Firstly, we propose a fraudster selection module to select engaged users leveraging local and global information. In addition, a post injection module guides the selected users to create shared relations by sending posts. The sharing records will be added to the social context, leading to a general attack against different detectors. Experimental results on empirical datasets demonstrate the effectiveness of GAFSI.

IJCAI Conference 2024 Conference Paper

A Successful Strategy for Multichannel Iterated Prisoner’s Dilemma

  • Zhen Wang
  • Zhaoheng Cao
  • Juan Shi
  • Peican Zhu
  • Shuyue Hu
  • Chen Chu

Iterated prisoner’s dilemma (IPD) and its variants are fundamental models for understanding the evolution of cooperation in human society as well as AI systems. In this paper, we focus on multichannel IPD, and examine how an agent should behave to obtain generally high payoffs under this setting. We propose a novel strategy that chooses to cooperate or defect by considering the difference in the cumulative number of defections between two agents. We show that our proposed strategy is nice, retaliatory, and forgiving. Moreover, we analyze the performance of our proposed strategy across different scenarios, including the self-play settings with and without errors, as well as when facing various opponent strategies. In particular, we show that our proposed strategy is invincible and never loses to any opponent strategy in terms of the expected payoff. Last but not least, we empirically validate the evolutionary advantage of our strategy, and demonstrate its potential to serve as a catalyst for cooperation emergence.
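A strategy keyed to the difference in cumulative defection counts can be sketched as follows. The paper's exact decision rule and thresholds are not given in the abstract, so this is a hedged illustration of the idea, showing why such a rule is nice, retaliatory, and forgiving:

```python
def cumdef_strategy(my_moves, opp_moves):
    """Illustrative cumulative-defection-difference rule (not the
    paper's exact strategy): cooperate while the opponent has not
    defected more often than we have, defect to retaliate when it has.

    It is nice (cooperates on an empty history), retaliatory (answers
    an excess defection with 'D'), and forgiving (returns to 'C' once
    the defection counts balance out).
    """
    my_defections = my_moves.count('D')
    opp_defections = opp_moves.count('D')
    return 'D' if opp_defections > my_defections else 'C'

print(cumdef_strategy([], []))                    # -> 'C' (nice)
print(cumdef_strategy(['C', 'C'], ['C', 'D']))    # -> 'D' (retaliatory)
print(cumdef_strategy(['C', 'D'], ['C', 'D']))    # -> 'C' (forgiving)
```

In the multichannel setting the same counting would be applied per channel or over the aggregated history; the abstract does not specify which.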

NeurIPS Conference 2024 Conference Paper

A Swiss Army Knife for Heterogeneous Federated Learning: Flexible Coupling via Trace Norm

  • Tianchi Liao
  • Lele Fu
  • Jialong Chen
  • Zhen Wang
  • Zibin Zheng
  • Chuan Chen

The heterogeneity issue in federated learning (FL) has attracted increasing attention, and most existing methods attempt to address it. Currently, due to system and objective heterogeneity, enabling clients to hold models of different architectures and tasks of different demands has become an important direction in FL. Most existing FL methods are based on the homogeneity assumption, namely, that different clients have models of the same architecture performing the same tasks, which leaves them unable to handle complex and multivariate data and tasks. To flexibly address these heterogeneity limitations, we propose FedSAK, a novel federated multi-task learning framework built on the tensor trace norm. Specifically, it treats each client as a task and splits the local model into a feature extractor and a prediction head. Clients can flexibly choose shared structures based on their heterogeneous situations and upload them to the server, which learns correlations among client models by mining low-rank model structures through the tensor trace norm. Furthermore, we derive convergence and generalization bounds under non-convex settings. Evaluated on 6 real-world datasets against 13 advanced FL models, FedSAK demonstrates superior performance.
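One common convex surrogate matching the abstract's description is the tensor trace norm computed as the sum of matrix nuclear norms of the tensor's mode unfoldings; a sketch under that assumption (FedSAK's exact coupling may differ):

```python
import numpy as np

def tensor_trace_norm(T):
    """Overlapped trace norm of an N-way tensor: the sum of nuclear
    norms of its mode unfoldings, a standard convex surrogate for low
    multilinear rank."""
    total = 0.0
    for mode in range(T.ndim):
        # Unfold: move the mode axis to the front, flatten the rest
        unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        # Nuclear norm = sum of singular values
        total += np.linalg.svd(unfolded, compute_uv=False).sum()
    return total

# Stacking per-client prediction-head weights into a 3-way tensor and
# penalizing this norm couples clients through shared low-rank structure.
T = np.zeros((2, 2, 2))
T[0, 0, 0] = 1.0  # rank-1 tensor: each unfolding has nuclear norm 1
print(tensor_trace_norm(T))  # -> 3.0
```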

AAMAS Conference 2024 Conference Paper

Cooperation and Coordination in Heterogeneous Populations with Interaction Diversity

  • Hao Guo
  • Zhen Wang
  • Junliang Xing
  • Pin Tao
  • Yuanchun Shi

Cooperation, a prosocial behavior enhancing collective rewards in multi-agent games, intricately intertwines with coordination. This study explores how interaction diversity and zero-sum gifting influence cooperation and coordination in heterogeneous populations, where agents engage in threshold public goods games with multiple equilibria. Our model accommodates two sources of inequality: variations in agents’ capabilities to provide public goods and differences in the rewards they receive upon successful public good provision. In the absence of gifting, we demonstrate the inevitability of intermediate interaction intensity in fostering global cooperation, elucidating conditions for co-dominance, coexistence, and the polarized state of cooperation. While gifting introduces reciprocity opportunities, our findings highlight the importance of maintaining moderate levels of gifting, as excessive gifting can paradoxically undermine global cooperation. This research contributes valuable insights into the emergence of cooperation and coordination dynamics.
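The threshold public goods game with the two sources of inequality described above (heterogeneous capabilities and heterogeneous rewards) can be sketched with a simple payoff function; this is an illustration of the setting, not the paper's exact model:

```python
def threshold_pgg_payoffs(strategies, capabilities, rewards, threshold):
    """Toy threshold public goods game with heterogeneous agents.

    strategies:   True for cooperators, False for defectors
    capabilities: contribution each agent makes if it cooperates
    rewards:      reward each agent receives if the good is provided
    threshold:    total contribution needed to provide the good
    """
    total = sum(c for s, c in zip(strategies, capabilities) if s)
    provided = total >= threshold
    # Everyone gets their reward if the good is provided;
    # cooperators additionally pay their contribution.
    return [
        (r if provided else 0.0) - (c if s else 0.0)
        for s, c, r in zip(strategies, capabilities, rewards)
    ]

# Two cooperators jointly meet the threshold; the defector free-rides
print(threshold_pgg_payoffs([True, True, False], [1.0, 1.5, 1.0],
                            [3.0, 3.0, 3.0], threshold=2.0))
# -> [2.0, 1.5, 3.0]
```

The free-rider earning the highest payoff is exactly the coordination tension the paper studies: provision requires enough cooperators, yet each agent prefers others to pay the cost.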

NeurIPS Conference 2024 Conference Paper

Customized Subgraph Selection and Encoding for Drug-drug Interaction Prediction

  • Haotong Du
  • Quanming Yao
  • Juzheng Zhang
  • Yang Liu
  • Zhen Wang

Subgraph-based methods have proven to be effective and interpretable in predicting drug-drug interactions (DDIs), which are essential for medical practice and drug development. Subgraph selection and encoding are critical stages in these methods, yet customizing these components remains underexplored due to the high cost of manual adjustments. In this study, inspired by the success of neural architecture search (NAS), we propose a method to search for data-specific components within subgraph-based frameworks. Specifically, we introduce extensive subgraph selection and encoding spaces that account for the diverse contexts of drug interactions in DDI prediction. To address the challenge of large search spaces and high sampling costs, we design a relaxation mechanism that uses an approximation strategy to efficiently explore optimal subgraph configurations. This approach allows for robust exploration of the search space. Extensive experiments demonstrate the effectiveness and superiority of the proposed method, with the discovered subgraphs and encoding functions highlighting the model’s adaptability.

AAAI Conference 2024 Conference Paper

DAG-Aware Variational Autoencoder for Social Propagation Graph Generation

  • Dongpeng Hou
  • Chao Gao
  • Xuelong Li
  • Zhen Wang

Propagation models in social networks are critical, with extensive applications across various fields and downstream tasks. However, existing propagation models are often oversimplified, scenario-specific, and lack real-world user social attributes. These limitations, detached from real-world analysis, lead to inaccurate representations of the propagation process in social networks. To address these issues, we propose a User Features Attention-based DAG-Aware Variational Autoencoder (DAVA) for propagation graph generation. First, nearly 1 million pieces of user attribute data are collected. DAVA can then integrate the analysis of propagation graph topology and the corresponding user attributes as prior knowledge. By leveraging a lightweight attention-based framework and a sliding window mechanism based on BFS permutations weighted by user influence, DAVA significantly enhances the ability to generate realistic, large-scale propagation data, yielding graph scales ten times greater than those produced by existing SOTA methods. Every module of DAVA is flexible and extensible, allowing easy substitution to suit other generation tasks. Additionally, we provide a comprehensive evaluation of DAVA, with one focus being the effectiveness of the generated data in improving the performance of downstream tasks. During the generation process, we discover the Credibility Erosion Effect by modifying the generation rules, revealing a social phenomenon in social network propagation.

IJCAI Conference 2024 Conference Paper

Emergence of Social Norms in Generative Agent Societies: Principles and Architecture

  • Siyue Ren
  • Zhiyao Cui
  • Ruiqi Song
  • Zhen Wang
  • Shuyue Hu

Social norms play a crucial role in guiding agents towards understanding and adhering to standards of behavior, thus reducing social conflicts within multi-agent systems (MASs). However, current LLM-based (or generative) MASs lack the capability to be normative. In this paper, we propose a novel architecture, named CRSEC, to empower the emergence of social norms within generative MASs. Our architecture consists of four modules: Creation & Representation, Spreading, Evaluation, and Compliance. This addresses several important aspects of the emergent processes all in one: (i) where social norms come from, (ii) how they are formally represented, (iii) how they spread through agents' communications and observations, (iv) how they are examined with a sanity check and synthesized in the long term, and (v) how they are incorporated into agents' planning and actions. Our experiments deployed in the Smallville sandbox game environment demonstrate the capability of our architecture to establish social norms and reduce social conflicts within generative MASs. The positive outcomes of our human evaluation, conducted with 30 evaluators, further affirm the effectiveness of our approach. Our project can be accessed via the following link: https://github.com/sxswz213/CRSEC.

NeurIPS Conference 2024 Conference Paper

Exploitation of a Latent Mechanism in Graph Contrastive Learning: Representation Scattering

  • Dongxiao He
  • Lianze Shan
  • Jitao Zhao
  • Hengrui Zhang
  • Zhen Wang
  • Weixiong Zhang

Graph Contrastive Learning (GCL) has emerged as a powerful approach for generating graph representations without the need for manual annotation. Most advanced GCL methods fall into three main frameworks: node discrimination, group discrimination, and bootstrapping schemes, all of which achieve comparable performance. However, the underlying mechanisms and factors that contribute to their effectiveness are not yet fully understood. In this paper, we revisit these frameworks and reveal a common mechanism—representation scattering—that significantly enhances their performance. Our discovery highlights an essential feature of GCL and unifies these seemingly disparate methods under the concept of representation scattering. To leverage this insight, we introduce Scattering Graph Representation Learning (SGRL), a novel framework that incorporates a new representation scattering mechanism designed to enhance representation diversity through a center-away strategy. Additionally, considering the interconnected nature of graphs, we develop a topology-based constraint mechanism that integrates graph structural properties with representation scattering to prevent excessive scattering. We extensively evaluate SGRL across various downstream tasks on benchmark datasets, demonstrating its efficacy and superiority over existing GCL methods. Our findings underscore the significance of representation scattering in GCL and provide a structured framework for harnessing this mechanism to advance graph representation learning. The code of SGRL is at https://github.com/hedongxiao-tju/SGRL.

NeurIPS Conference 2024 Conference Paper

Exploring Molecular Pretraining Model at Scale

  • Xiaohong Ji
  • Zhen Wang
  • Zhifeng Gao
  • Hang Zheng
  • Linfeng Zhang
  • Guolin Ke
  • Weinan E

In recent years, pretraining models have made significant advancements in the fields of natural language processing (NLP), computer vision (CV), and life sciences. The significant advancements in NLP and CV are predominantly driven by the expansion of model parameters and data size, a phenomenon now recognized as the scaling laws. However, scaling laws in molecular pretraining models remain unexplored. In this work, we present an innovative molecular pretraining model that leverages a two-track transformer to effectively integrate features at the atomic level, graph level, and geometry structure level. Along with this, we systematically investigate the scaling law within molecular pretraining models, examining the power-law correlations between validation loss and model size, dataset size, and computational resources. Consequently, we successfully scale the model to 1.1 billion parameters through pretraining on 800 million conformations, making it the largest molecular pretraining model to date. Extensive experiments show consistent improvement on downstream tasks as the model size grows. The model with 1.1 billion parameters also outperforms existing methods, achieving an average 27% improvement on the QM9 dataset and 14% on the COMPAS-1D dataset.
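The power-law correlations mentioned above are typically estimated by a least-squares fit in log-log space, where a power law becomes a straight line. A minimal sketch with synthetic data (not the paper's measurements):

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ~ a * N**(-b) by ordinary least squares in log-log
    space, the standard way scaling-law exponents are estimated."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - slope * mx)
    return a, -slope  # loss = a * N**(-b)

# Synthetic losses following 2.0 * N**(-0.25) are recovered exactly
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [2.0 * n ** -0.25 for n in sizes]
a, b = fit_power_law(sizes, losses)
print(round(a, 3), round(b, 3))  # -> 2.0 0.25
```

The same fit applies with dataset size or compute on the x-axis in place of parameter count.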

AAAI Conference 2024 Conference Paper

GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking

  • Shu Yin
  • Peican Zhu
  • Lianwei Wu
  • Chao Gao
  • Zhen Wang

With the rise of social media, the spread of fake news has become a significant concern, potentially misleading public perceptions and impacting social stability. Deep learning methods such as CNNs, RNNs, and Transformer-based models like BERT have enhanced fake news detection, but they primarily focus on content and do not consider the social context during news propagation. Graph-based techniques have incorporated the social context but are limited by the need for large labeled datasets. To address these challenges, this paper introduces GAMC, an unsupervised fake news detection technique using a Graph Autoencoder with Masking and Contrastive learning. By leveraging both the context and content of news propagation as self-supervised signals, our method reduces the dependency on labeled datasets. Specifically, GAMC begins by applying data augmentation to the original news propagation graphs. These augmented graphs are then encoded using a graph encoder and reconstructed via a graph decoder. Finally, we design a composite loss function that encompasses both reconstruction error and contrastive loss: first, it ensures the model can effectively capture the latent features by minimizing the discrepancy between reconstructed and original graph representations; second, it aligns the representations of augmented graphs that originate from the same source. Experiments on the real-world dataset validate the effectiveness of our method.
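The composite objective described above, reconstruction error plus a term aligning the two augmented views of the same graph, can be sketched as follows; the exact formulation here is our illustration, not GAMC's:

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def composite_loss(recon1, orig1, recon2, orig2, z1, z2, lam=0.5):
    """Toy composite objective: mean-squared reconstruction error on
    both augmented graphs plus a contrastive-style term that aligns
    their latent representations (0 when the two views agree)."""
    mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    recon = mse(recon1, orig1) + mse(recon2, orig2)
    align = 1.0 - cosine(z1, z2)
    return recon + lam * align

# Perfect reconstruction and identical latents give zero loss
print(composite_loss([1.0, 2.0], [1.0, 2.0], [3.0], [3.0],
                     [1.0, 0.0], [1.0, 0.0]))  # -> 0.0
```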

AAAI Conference 2024 Conference Paper

GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion

  • Le Cheng
  • Peican Zhu
  • Keke Tang
  • Chao Gao
  • Zhen Wang

Source detection in graphs has demonstrated robust efficacy in the domain of rumor source identification. Although recent solutions have enhanced performance by leveraging deep neural networks, they often require complete user data. In this paper, we address a more challenging task, rumor source detection with incomplete user data, and propose a novel framework, i.e., Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion (GIN-SD), to tackle this challenge. Specifically, our approach utilizes a positional embedding module to distinguish nodes that are incomplete and employs a self-attention mechanism to focus on nodes with greater information transmission capacity. To mitigate the prediction bias caused by the significant disparity between the numbers of source and non-source nodes, we also introduce a class-balancing mechanism. Extensive experiments validate the effectiveness of GIN-SD and its superiority to state-of-the-art methods.

IJCAI Conference 2024 Conference Paper

Joint Source Localization in Different Platforms via Implicit Propagation Characteristics of Similar Topics

  • Zhen Wang
  • Dongpeng Hou
  • Shu Yin
  • Chao Gao
  • Xianghua Li

Different social media are widely used in our daily lives. Inspired by the fact that similar topics have similar propagation characteristics, we mine the implicit knowledge of cascades with similar topics from different platforms to enhance the localization performance for scenarios where limited propagation data leads to the weak learning ability of existing localization models. In this work, we first construct a multiple platform propagation cascade dataset, aligning similar topics from both Twitter and Weibo, and enriching it with user profiles. Leveraging this dataset, we propose a Dual-channel Source Localization Framework (DSLF) for the joint cascades with similar topics. Specifically, a self-loop attention based graph convolutional network is designed to adaptively adjust the neighborhood aggregation scheme of different users with heterogeneous features in the message-passing process. Additionally, a dual-structure based Kullback-Leibler (KL) regularization module is proposed to constrain the latent distribution space of the source probabilities of similar characteristic-level users for a similar topic, enhancing the robustness of the model. Extensive experiments across Twitter and Weibo platforms demonstrate the superiority of the proposed DSLF over the SOTA methods. The code is available at https://github.com/cgao-comp/DSLF.
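The KL regularization module constrains two latent distributions of source probabilities toward each other. A minimal sketch of a symmetric KL regularizer (our simplification, not DSLF's exact dual-structure module):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q))

def symmetric_kl_reg(p, q):
    """Symmetrized KL penalty pulling two source-probability
    distributions (e.g. matched users on two platforms) together;
    symmetrizing means neither channel is treated as the reference."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))

# Identical distributions incur no penalty; divergent ones do
print(symmetric_kl_reg([0.7, 0.2, 0.1], [0.7, 0.2, 0.1]))  # -> 0.0
print(symmetric_kl_reg([0.9, 0.1], [0.5, 0.5]) > 0.0)      # -> True
```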

NeurIPS Conference 2024 Conference Paper

Learning Superconductivity from Ordered and Disordered Material Structures

  • Pin Chen
  • Luoxuan Peng
  • Rui Jiao
  • Qing Mo
  • Zhen Wang
  • Wenbing Huang
  • Yang Liu
  • Yutong Lu

Superconductivity is a fascinating phenomenon observed in certain materials under certain conditions. However, some critical aspects of it, such as the relationship between superconductivity and materials' chemical/structural features, still need to be understood. Recent successes of data-driven approaches in material science strongly inspire researchers to study this relationship with them, but a corresponding dataset is still lacking. Hence, we present a new dataset for data-driven approaches, namely SuperCon3D, containing both 3D crystal structures and experimental superconducting transition temperature (Tc) for the first time. Based on SuperCon3D, we propose two deep learning methods for designing high Tc superconductors. The first is SODNet, a novel equivariant graph attention model for screening known structures, which differs from existing models in incorporating both ordered and disordered geometric content. The second is a diffusion generative model DiffCSP-SC for creating new structures, which enables high Tc-targeted generation. Extensive experiments demonstrate that both our proposed dataset and models are advantageous for designing new high Tc superconducting candidates.

AAAI Conference 2024 Conference Paper

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

  • Jinyi Liu
  • Zhi Wang
  • Yan Zheng
  • Jianye Hao
  • Chenjia Bai
  • Junjie Ye
  • Zhen Wang
  • Haiyin Piao

In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream principle for directing exploration towards less explored areas, characterized by higher uncertainty. However, in the presence of environmental stochasticity (noise), purely optimistic exploration may lead to excessive probing of high-noise areas, consequently impeding exploration efficiency. Hence, in exploring noisy environments, while optimism-driven exploration serves as a foundation, prudent attention to alleviating unnecessary over-exploration in high-noise areas becomes beneficial. In this work, we propose Optimistic Value Distribution Explorer (OVD-Explorer) to achieve a noise-aware optimistic exploration for continuous control. OVD-Explorer proposes a new measurement of the policy's exploration ability considering noise in optimistic perspectives, and leverages gradient ascent to drive exploration. Practically, OVD-Explorer can be easily integrated with continuous control RL algorithms. Extensive evaluations on the MuJoCo and GridChaos tasks demonstrate the superiority of OVD-Explorer in achieving noise-aware optimistic exploration.

NeurIPS Conference 2024 Conference Paper

S-MolSearch: 3D Semi-supervised Contrastive Learning for Bioactive Molecule Search

  • Gengmo Zhou
  • Zhen Wang
  • Feng Yu
  • Guolin Ke
  • Zhewei Wei
  • Zhifeng Gao

Virtual Screening is an essential technique in the early phases of drug discovery, aimed at identifying promising drug candidates from vast molecular libraries. Recently, ligand-based virtual screening has garnered significant attention due to its efficacy in conducting extensive database screenings without relying on specific protein-binding site information. Obtaining binding affinity data for complexes is highly expensive, resulting in a limited amount of available data that covers a relatively small chemical space. Moreover, these datasets contain a significant amount of inconsistent noise. It is challenging to identify an inductive bias that consistently maintains the integrity of molecular activity during data augmentation. To tackle these challenges, we propose S-MolSearch, the first framework, to our knowledge, that leverages molecular 3D information and affinity information in semi-supervised contrastive learning for ligand-based virtual screening. Drawing on the principles of inverse optimal transport, S-MolSearch efficiently processes both labeled and unlabeled data, training molecular structural encoders while generating soft labels for the unlabeled data. This design allows S-MolSearch to adaptively utilize unlabeled data within the learning process. Empirically, S-MolSearch demonstrates superior performance on the widely used benchmarks LIT-PCBA and DUD-E, surpassing both structure-based and ligand-based virtual screening methods in AUROC, BEDROC, and EF.

JAIR Journal 2024 Journal Article

Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness

  • Xiaoyu Wen
  • Xudong Yu
  • Rui Yang
  • Haoyuan Chen
  • Chenjia Bai
  • Zhen Wang

To obtain a near-optimal policy with fewer interactions in Reinforcement Learning (RL), a promising approach involves the combination of offline RL, which enhances sample efficiency by leveraging offline datasets, and online RL, which explores informative transitions by interacting with the environment. Offline-to-Online RL provides a paradigm for improving an offline-trained agent within limited online interactions. However, due to the significant distribution shift between online experiences and offline data, most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in offline-to-online adaptation. To address this problem, we propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness, and to mitigate the performance drop in online adaptation. Specifically, RO2O incorporates Q-ensembles for uncertainty penalty and adversarial samples for policy and value smoothness, which enable RO2O to maintain a consistent learning procedure in online adaptation without requiring special changes to the learning objective. Theoretical analyses in linear MDPs demonstrate that the uncertainty and smoothness lead to a tighter optimality bound in offline-to-online learning against distribution shift. Experimental results illustrate the superiority of RO2O in facilitating stable offline-to-online learning and achieving significant improvement with limited online interactions.

NeurIPS Conference 2024 Conference Paper

Unified Graph Augmentations for Generalized Contrastive Learning on Graphs

  • Jiaming Zhuo
  • Yintong Lu
  • Hui Ning
  • Kun Fu
  • Bingxin Niu
  • Dongxiao He
  • Chuan Wang
  • Yuanfang Guo

In real-world scenarios, networks (graphs) and their tasks possess unique characteristics, requiring the development of a versatile graph augmentation (GA) to meet the varied demands of network analysis. Unfortunately, most Graph Contrastive Learning (GCL) frameworks are hampered by the specificity, complexity, and incompleteness of their GA techniques. Firstly, GAs designed for specific scenarios may compromise the universality of models if mishandled. Secondly, the process of identifying and generating optimal augmentations generally involves substantial computational overhead. Thirdly, the effectiveness of the GCL, even the learnable ones, is constrained by the finite selection of GAs available. To overcome the above limitations, this paper introduces a novel unified GA module dubbed UGA after reinterpreting the mechanism of GAs in GCLs from a message-passing perspective. Theoretically, this module is capable of unifying any explicit GAs, including node, edge, attribute, and subgraph augmentations. Based on the proposed UGA, a novel generalized GCL framework dubbed Graph cOntrastive UnifieD Augmentations (GOUDA) is proposed. It seamlessly integrates widely adopted contrastive losses and an introduced independence loss to fulfill the common requirements of consistency and diversity of augmentation across diverse scenarios. Evaluations across various datasets and tasks demonstrate the generality and efficiency of the proposed GOUDA over existing state-of-the-art GCLs.

AAAI Conference 2023 Conference Paper

A Pair-Approximation Method for Modelling the Dynamics of Multi-Agent Stochastic Games

  • Chen Chu
  • Zheng Yuan
  • Shuyue Hu
  • Chunjiang Mu
  • Zhen Wang

Developing a dynamical model for learning in games has attracted much recent interest. In stochastic games, agents need to make decisions in multiple states, and transitions between states, in turn, influence the dynamics of strategies. While previous works typically focus either on 2-agent stochastic games or on normal form games under an infinite-agent setting, we aim at formally modelling the learning dynamics in stochastic games under the infinite-agent setting. With a novel use of pair-approximation method, we develop a formal model for myopic Q-learning in stochastic games with symmetric state transition. We verify the descriptive power of our model (a partial differential equation) across various games through comparisons with agent-based simulation results. Based on our proposed model, we can gain qualitative and quantitative insights into the influence of transition probabilities on the dynamics of strategies. In particular, we illustrate that a careful design of transition probabilities can help players overcome the social dilemmas and promote cooperation, even if agents are myopic learners.

NeurIPS Conference 2023 Conference Paper

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

  • Kang Xu
  • Chenjia Bai
  • Xiaoteng Ma
  • Dong Wang
  • Bin Zhao
  • Zhen Wang
  • Xuelong Li
  • Wei Li

Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning. For example, a robot learns the policy in a simulator, but when it is deployed in the real world, the dynamics of the environment may be different. Given the source and target domain with dynamics mismatch, we consider the online dynamics adaptation problem, in which case the agent can access sufficient source domain data while online interactions with the target domain are limited. Existing research has attempted to solve the problem from the dynamics discrepancy perspective. In this work, we reveal the limitations of these methods and explore the problem from the value difference perspective via a novel insight on the value consistency across domains. Specifically, we present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets across the two domains. Empirical results on various environments with kinematic and morphology shifts demonstrate that our method achieves superior performance compared to prior approaches.
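The selective data-sharing step described above can be illustrated with a minimal numpy sketch. This is a toy stand-in, not the authors' implementation: the real VGDF estimates value targets with ensembles of learned Q-functions, while the arrays and the `keep_ratio` parameter here are hypothetical.

```python
import numpy as np

def filter_source_transitions(source_targets, target_targets, keep_ratio=0.25):
    """Keep the source-domain transitions whose value targets are closest to
    those estimated under the target domain, i.e. share data only where the
    two domains agree on value (a sketch of value-guided data filtering)."""
    gaps = np.abs(source_targets - target_targets)
    k = max(1, int(len(gaps) * keep_ratio))
    # indices of the k transitions with the smallest value discrepancy
    return np.argsort(gaps)[:k]

# Hypothetical per-transition value targets under each domain.
src = np.array([1.0, 5.0, 2.0, 9.0])
tgt = np.array([1.1, 2.0, 2.1, 3.0])
idx = filter_source_transitions(src, tgt, keep_ratio=0.5)
# Transitions 0 and 2 have near-identical value targets across domains,
# so only they are shared with the target-domain learner.
```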

AAAI Conference 2023 Conference Paper

Cross-Modality Person Re-identification with Memory-Based Contrastive Embedding

  • De Cheng
  • Xiaolong Wang
  • Nannan Wang
  • Zhen Wang
  • Xiaoyu Wang
  • Xinbo Gao

Visible-infrared person re-identification (VI-ReID) aims to retrieve person images of the same identity from the RGB to infrared image space, which is very important for real-world surveillance systems. In practice, VI-ReID is more challenging due to the heterogeneous modality discrepancy, which further aggravates the challenges of the traditional single-modality person ReID problem, i.e., inter-class confusion and intra-class variations. In this paper, we propose an aggregated memory-based cross-modality deep metric learning framework, which benefits from the increasing number of learned modality-aware and modality-agnostic centroid proxies for cluster contrast and mutual information learning. Furthermore, to suppress the modality discrepancy, the proposed cross-modality alignment objective simultaneously utilizes both historical and up-to-date learned cluster proxies for enhanced cross-modality association. Such a training mechanism helps to obtain hard positive references through the increased diversity of learned cluster proxies, and finally achieves a stronger "pulling close" effect between cross-modality image features. Extensive experimental results demonstrate the effectiveness of the proposed method, surpassing state-of-the-art works by a large margin on the commonly used VI-ReID datasets.

NeurIPS Conference 2023 Conference Paper

DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation

  • Yiqun Duan
  • Jinzhao Zhou
  • Zhen Wang
  • Yu-Kai Wang
  • Chin-Teng Lin

The translation of brain dynamics into natural language is pivotal for brain-computer interfaces (BCIs), a field that has seen substantial growth in recent years. With the swift advancement of large language models, such as ChatGPT, the need to bridge the gap between the brain and languages becomes increasingly pressing. Current methods, however, require eye-tracking fixations or event markers to segment brain dynamics into word-level features, which can restrict the practical application of these systems. These event markers may not be readily available or could be challenging to acquire during real-time inference, and the sequence of eye fixations may not align with the order of spoken words. To tackle these issues, we introduce a novel framework, DeWave, that integrates discrete encoding sequences into open-vocabulary EEG-to-text translation tasks. DeWave uses a quantized variational encoder to derive discrete codex encoding and align it with pre-trained language models. This discrete codex representation brings forth two advantages: 1) it alleviates the order mismatch between eye fixations and spoken words by introducing text-EEG contrastive alignment training, and 2) it minimizes the interference caused by individual differences in EEG waves through an invariant discrete codex. Our model surpasses the previous baseline (40.1 and 31.7) by 3.06% and 6.34%, respectively, achieving 41.35 BLEU-1 and 33.71 Rouge-F on the ZuCo Dataset. Furthermore, this work is the first to facilitate the translation of entire EEG signal periods without the need for word-level order markers (e.g., eye fixations), scoring 20.5 BLEU-1 and 29.5 Rouge-1 on the ZuCo Dataset.
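The discrete codex lookup at the heart of a vector-quantized encoder can be sketched as a nearest-neighbor search against a codebook. The numpy illustration below is a toy: the codebook and latents are invented, and the real DeWave learns both end-to-end within the EEG-to-text model.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each continuous latent vector to its nearest codebook entry,
    yielding a discrete code index plus the quantized vector."""
    # Squared Euclidean distances, shape (n_latents, n_codes).
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)
    return codes, codebook[codes]

# Hypothetical 2-D codebook and continuous latents.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
latents = np.array([[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]])
codes, quantized = quantize(latents, codebook)
# Each latent snaps to its closest codebook row; downstream models then
# operate on the discrete codes instead of the raw continuous signal.
```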

AAAI Conference 2023 Conference Paper

Emergence of Punishment in Social Dilemma with Environmental Feedback

  • Zhen Wang
  • Zhao Song
  • Chen Shen
  • Shuyue Hu

Altruistic punishment (or punishment) has been extensively shown as an important mechanism for promoting cooperation in human societies. In AI, the emergence of punishment has received much recent interest. In this paper, we contribute with a novel evolutionary game theoretic model to study the impacts of environmental feedback. Whereas a population of agents plays public goods games, there exists a third-party population whose payoffs depend not only on whether to punish or not, but also on the state of the environment (e.g., how cooperative the agents in a social dilemma are). Focusing on one-shot public goods games, we show that environmental feedback, by itself, can lead to the emergence of punishment. We analyze the co-evolution of punishment and cooperation, and derive conditions for their co-presence, co-dominance and co-extinction. Moreover, we show that the system can exhibit bistability as well as cyclic dynamics. Our findings provide a new explanation for the emergence of punishment. On the other hand, our results also alert the need for careful design of implementing punishment in multi-agent systems, as the resulting evolutionary dynamics can be somewhat complex.

AAAI Conference 2023 Conference Paper

Hard Sample Aware Network for Contrastive Deep Graph Clustering

  • Yue Liu
  • Xihong Yang
  • Sihang Zhou
  • Xinwang Liu
  • Zhen Wang
  • Ke Liang
  • Wenxuan Tu
  • Liang Li

Contrastive deep graph clustering, which aims to divide nodes into disjoint groups via contrastive mechanisms, is a challenging research topic. Among recent works, hard-sample-mining-based algorithms have attracted great attention for their promising performance. However, we find that existing hard sample mining methods have two problems. 1) In the hardness measurement, important structural information is overlooked for similarity calculation, degrading the representativeness of the selected hard negative samples. 2) Previous works merely focus on hard negative sample pairs while neglecting hard positive sample pairs. Nevertheless, samples within the same cluster but with low similarity should also be carefully learned. To solve these problems, we propose a novel contrastive deep graph clustering method dubbed Hard Sample Aware Network (HSAN) by introducing a comprehensive similarity measure criterion and a general dynamic sample weighing strategy. Concretely, in our algorithm, the similarities between samples are calculated by considering both the attribute embeddings and the structure embeddings, better revealing sample relationships and assisting hardness measurement. Moreover, under the guidance of the carefully collected high-confidence clustering information, our proposed weight modulating function first recognizes the positive and negative samples and then dynamically up-weights the hard sample pairs while down-weighting the easy ones. In this way, our method can mine not only the hard negative samples but also the hard positive samples, thus further improving the discriminative capability of the samples. Extensive experiments and analyses demonstrate the superiority and effectiveness of our proposed method.
The source code of HSAN is shared at https://github.com/yueliu1999/HSAN and a collection (papers, codes, and datasets) of deep graph clustering is shared at https://github.com/yueliu1999/Awesome-Deep-Graph-Clustering on GitHub.
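The two ingredients named in the abstract, a similarity that mixes attribute and structure embeddings and a dynamic hardness-based weighting, can be sketched in numpy. This is purely an illustration: the mixing weight `alpha`, the exponential modulating function, and all inputs are hypothetical, not HSAN's exact formulation.

```python
import numpy as np

def combined_similarity(attr_emb, struct_emb, alpha=0.5):
    """Pairwise similarity from BOTH attribute and structure embeddings:
    cosine similarity of each view, mixed by a factor alpha."""
    def cos(m):
        n = m / np.linalg.norm(m, axis=1, keepdims=True)
        return n @ n.T
    return alpha * cos(attr_emb) + (1 - alpha) * cos(struct_emb)

def hardness_weights(sim, same_cluster, beta=2.0):
    """Up-weight hard pairs: positives (same cluster) with LOW similarity
    and negatives (different clusters) with HIGH similarity."""
    hardness = np.where(same_cluster, 1.0 - sim, sim)
    return np.exp(beta * hardness)  # monotone modulating function

# Toy embeddings: two nodes that differ in attributes but share structure.
attr = np.array([[1.0, 0.0], [0.0, 1.0]])
struct = np.array([[1.0, 0.0], [1.0, 0.0]])
sim = combined_similarity(attr, struct)
w = hardness_weights(sim, same_cluster=np.ones((2, 2), dtype=bool))
# The low-similarity positive pair (0, 1) gets a larger weight than the
# trivially easy self-pair, so the loss focuses on the hard positive.
```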

IS Journal 2023 Journal Article

Irregularly Sampled Multivariate Time Series Classification: A Graph Learning Approach

  • Zhen Wang
  • Ting Jiang
  • Zenghui Xu
  • Ji Zhang
  • Jianliang Gao

To date, graph-based learning methods have proven effective for modeling spatial and structural dependencies. However, when applied to irregularly sampled multivariate time series (IS-MTS), they encounter three major challenges due to the complex characteristics of IS-MTS data: 1) variable time intervals between observations; 2) asynchronous time points across dimensions; and 3) a lack of prior knowledge of the connectivity structure for message propagation. To fill these gaps, we propose a multivariate temporal graph network to coherently capture structural interactions, learn temporal dependencies, and handle the challenging characteristics of IS-MTS data. Specifically, we first build a multivariate interaction module to handle frequent missing values and extract the graph structure relation automatically. Second, we design a novel adjacent graph propagation mechanism to aggregate neighbor information from multistep snapshots. Third, we construct a masked temporal-aware attention module to explicitly consider the timestamp context and interval irregularity. Based on an extensive experimental evaluation, we demonstrate the superior performance of the proposed method.

NeurIPS Conference 2023 Conference Paper

Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

  • Guozheng Ma
  • Linrui Zhang
  • Haoyu Wang
  • Lu Li
  • Zilin Wang
  • Zhen Wang
  • Li Shen
  • Xueqian Wang

Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra auxiliary representation tasks or pre-trained encoders. However, it remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL. To investigate this issue and further explore the potential of DA, this work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy and provides the following insights and improvements: (1) For individual DA operations, we reveal that both ample spatial diversity and slight hardness are indispensable. Building on this finding, we introduce Random PadResize (Rand PR), a new DA operation that offers abundant spatial diversity with minimal hardness. (2) For multi-type DA fusion schemes, the increased DA hardness and unstable data distribution result in the current fusion schemes being unable to achieve higher sample efficiency than their corresponding individual operations. Taking the non-stationary nature of RL into account, we propose a RL-tailored multi-type DA fusion scheme called Cycling Augmentation (CycAug), which performs periodic cycles of different DA operations to increase type diversity while maintaining data distribution consistency. Extensive evaluations on the DeepMind Control suite and CARLA driving simulator demonstrate that our methods achieve superior sample efficiency compared with the prior state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Local-Global Defense against Unsupervised Adversarial Attacks on Graphs

  • Di Jin
  • Bingdao Feng
  • Siqi Guo
  • Xiaobao Wang
  • Jianguo Wei
  • Zhen Wang

Unsupervised pre-training algorithms for graph representation learning are vulnerable to adversarial attacks, such as first-order perturbations on graphs, which can affect particular downstream applications. Designing an effective representation learning strategy against white-box attacks remains a crucial open topic. Prior research attempts to improve representation robustness by maximizing mutual information between the representation and the perturbed graph, which is sub-optimal because it does not adapt its defense techniques to the severity of the attack. To address this issue, we propose an unsupervised defense method that combines local and global defense to improve the robustness of representations. We put forward the Perturbed Edges Harmfulness (PEH) metric to determine the riskiness of an attack, so that when edges are attacked, the model can automatically identify the risk level. We further present an attention-based protection method against high-risk attacks that penalizes the attention coefficients of perturbed edges in encoders. Extensive experiments demonstrate that our strategies can enhance the robustness of representations against various adversarial attacks on three benchmark graphs.

IJCAI Conference 2023 Conference Paper

MAT: Mixed-Strategy Game of Adversarial Training in Fine-tuning

  • Zhehua Zhong
  • Tianyi Chen
  • Zhen Wang

Fine-tuning large-scale pre-trained language models has been demonstrated effective for various natural language processing (NLP) tasks. Previous studies have established that incorporating adversarial training during the fine-tuning stage can significantly enhance model generalization and robustness. However, from the perspective of game theory, such utilizations of adversarial training correspond to pure-strategy games, which are inherently limited in the scope of their strategies and thus still have room for improvement. In order to push the performance boundaries, we propose a novel Mixed-strategy Adversarial Training algorithm (MAT). Methodologically, we derive the Nash equilibrium of a mixed-strategy game for adversarial training using Entropy Mirror Descent, and establish MAT via a sampling method. To verify the effectiveness of MAT, we conducted extensive benchmark experiments on large-scale pre-trained models, such as BERT and RoBERTa. MAT significantly outperforms the state-of-the-art methods on both the GLUE and ANLI benchmarks in terms of generalization and robustness.

JBHI Journal 2023 Journal Article

Meta-Probability Weighting for Improving Reliability of DNNs to Label Noise

  • Zhen Wang
  • Shuo Jin
  • Linhao Li
  • Yongfeng Dong
  • Qinghua Hu

Training noise-robust deep neural networks (DNNs) in label noise scenarios is a crucial task. In this paper, we first demonstrate that DNNs learning with label noise exhibit over-fitting on noisy labels because the DNNs are overconfident in their learning capacity. More significantly, however, they also potentially suffer from under-learning on samples with clean labels. DNNs should essentially pay more attention to the clean samples rather than the noisy samples. Inspired by the sample-weighting strategy, we propose a meta-probability weighting (MPW) algorithm which re-weights the output probability of DNNs to prevent DNNs from over-fitting to label noise and to alleviate the under-learning issue on clean samples. MPW conducts an approximation optimization to adaptively learn the probability weights from data under the supervision of a small clean dataset, and achieves iterative optimization between probability weights and network parameters via a meta-learning paradigm. Ablation studies substantiate the effectiveness of MPW in preventing deep neural networks from over-fitting to label noise and improving the learning capacity on clean samples. Furthermore, MPW achieves competitive performance with other state-of-the-art methods on both synthetic and real-world noise.

NeurIPS Conference 2023 Conference Paper

Self-supervised Graph Neural Networks via Low-Rank Decomposition

  • Liang Yang
  • Runjie Shi
  • Qiuliang Zhang
  • Bingxin Niu
  • Zhen Wang
  • Xiaochun Cao
  • Chuan Wang

Self-supervised learning is introduced to train graph neural networks (GNNs) by employing propagation-based GNNs designed for semi-supervised learning tasks. Unfortunately, this common choice tends to cause two serious issues. Firstly, global parameters cause the model to lack the ability to capture local properties. Secondly, it is difficult to handle networks beyond homophily without label information. This paper seeks to break through the common choice of employing propagation-based GNNs, which aggregate representations of nodes belonging to different classes and tend to lose discriminative information. If the propagation in each ego-network is restricted to nodes from the same class, the obtained representation matrix should follow the low-rank characteristic. To meet this requirement, this paper proposes Low-Rank Decomposition-based GNNs (LRD-GNN-Matrix) by applying Low-Rank Decomposition to the attribute matrix. Furthermore, to incorporate long-distance information, a Low-Rank Tensor Decomposition-based GNN (LRD-GNN-Tensor) is proposed by constructing the node attribute tensor from selected similar ego-networks and performing Low-Rank Tensor Decomposition. The employed tensor nuclear norm facilitates the capture of long-distance relationships between the original and the selected similar ego-networks. Extensive experiments demonstrate the superior performance and robustness of LRD-GNNs.

IJCAI Conference 2023 Conference Paper

Sequential Attention Source Identification Based on Feature Representation

  • Dongpeng Hou
  • Zhen Wang
  • Chao Gao
  • Xuelong Li

Snapshot-observation-based source localization has been widely studied due to its accessibility and low cost. However, existing methods do not address user interactions in time-varying infection scenarios, so their accuracy decreases in heterogeneous interaction scenarios. To solve this critical issue, this paper proposes a sequence-to-sequence localization framework called Temporal-sequence based Graph Attention Source Identification (TGASI), built on an inductive learning idea. More specifically, the encoder focuses on generating multiple features by estimating the influence probability between two users, and the decoder distinguishes the importance of prediction sources at different timestamps via a designed temporal attention mechanism. It is worth mentioning that the inductive learning idea ensures that TGASI can detect sources in new scenarios without requiring other prior knowledge, which demonstrates the scalability of TGASI. Comprehensive experiments against SOTA methods demonstrate TGASI's higher detection performance and scalability in different scenarios.

NeurIPS Conference 2023 Conference Paper

ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings

  • Shibo Hao
  • Tianyang Liu
  • Zhen Wang
  • Zhiting Hu

Integrating large language models (LLMs) with various tools has led to increased attention in the field. Existing approaches either involve fine-tuning the LLM, which is both computationally costly and limited to a fixed set of tools, or prompting LLMs by in-context tool demonstrations. Although the latter method offers adaptability to new tools, it struggles with the inherent context length constraint of LLMs when many new tools are presented, and mastering a new set of tools with few-shot examples remains challenging, resulting in suboptimal performance. To address these limitations, we propose a novel solution, named ToolkenGPT, wherein LLMs effectively learn to master tools as predicting tokens through tool embeddings for solving complex tasks. In this framework, each tool is transformed into vector embeddings and plugged into the language model head. Once the function is triggered during text generation, the LLM enters a special function mode to execute the tool calls. Our experiments show that function embeddings effectively help LLMs understand tool use and improve on several tasks, including numerical reasoning, knowledge-based question answering and embodied decision-making.
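The "tool as token" idea described above, extra embedding rows appended to the language model head so that toolkens compete with ordinary words in a single softmax, can be sketched in numpy. Sizes and random values below are placeholders, not the paper's setup, and the real ToolkenGPT trains the tool embeddings while keeping the LLM frozen.

```python
import numpy as np

def next_token_scores(hidden, word_head, tool_embeddings):
    """Score ordinary vocabulary tokens and 'toolkens' in one softmax:
    tool embeddings are simply extra rows concatenated onto the LM head."""
    head = np.concatenate([word_head, tool_embeddings], axis=0)
    logits = head @ hidden
    z = np.exp(logits - logits.max())  # numerically stable softmax
    return z / z.sum()

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)                  # final hidden state (toy size)
word_head = rng.normal(size=(100, 8))        # frozen LM head (hypothetical)
tool_embeddings = rng.normal(size=(5, 8))    # learned toolken embeddings
probs = next_token_scores(hidden, word_head, tool_embeddings)
# Indices >= 100 correspond to toolkens; sampling one of them would switch
# the model into the special mode that executes the corresponding tool call.
```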

IJCAI Conference 2022 Conference Paper

A Formal Model for Multiagent Q-Learning Dynamics on Regular Graphs

  • Chen Chu
  • Yong Li
  • Jinzhuo Liu
  • Shuyue Hu
  • Xuelong Li
  • Zhen Wang

Modeling the dynamics of multi-agent learning has long been an important research topic. The focus of previous research has been either on 2-agent settings or well-mixed, infinitely large agent populations. In this paper, we consider the scenario where n Q-learning agents are located on regular graphs, such that agents can only interact with their neighbors. We examine the local interactions between individuals and their neighbors, and derive a formal model to capture the Q-value dynamics of the entire population. Through comparisons with agent-based simulations on different types of regular graphs, we show that our model describes the agent learning dynamics in an exact manner.

AAAI Conference 2022 Conference Paper

Continual Learning through Retrieval and Imagination

  • Zhen Wang
  • Liu Liu
  • Yiqun Duan
  • Dacheng Tao

Continual learning is an intellectual ability of artificial agents to learn new streaming labels from sequential data. The main impediment to continual learning is catastrophic forgetting, a severe performance degradation on previously learned tasks. Although simply replaying all previous data or continuously adding the model parameters could alleviate the issue, it is impractical in real-world applications due to the limited available resources. Inspired by the mechanism of the human brain to deepen its past impression, we propose a novel framework, Deep Retrieval and Imagination (DRI), which consists of two components: 1) an embedding network that constructs a unified embedding space without adding model parameters on the arrival of new tasks; and 2) a generative model to produce additional (imaginary) data based on the limited memory. By retrieving the past experiences and corresponding imaginary data, DRI distills knowledge and rebalances the embedding space to further mitigate forgetting. Theoretical analysis demonstrates that DRI can reduce the loss approximation error and improve the robustness through retrieval and imagination, bringing better generalizability to the network. Extensive experiments show that DRI performs significantly better than the existing state-of-the-art continual learning methods and effectively alleviates catastrophic forgetting.

NeurIPS Conference 2022 Conference Paper

EvenNet: Ignoring Odd-Hop Neighbors Improves Robustness of Graph Neural Networks

  • Runlin Lei
  • Zhen Wang
  • Yaliang Li
  • Bolin Ding
  • Zhewei Wei

Graph Neural Networks (GNNs) have received extensive research attention for their promising performance in graph machine learning. Despite their extraordinary predictive accuracy, existing approaches, such as GCN and GPRGNN, are not robust in the face of homophily changes on test graphs, rendering these models vulnerable to graph structural attacks and with limited capacity in generalizing to graphs of varied homophily levels. Although many methods have been proposed to improve the robustness of GNN models, most of these techniques are restricted to the spatial domain and employ complicated defense mechanisms, such as learning new graph structures or calculating edge attentions. In this paper, we study the problem of designing simple and robust GNN models in the spectral domain. We propose EvenNet, a spectral GNN corresponding to an even-polynomial graph filter. Based on our theoretical analysis in both spatial and spectral domains, we demonstrate that EvenNet outperforms full-order models in generalizing across homophilic and heterophilic graphs, implying that ignoring odd-hop neighbors improves the robustness of GNNs. We conduct experiments on both synthetic and real-world datasets to demonstrate the effectiveness of EvenNet. Notably, EvenNet outperforms existing defense models against structural attacks without introducing additional computational costs and maintains competitiveness in traditional node classification tasks on homophilic and heterophilic graphs.
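An even-polynomial filter can be sketched in a few lines of numpy: propagate only through even powers of the normalized adjacency, so odd-hop neighbors never contribute. This is an illustration under simplifying assumptions (a toy 4-cycle graph and hand-picked coefficients), not the authors' implementation.

```python
import numpy as np

def even_filter(adj, features, coeffs):
    """Apply the even-polynomial filter sum_k coeffs[k] * P^(2k) to node
    features, where P is the symmetrically normalized adjacency. Using
    only even powers means each node aggregates information exclusively
    from even-hop neighbors (including itself at hop 0)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    P = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    P2 = P @ P  # one even-hop propagation step
    out = np.zeros_like(features, dtype=float)
    h = features.astype(float)
    for c in coeffs:
        out += c * h
        h = P2 @ h
    return out

# Toy 4-cycle graph: nodes 1 and 3 are odd-hop neighbors of node 0,
# so an even filter gives them exactly zero influence on node 0.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
y = even_filter(adj, np.eye(4), coeffs=[0.5, 0.5])
```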

IJCAI Conference 2022 Conference Paper

Modelling the Dynamics of Regret Minimization in Large Agent Populations: a Master Equation Approach

  • Zhen Wang
  • Chunjiang Mu
  • Shuyue Hu
  • Chen Chu
  • Xuelong Li

Understanding the learning dynamics in multiagent systems is an important and challenging task. Past research on multi-agent learning mostly focuses on two-agent settings. In this paper, we consider the scenario in which a population of infinitely many agents apply regret minimization in repeated symmetric games. We propose a new formal model based on the master equation approach in statistical physics to describe the evolutionary dynamics in the agent population. Our model takes the form of a partial differential equation, which describes how the probability distribution of regret evolves over time. Through experiments, we show that our theoretical results are consistent with the agent-based simulation results.

NeurIPS Conference 2022 Conference Paper

OPEN: Orthogonal Propagation with Ego-Network Modeling

  • Liang Yang
  • Lina Kang
  • Qiuliang Zhang
  • Mengzhe Li
  • Bingxin Niu
  • Dongxiao He
  • Zhen Wang
  • Chuan Wang

To alleviate the unfavorable effect of noisy topology in Graph Neural Networks (GNNs), some efforts perform local topology refinement through pairwise propagation weight learning and multi-channel extension. Unfortunately, most of them suffer from a common and fatal drawback: irrelevance between the propagations to a single node and between the propagations in different channels. These two kinds of irrelevance leave the multi-channel propagation weights free to be determined solely by the labeled data, exposing the GNNs to overfitting. To tackle this issue, a novel Orthogonal Propagation with Ego-Network modeling (OPEN) is proposed, which models the relevance between propagations. Specifically, the relevance between propagations to a single node is captured by modeling the whole ego-network, while the relevance between propagations in different channels is enforced via a diversity requirement. By interpreting the propagations to a node from the perspective of dimensionality reduction, propagation weights are inferred from the principal components of the ego-network, which are orthogonal to each other. Theoretical analysis and experimental evaluations reveal four attractive characteristics of OPEN: modeling high-order relationships beyond pairwise ones, preventing overfitting, robustness, and high efficiency.
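A rough, assumption-heavy reading of the "principal components of the ego-network" idea: collect the ego-network's feature rows, take their principal directions, and use those mutually orthogonal directions to derive per-channel propagation weights. This is only a sketch of the concept, not OPEN itself; the function name and shapes are assumptions:

```python
import numpy as np

def ego_propagation_weights(nbr_feats, num_channels):
    """Derive mutually orthogonal per-channel propagation directions from
    the principal components of an ego-network's (neighbors x features)
    matrix, then score each neighbor in each channel. Conceptual sketch."""
    centered = nbr_feats - nbr_feats.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal directions of the ego-network.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:num_channels]        # (num_channels, feat_dim)
    weights = nbr_feats @ components.T    # per-neighbor weight per channel
    return components, weights
```

Because the directions come from an orthonormal SVD basis, the channels are orthogonal by construction rather than being left free for the labeled data to determine.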

IJCAI Conference 2022 Conference Paper

PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations

  • Tong Sang
  • Hongyao Tang
  • Yi Ma
  • Jianye Hao
  • Yan Zheng
  • Zhaopeng Meng
  • Boyan Li
  • Zhen Wang

Deep Reinforcement Learning (DRL) has been a promising solution to many complex decision-making problems. Nevertheless, its notorious weakness in generalizing across environments prevents the widespread application of DRL agents in real-world scenarios. Although advances have been made recently, most prior works assume sufficient online interaction with training environments, which can be costly in practical cases. To this end, we focus on an offline-training-online-adaptation setting, in which the agent first learns from offline experiences collected in environments with different dynamics and then performs online policy adaptation in environments with new dynamics. In this paper, we propose Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation. In the offline training phase, the environment representation and policy representation are learned through contrastive learning and policy recovery, respectively. The representations are further refined by mutual information optimization to make them more decoupled and complete. With the learned representations, a Policy-Dynamics Value Function (PDVF) network is trained to approximate the values of different combinations of policies and environments from offline experiences. In the online adaptation phase, with the environment context inferred from the few experiences collected in new environments, the policy is optimized by gradient ascent with respect to the PDVF. Our experiments show that PAnDR outperforms existing algorithms in several representative policy adaptation problems.

IROS Conference 2021 Conference Paper

"Safe Skin" - A Low-Cost Capacitive Proximity-Force-Fusion Sensor for Safety in Robots

  • Zhen Wang
  • Heyang Gao
  • Alexander Schmitz
  • Sophon Somlor
  • Tito Pradhono Tomo
  • Shigeki Sugano

This paper presents the design and evaluation of the low-cost capacitive proximity-force-fusion sensor "safe skin", which can simultaneously measure the proximity of humans and the contact force. It was designed such that the force and proximity sensing functions can work concurrently without interfering with each other. Moreover, active shielding, on-chip digitization and ground isolation are implemented to minimize the influence of stray capacitance and electromagnetic interference (EMI) from the environment, which ensures that the sensor has high system robustness for industrial applications. The prototype version can detect a grounded human-hand-sized object from a distance of 400 mm. Moreover, forces in the range of 5 to 40 N can be measured, with 43.7% hysteresis and 6.7% nonlinearity. Due to its sensing characteristics, when used on a robot, the sensor could be used to ensure the safety of nearby humans in the future. The sensor could also potentially be used as an interface for human-robot interaction (HRI).

AAAI Conference 2021 Conference Paper

EvaLDA: Efficient Evasion Attacks Towards Latent Dirichlet Allocation

  • Qi Zhou
  • Haipeng Chen
  • Yitao Zheng
  • Zhen Wang

As one of the most powerful topic models, Latent Dirichlet Allocation (LDA) has been used in a vast range of tasks, including document understanding, information retrieval and peer-reviewer assignment. Despite its tremendous popularity, the security of LDA has rarely been studied. This poses severe risks to security-critical tasks, such as sentiment analysis and peer-reviewer assignment, that are based on LDA. In this paper, we are interested in whether LDA models are vulnerable to adversarial perturbations of benign document examples at inference time. We formalize the evasion attack on LDA models as an optimization problem and prove it to be NP-hard. We then propose a novel and efficient algorithm, EvaLDA, to solve it. We show the effectiveness of EvaLDA via extensive empirical evaluations. For instance, on the NIPS dataset, EvaLDA can on average promote the rank of a target topic from 10 to around 7 by replacing only 1% of the words in a victim document with similar words. Our work provides significant insights into the power and limitations of evasion attacks against LDA models.

AAAI Conference 2021 Short Paper

Multi-label Few-shot Learning with Semantic Inference (Student Abstract)

  • Zhen Wang
  • Yiqun Duan
  • Liu Liu
  • Dacheng Tao

Few-shot learning can adapt a classification model to new labels with only a few labeled examples. Previous studies mainly focused on the scenario of a single category label per example and have not effectively addressed the more challenging multi-label scenario, with its exponentially large output space and scarce data. In this paper, we propose a semantic-aware meta-learning model for multi-label few-shot learning. Our approach can learn and infer the semantic correlation between unseen labels and historical labels to quickly adapt to multi-label tasks based on only a few examples. Specifically, features can be mapped into the semantic space via label embeddings to exploit the label correlation, thus structuring the overwhelming output space. We design a novel semantic inference mechanism for leveraging prior knowledge learned from historical labels, which produces good generalization performance on new labels and alleviates the overfitting caused by low data. Finally, empirical results show that the proposed method significantly outperforms existing state-of-the-art methods on multi-label few-shot learning tasks.

IJCAI Conference 2020 Conference Paper

A Dual Input-aware Factorization Machine for CTR Prediction

  • Wantong Lu
  • Yantao Yu
  • Yongzhe Chang
  • Zhen Wang
  • Chenhui Li
  • Bo Yuan

Factorization Machines (FMs) refer to a class of general predictors working with real-valued feature vectors, which are well-known for their ability to estimate model parameters under significant sparsity and have found successful applications in many areas such as the click-through rate (CTR) prediction. However, standard FMs only produce a single fixed representation for each feature across different input instances, which may limit the CTR model’s expressive and predictive power. Inspired by the success of Input-aware Factorization Machines (IFMs), which aim to learn more flexible and informative representations of a given feature according to different input instances, we propose a novel model named Dual Input-aware Factorization Machines (DIFMs) that can adaptively reweight the original feature representations at the bit-wise and vector-wise levels simultaneously. Furthermore, DIFMs strategically integrate various components including Multi-Head Self-Attention, Residual Networks and DNNs into a unified end-to-end model. Comprehensive experiments on two real-world CTR prediction datasets show that the DIFM model can outperform several state-of-the-art models consistently.

IJCAI Conference 2020 Conference Paper

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search

  • Daoyuan Chen
  • Yaliang Li
  • Minghui Qiu
  • Zhen Wang
  • Bofang Li
  • Bolin Ding
  • Hongbo Deng
  • Jun Huang

Large pre-trained language models such as BERT have shown their effectiveness in various natural language processing tasks. However, their huge parameter size makes them difficult to deploy in real-time applications that require quick inference with limited resources. Existing methods compress BERT into small models, but such compression is task-independent, i.e., the same compressed BERT is used for all downstream tasks. Motivated by the necessity and benefits of task-oriented BERT compression, we propose a novel compression method, AdaBERT, that leverages differentiable Neural Architecture Search to automatically compress BERT into task-adaptive small models for specific tasks. We incorporate a task-oriented knowledge distillation loss to provide search hints and an efficiency-aware loss as a search constraint, which enables a good trade-off between efficiency and effectiveness for task-adaptive BERT compression. We evaluate AdaBERT on several NLP tasks, and the results demonstrate that the task-adaptive compressed models are 12.7x to 29.3x faster than BERT in inference time and 11.5x to 17.0x smaller in parameter size, while maintaining comparable performance.

IJCAI Conference 2020 Conference Paper

Towards a Hierarchical Bayesian Model of Multi-View Anomaly Detection

  • Zhen Wang
  • Chao Lan

Traditional anomaly detectors examine a single view of instances and cannot discover multi-view anomalies, i.e., instances that exhibit inconsistent behaviors across different views. To tackle this problem, several multi-view anomaly detectors have been developed recently, but they are all transductive and unsupervised and thus may face some challenges. In this paper, we propose a novel inductive semi-supervised Bayesian multi-view anomaly detector. Specifically, we first present a generative model for normal data. Then, we build a hierarchical Bayesian model by first assigning priors to all parameters and latent variables, and then assigning priors over the priors. Finally, we employ variational inference to approximate the posterior of the model and evaluate anomaly scores of multi-view instances. In the experiments, we show the proposed Bayesian detector consistently outperforms state-of-the-art counterparts across several public data sets and three well-known types of multi-view anomalies. In theory, we prove that the inferred Bayesian estimator is consistent and derive an approximate sample complexity for the proposed anomaly detector.

IJCAI Conference 2019 Conference Paper

An Input-aware Factorization Machine for Sparse Prediction

  • Yantao Yu
  • Zhen Wang
  • Bo Yuan

Factorization machines (FMs) are a class of general predictors working effectively with sparse data, which represents features using factorized parameters and weights. However, the accuracy of FMs can be adversely affected by the fixed representation trained for each feature, as the same feature is usually not equally predictive and useful in different instances. In fact, the inaccurate representation of features may even introduce noise and degrade the overall performance. In this work, we improve FMs by explicitly considering the impact of individual input upon the representation of features. We propose a novel model named Input-aware Factorization Machine (IFM), which learns a unique input-aware factor for the same feature in different instances via a neural network. Comprehensive experiments on three real-world recommendation datasets are used to demonstrate the effectiveness and mechanism of IFM. Empirical results indicate that IFM is significantly better than the standard FM model and consistently outperforms four state-of-the-art deep learning based methods.
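For context, the standard FM score that IFM builds on can be computed in O(kn) time via the well-known factorization identity. The snippet below is a hedged sketch of that baseline predictor (names and shapes are assumptions), not the IFM network itself, which additionally reweights each feature's factors per input instance:

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Standard FM score: w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j.
    The pairwise term uses the O(kn) identity
    0.5 * sum_f [(sum_i V_{if} x_i)^2 - sum_i V_{if}^2 x_i^2].
    Sketch of the baseline FM; IFM would first rescale V per instance."""
    linear = w0 + w @ x
    s = V.T @ x                  # per-factor sums: sum_i V_{if} x_i
    s2 = (V ** 2).T @ (x ** 2)   # per-factor sums of squares
    pairwise = 0.5 * float(np.sum(s ** 2 - s2))
    return float(linear + pairwise)
```

The identity avoids the naive O(n^2) loop over feature pairs, which is what makes FMs practical on the very sparse, high-dimensional inputs typical of recommendation data.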

AAMAS Conference 2019 Conference Paper

Fraud Regulating Policy for E-Commerce via Constrained Contextual Bandits

  • Zehong Hu
  • Zhen Wang
  • Zhao Li
  • Shichang Hu
  • Shasha Ruan
  • Jie Zhang

Fraud sellers in e-commerce often promote themselves via fake visits or purchases to increase sales, jeopardizing the business environment of the platform. How to regulate the exposure of these sellers to buyers without affecting normal online business remains a challenging problem, since blocking them entirely without discrimination may kill the normal transactions and could potentially decrease the total transactions of the platform. To address this problem, we introduce a regulating valve which blocks fraud sellers with a certain probability. To learn the optimal blocking policy, we model the regulating valve as a contextual bandit problem with a constraint on the total transaction decline. Since existing bandit algorithms are unable to incorporate the transaction constraint, we propose a novel bandit algorithm, which decides the policy based on a set of neural networks and iteratively updates the neural networks with online observations and the constraint. Experiments on synthetic data and one of the largest e-commerce platforms in the world both show that our algorithm effectively and efficiently outperforms existing bandit algorithms by a large margin.

IJCAI Conference 2017 Conference Paper

Optimal Escape Interdiction on Transportation Networks

  • Youzhi Zhang
  • Bo An
  • Long Tran-Thanh
  • Zhen Wang
  • Jiarui Gan
  • Nicholas R. Jennings

Preventing crimes or terrorist attacks in urban areas is challenging. Law enforcement officers need to respond quickly to catch the attacker on his escape route, which is subject to time-dependent traffic conditions on transportation networks. The attacker can strategically choose his escape path and driving speed to avoid being captured. Existing work on security resource allocation has not considered such scenarios with time-dependent strategies for both players. Therefore, in this paper, we study the problem of efficiently scheduling security resources for interdicting the escaping attacker. We propose: 1) a new defender-attacker security game model for escape interdiction on transportation networks; and 2) an efficient double oracle algorithm to compute the optimal defender strategy, which combines mixed-integer linear programming formulations for best response problems and effective approximation algorithms for improving the scalability of the algorithms. Experimental evaluation shows that our approach significantly outperforms baselines in solution quality and scales up to realistic-sized transportation networks with hundreds of intersections.

AAAI Conference 2016 Conference Paper

Computing Optimal Monitoring Strategy for Detecting Terrorist Plots

  • Zhen Wang
  • Yue Yin
  • Bo An

In recent years, terrorist organizations (e.g., ISIS or al-Qaeda) are increasingly directing terrorists to launch coordinated attacks in their home countries. One example is the Paris shootings on January 7, 2015. By monitoring potential terrorists, security agencies are able to detect and stop terrorist plots at their planning stage. Although security agencies may have knowledge about potential terrorists (e.g., who they are, how they interact), they usually have limited resources and cannot monitor all terrorists. Moreover, a terrorist planner may strategically choose to arouse terrorists considering the security agency’s monitoring strategy. This paper makes five key contributions toward the challenging problem of computing optimal monitoring strategies: 1) A new Stackelberg game model for terrorist plot detection; 2) A modified double oracle framework for computing the optimal strategy effectively; 3) Complexity results for both defender and attacker oracle problems; 4) Novel mixed-integer linear programming (MILP) formulations for best response problems of both players; and 5) Effective approximation algorithms for generating suboptimal responses for both players. Experimental evaluation shows that our approach can obtain a robust enough solution outperforming widely-used centrality based heuristics significantly and scale up to realistic-sized problems.

AAAI Conference 2014 Conference Paper

Knowledge Graph Embedding by Translating on Hyperplanes

  • Zhen Wang
  • Jianwen Zhang
  • Jianlin Feng
  • Zheng Chen

We deal with embedding a large scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently, which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, such as reflexive, one-to-many, many-to-one, and many-to-many. We note that TransE does not do well in dealing with these properties. Some complex models are capable of preserving these mapping properties but sacrifice efficiency in the process. To make a good trade-off between model capacity and efficiency, in this paper we propose TransH which models a relation as a hyperplane together with a translation operation on it. In this way, we can well preserve the above mapping properties of relations with almost the same model complexity as TransE. Additionally, as a practical knowledge graph is often far from complete, how to construct negative examples to reduce false negative labels in training is very important. Utilizing the one-to-many/many-to-one mapping property of a relation, we propose a simple trick to reduce the possibility of false negative labeling. We conduct extensive experiments on link prediction, triplet classification and fact extraction on benchmark datasets like WordNet and Freebase. Experiments show TransH delivers significant improvements over TransE on predictive accuracy with comparable capability to scale up.
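The hyperplane-translation score described in this abstract can be sketched compactly: project the head and tail embeddings onto the relation-specific hyperplane, then measure the translation distance on it. The toy function below follows the abstract's description; the variable names are illustrative and it is not the authors' code:

```python
import numpy as np

def transh_score(h, t, w_r, d_r):
    """TransH-style plausibility score (lower is more plausible):
    project h and t onto the hyperplane with unit normal w_r,
    then measure ||h_perp + d_r - t_perp||^2."""
    w_r = w_r / np.linalg.norm(w_r)     # keep the normal a unit vector
    h_perp = h - (w_r @ h) * w_r        # component of h on the hyperplane
    t_perp = t - (w_r @ t) * w_r        # component of t on the hyperplane
    diff = h_perp + d_r - t_perp
    return float(diff @ diff)
```

Because both entities are projected before translating, an entity can take different effective positions under different relations, which is how TransH accommodates one-to-many and many-to-one relations at roughly TransE's cost.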

ICRA Conference 2011 Conference Paper

Analysis and control of a biped line-walking robot for inspection of power transmission lines

  • Ludan Wang
  • Fei Liu 0029
  • Zhen Wang
  • Shaoqiang Xu
  • Jianwei Zhang 0001
  • Sheng Cheng

The subject of this paper is the analysis and control of a biped line-walking robot for inspection of power transmission lines. The novel mechanism enables the robot's center of mass to be concentrated at the hip joint, minimizing the hip joint's drive torque. The mechanical structure of the robot is discussed, as well as its forward kinematics, and a dynamic model is established. Locomotion of the line-walking robot is discussed in detail; the line-walking cycle of the designed mechanism is composed of a single-support phase and a double-support phase. Finally, a PD plus gravity compensation method is introduced for the controller design in the single-support phase, and a compliant controller with estimated force is implemented during the double-support phase. The feasibility of this concept is then confirmed by performing experiments with a simulated line environment.

IROS Conference 2010 Conference Paper

Development of a practical power transmission line inspection robot based on a novel line walking mechanism

  • Ludan Wang
  • Fei Liu 0029
  • Zhen Wang
  • Shaoqiang Xu
  • Sheng Cheng
  • Jianwei Zhang 0001

A mobile robot based on a novel line-walking mechanism is proposed for inspecting power transmission lines. The novel mechanism enables the centroid of the robot to be concentrated at the hip joint, minimizing the hip joint's drive torque and keeping the robot stable when only one leg is hung on the line. After reviewing the line-walking mechanism, the power line inspection robot is described in detail. A pose adjustment analysis is carried out to ensure that the robot remains stable while rolling on a power transmission line, which forms a catenary curve. The obstacle-navigation cycle of the designed inspection robot is composed of a single-support phase and a double-support phase. The centroid of the robot is shifted to the other leg to start a new single-support phase. The feasibility of this concept is then confirmed by performing experiments with a simulated line environment.