Arrow Research

Author name cluster

Lichao Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

38 papers · 1 author row

Possible papers (38)

AAAI Conference 2026 System Paper

3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation

  • Yunhong He
  • Zhengqing Yuan
  • Zhengzhong Tu
  • Yanfang Ye
  • Lichao Sun

We introduce DreamLand, an interactive 4D visualization framework that integrates WebGL with Supersplat rendering. It transforms static images and text into coherent 4D scenes through four core modules and employs a foveated rendering strategy for efficient, real-time multi-modal interaction. This framework enables adaptive, user-driven exploration of complex 4D environments.

TMLR Journal 2026 Journal Article

Beyond Affinity: A Benchmark of 1D, 2D, and 3D Methods Reveals Critical Trade-offs in Structure-Based Drug Design

  • Kangyu Zheng
  • Kai Zhang
  • Jiale Tan
  • Xuehan Chen
  • Yingzhou Lu
  • Zaixi Zhang
  • Lichao Sun
  • Marinka Zitnik

Structure-based drug design (SBDD) is currently dominated by three main types of algorithms: search-based algorithms, deep generative models, and reinforcement learning. While existing works have typically focused on comparing models within a single algorithmic category, cross-algorithm comparisons remain scarce. To fill this gap, we establish a benchmark that evaluates fifteen models across these different algorithmic foundations by assessing the pharmaceutical properties of the generated molecules and their docking affinities and poses with specified target proteins. We highlight the unique advantages of each algorithmic approach and offer recommendations for the design of future SBDD models. We emphasize that 1D/2D ligand-centric drug design methods can be used in SBDD by treating the docking function as a black-box oracle, a possibility that is typically neglected. Our evaluation reveals distinct patterns across model categories. 3D structure-based models excel in binding affinities but show inconsistencies in chemical validity and pose quality. 1D models demonstrate reliable performance on standard molecular metrics but rarely achieve optimal binding affinities. 2D models offer balanced performance, maintaining high chemical validity while achieving moderate binding scores. Through detailed analysis across multiple protein targets, we identify key improvement areas for each model category, providing insights for researchers to combine the strengths of different approaches while addressing their limitations. All code used for benchmarking is available at https://github.com/zkysfls/2025-sbdd-benchmark

AAAI Conference 2026 Conference Paper

Vision-MoR: Scaling Vision Transformer via Patch-Level Mixture-of-Recursions

  • Yunhong He
  • Zhengqing Yuan
  • Weixiang Sun
  • YiYang Li
  • Yixin Liu
  • Yanfang Ye
  • Lichao Sun

Scaling Vision Transformers (ViTs) has yielded remarkable advancements in diverse vision tasks, albeit at the cost of escalating computational, memory, and parameter demands. Existing efficiency techniques typically address only one of these dimensions (computation, memory, or parameters), lacking a cohesive approach. In this paper, we introduce Vision-MoR, a novel ViT architecture that unifies parameter sharing, spatially adaptive computation, and memory-efficient design into a single framework. Vision-MoR employs a spatial-aware router with shifted-window attention to dynamically assign per-patch recursion depths, coupled with a recursive Transformer loop enabling token-wise early exiting. This facilitates content-adaptive processing and recursive parameter reuse while preserving spatial locality. On ImageNet-1K, Vision-MoR Small attains 74.6% Top-1 accuracy with 140M FLOPs and 5.7M parameters, outperforming EfficientViT-M2 (70.8%) and SHViT-S1 (72.8%) at superior throughput. The Vision-MoR X-Large variant achieves 80.4% Top-1 and 95.2% Top-5 accuracy using 14.3M parameters and 2044M FLOPs, surpassing ResNet-50 and EfficientNet-B1. On COCO object detection, Vision-MoR X-Large yields 39.1 AP with the lowest latency among comparable models. These results underscore Vision-MoR's state-of-the-art accuracy-efficiency trade-offs, positioning it as a scalable, deployment-friendly backbone for real-time vision applications.

NeurIPS Conference 2025 Conference Paper

Analytic Energy-Guided Policy Optimization for Offline Reinforcement Learning

  • Jifeng Hu
  • Sili Huang
  • Zhejian Yang
  • Shengchao Hu
  • Li Shen
  • Hechang Chen
  • Lichao Sun
  • Yi Chang

Conditional decision generation with diffusion models has shown powerful competitiveness in reinforcement learning (RL). Recent studies reveal the relation between energy-function-guided diffusion models and constrained RL problems. The main challenge lies in estimating the intermediate energy, which is intractable due to the log-expectation formulation arising during the generation process. To address this issue, we propose Analytic Energy-guided Policy Optimization (AEPO). Specifically, we first provide a theoretical analysis and a closed-form solution of the intermediate guidance when the diffusion model obeys a conditional Gaussian transformation. Then, we analyze the posterior Gaussian distribution in the log-expectation formulation and obtain a target estimate of the log-expectation under mild assumptions. Finally, we train an intermediate energy neural network to approximate this target estimate. We apply AEPO to more than 30 offline RL tasks to demonstrate its effectiveness. Extensive experiments illustrate that our method surpasses numerous representative baselines on D4RL offline reinforcement learning benchmarks.
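
The "log-expectation formulation" the abstract refers to has a standard form in the energy-guidance literature; the following is a hedged sketch in our own notation (not necessarily the paper's), showing why the intermediate energy is intractable:

    \mathcal{E}_t(x_t) = -\log \, \mathbb{E}_{p(x_0 \mid x_t)}\left[ \exp\left( -\mathcal{E}_0(x_0) \right) \right]

The expectation runs over the denoising posterior p(x_0 | x_t), which has no closed form in general; per the abstract, AEPO derives a closed-form solution when the diffusion model obeys a conditional Gaussian transformation and then trains a network against a tractable target estimate.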

NeurIPS Conference 2025 Conference Paper

BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization

  • Xueyang Zhou
  • Guiyao Tie
  • Guowen Zhang
  • Hecheng Wang
  • Pan Zhou
  • Lichao Sun

Vision-Language-Action (VLA) models have advanced robotic control by enabling end-to-end decision-making directly from multimodal inputs. However, their tightly coupled architectures expose novel security vulnerabilities. Unlike traditional adversarial perturbations, backdoor attacks represent a stealthier, persistent, and practically significant threat—particularly under the emerging Training-as-a-Service paradigm—but remain largely unexplored in the context of VLA models. To address this gap, we propose BadVLA, a backdoor attack method based on Objective-Decoupled Optimization, which for the first time exposes the backdoor vulnerabilities of VLA models. Specifically, it consists of a two-stage process: (1) explicit feature-space separation to isolate trigger representations from benign inputs, and (2) conditional control deviations that activate only in the presence of the trigger, while preserving clean-task performance. Empirical results on multiple VLA benchmarks demonstrate that BadVLA consistently achieves near-100% attack success rates with minimal impact on clean task accuracy. Further analyses confirm its robustness against common input perturbations, task transfers, and model fine-tuning, underscoring critical security vulnerabilities in current VLA deployments. Our work offers the first systematic investigation of backdoor vulnerabilities in VLA models, highlighting an urgent need for secure and trustworthy embodied model design practices.

NeurIPS Conference 2025 Conference Paper

Can LLMs Correct Themselves? A Benchmark of Self-Correction in LLMs

  • Guiyao Tie
  • Zenghui Yuan
  • Zeli Zhao
  • Chaoran Hu
  • Tianhe Gu
  • Ruihang Zhang
  • Sizhe Zhang
  • Junran Wu

Self-correction of large language models (LLMs) has emerged as a critical component for enhancing their reasoning performance. Although various self-correction methods have been proposed, a comprehensive evaluation of these methods remains largely unexplored, and the question of whether LLMs can truly correct themselves is a matter of significant interest and concern. In this study, we introduce CorrectBench, a benchmark developed to evaluate the effectiveness of self-correction strategies, including intrinsic, external, and fine-tuned approaches, across three tasks: commonsense reasoning, mathematical reasoning, and code generation. Our findings reveal that: 1) self-correction methods can improve accuracy, especially for complex reasoning tasks; 2) mixing different self-correction strategies yields further improvements, though it reduces efficiency; 3) reasoning LLMs (e.g., DeepSeek-V3) gain little from additional self-correction methods and incur high time costs. Interestingly, a comparatively simple chain-of-thought (CoT) baseline demonstrates competitive accuracy and efficiency. These results underscore the potential of self-correction to enhance LLMs' reasoning performance while highlighting the ongoing challenge of improving their efficiency. Consequently, we advocate for further research focused on optimizing the balance between reasoning capabilities and operational efficiency.
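
As an illustration of the intrinsic strategy family that CorrectBench evaluates, here is a minimal self-correction loop in Python; `generate` is a hypothetical stand-in for an LLM call, and the prompts are illustrative, not the benchmark's:

    def generate(prompt: str) -> str:
        # Hypothetical LLM call; plug in an actual client here.
        raise NotImplementedError

    def self_correct(question: str, rounds: int = 2) -> str:
        answer = generate(f"Question: {question}\nAnswer step by step.")
        for _ in range(rounds):
            critique = generate(
                f"Question: {question}\nProposed answer: {answer}\n"
                "Review this answer and describe any errors you find."
            )
            answer = generate(
                f"Question: {question}\nPrevious answer: {answer}\n"
                f"Critique: {critique}\nWrite a corrected final answer."
            )
        return answer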

NeurIPS Conference 2025 Conference Paper

First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training

  • Lai Wei
  • Yuting Li
  • Chen Wang
  • Yue Wang
  • Linghe Kong
  • Weiran Huang
  • Lichao Sun

Improving Multi-modal Large Language Models (MLLMs) in the post-training stage typically relies on supervised fine-tuning (SFT) or reinforcement learning (RL), which require expensive, manually annotated multi-modal data, an ultimately unsustainable resource. This limitation has motivated growing interest in unsupervised paradigms as a third stage of post-training after SFT and RL. While recent efforts have explored this direction, their methods are complex and difficult to iterate on. To address this, we propose MM-UPT, a simple yet effective framework for unsupervised post-training of MLLMs, enabling continual self-improvement without any external supervision. MM-UPT builds upon GRPO, replacing traditional reward signals with a self-rewarding mechanism based on majority voting over multiple sampled responses. Our experiments demonstrate that this training method effectively improves the reasoning ability of Qwen2.5-VL-7B (e.g., 66.3% → 72.9% on MathVista, 62.9% → 68.7% on We-Math) using standard datasets without ground-truth labels. To further explore scalability, we extend our framework to a data self-generation setting, designing two strategies that prompt the MLLM to synthesize new training samples on its own. Additional experiments show that combining these synthetic data with the unsupervised training method can also boost performance, highlighting a promising approach for scalable self-improvement. Overall, MM-UPT offers a new paradigm for autonomous enhancement of MLLMs, serving as a critical third step after initial SFT and RL in the absence of external supervision. Our code is available at https://github.com/waltonfuture/MM-UPT.
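
The majority-voting self-reward is concrete enough to sketch; a minimal Python version, assuming a hypothetical `extract_answer` parser for the model's final answer (the GRPO update itself is omitted):

    from collections import Counter

    def extract_answer(response: str) -> str:
        # Hypothetical parser: treat the last line as the final answer.
        return response.strip().splitlines()[-1]

    def majority_vote_rewards(responses: list[str]) -> list[float]:
        # Responses agreeing with the modal answer get reward 1, others 0;
        # these pseudo-rewards replace ground-truth reward signals.
        answers = [extract_answer(r) for r in responses]
        majority, _ = Counter(answers).most_common(1)[0]
        return [1.0 if a == majority else 0.0 for a in answers]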

NeurIPS Conference 2025 Conference Paper

Tackling Continual Offline RL through Selective Weights Activation on Aligned Spaces

  • Jifeng Hu
  • Sili Huang
  • Li Shen
  • Zhejian Yang
  • Shengchao Hu
  • Shisong Tang
  • Hechang Chen
  • Lichao Sun

Continual offline reinforcement learning (CORL) has shown impressive ability in diffusion-based continual learning systems by modeling the joint distributions of trajectories. However, most research focuses on limited continual task settings where the tasks share the same observation and action space, which deviates from the realistic demands of training agents in varied environments. In view of this, we propose the Vector-Quantized Continual Diffuser, named VQ-CD, to break the barrier of differing spaces across tasks. Specifically, our method contains two complementary sections, where quantized space alignment provides a unified basis for selective weight activation. For space alignment, we leverage vector quantization to align the different state and action spaces of various tasks, facilitating continual training in the same space. We then leverage a unified diffusion model, paired with an inverse dynamics model, to master all tasks by selectively activating different weights according to task-related sparse masks. Finally, we conduct extensive experiments on 15 continual learning (CL) tasks, including conventional CL task settings (identical state and action spaces) and general CL task settings (various state and action spaces). Compared with 17 baselines, our method achieves state-of-the-art performance.

NeurIPS Conference 2025 Conference Paper

Towards Building Model/Prompt-Transferable Attackers against Large Vision-Language Models

  • Xiaowen Cai
  • Daizong Liu
  • Xiaoye Qu
  • Xiang Fang
  • Jianfeng Dong
  • Keke Tang
  • Pan Zhou
  • Lichao Sun

Although Large Vision-Language Models (LVLMs) exhibit impressive multimodal capabilities, their vulnerability to adversarial examples has raised serious security concerns. Existing LVLM attackers simply optimize adversarial images that easily overfit a certain model/prompt, making them ineffective once they are transferred to attack a different model/prompt. Motivated by this research gap, this paper aims to develop a more powerful attack that is transferable to black-box LVLM models of different structures and task-aware prompts of different semantics. Specifically, we introduce a new perspective of information theory to investigate LVLMs' transferable characteristics by exploring the relative dependence between outputs of the LVLM model and input adversarial samples. Our empirical observations suggest that enlarging/decreasing the mutual information between outputs and the disentangled adversarial/benign patterns of input images helps to generate more agnostic perturbations for misleading LVLMs' perception with better transferability. In particular, we formulate the complicated calculation of information gain as an estimation problem and incorporate such informative constraints into the adversarial learning process. Extensive experiments on various LVLM models/prompts demonstrate our significant transfer-attack performance.

NeurIPS Conference 2024 Conference Paper

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

  • Sili Huang
  • Jifeng Hu
  • Zhejian Yang
  • Liwei Yang
  • Tao Luo
  • Hechang Chen
  • Lichao Sun
  • Bo Yang

Recent works have shown the remarkable superiority of transformer models in reinforcement learning (RL), where the decision-making problem is formulated as sequential generation. Transformer-based agents can emerge with self-improvement in online environments by providing task contexts, such as multiple trajectories, in what is called in-context RL. However, due to the quadratic computational complexity of attention in transformers, current in-context RL methods suffer from huge computational costs as the task horizon increases. In contrast, the Mamba model is renowned for efficiently processing long-term dependencies, which provides an opportunity for in-context RL to solve tasks that require long-term memory. To this end, we first implement Decision Mamba (DM) by replacing the backbone of Decision Transformer (DT). Then, we propose Decision Mamba-Hybrid (DM-H), which combines the merits of transformers and Mamba: high-quality prediction and long-term memory. Specifically, DM-H first generates high-value sub-goals from long-term memory through the Mamba model. Then, we use sub-goals to prompt the transformer, establishing high-quality predictions. Experimental results demonstrate that DM-H achieves state-of-the-art results on long- and short-term tasks, such as the D4RL, Grid World, and T-Maze benchmarks. Regarding efficiency, online testing of DM-H on long-term tasks is 28× faster than the transformer-based baselines.

NeurIPS Conference 2024 Conference Paper

GTBench: Uncovering the Strategic Reasoning Capabilities of LLMs via Game-Theoretic Evaluations

  • Jinhao Duan
  • Renming Zhang
  • James Diffenderfer
  • Bhavya Kailkhura
  • Lichao Sun
  • Elias Stengel-Eskin
  • Mohit Bansal
  • Tianlong Chen

As Large Language Models (LLMs) are integrated into critical real-world applications, their strategic and logical reasoning abilities are increasingly crucial. This paper evaluates LLMs' reasoning abilities in competitive environments through game-theoretic tasks, e.g., board and card games that require pure logic and strategic reasoning to compete with opponents. We first propose GTBench, a language-driven environment comprising 10 widely recognized tasks across a comprehensive game taxonomy: complete versus incomplete information, dynamic versus static, and probabilistic versus deterministic scenarios. Then, we (1) characterize the game-theoretic reasoning of LLMs and (2) perform LLM-vs.-LLM competitions as reasoning evaluation. We observe that (1) LLMs exhibit distinct behaviors across gaming scenarios; for example, LLMs fail in complete and deterministic games yet are competitive in probabilistic gaming scenarios; (2) most open-source LLMs, e.g., CodeLlama-34b-Instruct and Llama-2-70b-chat, are less competitive than commercial LLMs, e.g., GPT-4, in complex games, yet the recently released Llama-3-70b-Instruct makes up for this shortcoming. In addition, code pre-training greatly benefits strategic reasoning, while advanced reasoning methods such as Chain-of-Thought (CoT) and Tree-of-Thought (ToT) do not always help. We further characterize the game-theoretic properties of LLMs, such as equilibrium and Pareto efficiency in repeated games. Detailed error profiles are provided for a better understanding of LLMs' behavior. We hope our research provides standardized protocols and serves as a foundation to spur further explorations in the strategic reasoning of LLMs.

NeurIPS Conference 2024 Conference Paper

HonestLLM: Toward an Honest and Helpful Large Language Model

  • Chujie Gao
  • Siyuan Wu
  • Yue Huang
  • Dongping Chen
  • Qihui Zhang
  • Zhengyan Fu
  • Yao Wan
  • Lichao Sun

Large Language Models (LLMs) have achieved remarkable success across various industries and applications owing to their exceptional generative capabilities. Nevertheless, honesty and helpfulness, which together ensure safe and useful real-world deployments, remain longstanding cornerstones in practice. In this paper, we first establish comprehensive principles for honest LLMs and create HoneSet, a dataset of 930 queries across six categories designed to evaluate LLMs' ability to maintain honesty. We then improve the honesty and helpfulness of LLMs in both training-free and fine-tuning settings. Specifically, we propose a training-free method named Curiosity-Driven Prompting, which enables LLMs to express their internal confusion and uncertainty about a given query and then optimize their responses. Moreover, we propose a two-stage fine-tuning approach, inspired by curriculum learning, to enhance the honesty and helpfulness of LLMs: the method first teaches LLMs to distinguish honest from dishonest responses, and then trains them to respond more helpfully. Experimental results demonstrate that both proposed methods improve the helpfulness of LLMs while maintaining honesty. Our research paves the way for more reliable and trustworthy LLMs in real-world applications.

NeurIPS Conference 2024 Conference Paper

Pandora's Box: Towards Building Universal Attackers against Real-World Large Vision-Language Models

  • Daizong Liu
  • Mingyu Yang
  • Xiaoye Qu
  • Pan Zhou
  • Xiang Fang
  • Keke Tang
  • Yao Wan
  • Lichao Sun

Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding tasks. Nevertheless, these models are susceptible to adversarial examples. In real-world applications, existing LVLM attackers generally rely on detailed prior knowledge of the model to generate effective perturbations. Moreover, these attacks are task-specific, leading to significant costs for designing perturbations. Motivated by this research gap and practical demands, in this paper we make the first attempt to build a universal attacker against real-world LVLMs, focusing on two critical aspects: (i) restricting access to only the LVLM inputs and outputs, and (ii) devising a universal adversarial patch, which is task-agnostic and can deceive any LVLM-driven task when applied to various inputs. Specifically, we start by initializing the location and pattern of the adversarial patch through random sampling, guided by the semantic distance between the model's output and the target label. Subsequently, we maintain a consistent patch location while refining the pattern to enhance semantic resemblance to the target. In particular, our approach incorporates a diverse set of LVLM task inputs as query samples to approximate the patch gradient, capitalizing on the importance of distinct inputs. In this way, the optimized patch is universally adversarial against different tasks and prompts, leveraging solely gradient estimates queried from the model. Extensive experiments verify the strong universal adversarial capabilities of our proposed attack on prevalent LVLMs, including LLaVA, MiniGPT-4, Flamingo, and BLIP-2, spanning a spectrum of tasks, all achieved without delving into the details of the model structures.

JMLR Journal 2024 Journal Article

PyGOD: A Python Library for Graph Outlier Detection

  • Kay Liu
  • Yingtong Dou
  • Xueying Ding
  • Xiyang Hu
  • Ruitong Zhang
  • Hao Peng
  • Lichao Sun
  • Philip S. Yu

PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the different detectors implemented so that users can easily customize each detector for their purposes. To ease the construction of detection workflows, PyGOD offers numerous commonly used utility functions. To scale computation to large graphs, PyGOD supports functionalities for deep models such as sampling and mini-batch processing. PyGOD uses best practices in fostering code reliability and maintainability, including unit testing, continuous integration, and code coverage. To facilitate accessibility, PyGOD is released under a BSD 2-Clause license at https://pygod.org and on the Python Package Index (PyPI).
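
A minimal usage sketch, assuming the `pygod.detector` API documented at https://pygod.org (detector names and argument names may differ across versions):

    import torch
    from torch_geometric.data import Data
    from pygod.detector import DOMINANT

    # A toy attributed graph: 4 nodes, 2 features, 4 directed edges.
    data = Data(
        x=torch.randn(4, 2),
        edge_index=torch.tensor([[0, 1, 2, 3], [1, 0, 3, 2]]),
    )

    detector = DOMINANT(epoch=10)    # one of PyGOD's deep detectors
    detector.fit(data)               # unsupervised training on the graph
    labels = detector.predict(data)  # per-node outlier labels (0/1)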

AAAI Conference 2024 Conference Paper

Revisiting Gradient Pruning: A Dual Realization for Defending against Gradient Attacks

  • Lulu Xue
  • Shengshan Hu
  • Ruizhi Zhao
  • Leo Yu Zhang
  • Shengqing Hu
  • Lichao Sun
  • Dezhong Yao

Collaborative learning (CL) is a distributed learning framework that aims to protect user privacy by allowing users to jointly train a model while sharing only their gradient updates. However, gradient inversion attacks (GIAs), which recover users' training data from shared gradients, pose severe privacy threats to CL. Existing defense methods adopt different techniques, e.g., differential privacy, cryptography, and perturbation defenses, to defend against GIAs. Nevertheless, all current defense methods suffer from a poor trade-off between privacy, utility, and efficiency. To mitigate the weaknesses of existing solutions, we propose a novel defense method, Dual Gradient Pruning (DGP), based on gradient pruning, which can improve communication efficiency while preserving the utility and privacy of CL. Specifically, DGP slightly modifies gradient pruning to obtain a stronger privacy guarantee, and it also significantly improves communication efficiency; we provide a theoretical analysis of its convergence and generalization. Our extensive experiments show that DGP can effectively defend against the most powerful GIAs and reduce communication cost without sacrificing the model's utility.
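
For intuition, the building block DGP refines is magnitude-based gradient pruning; a generic top-k sketch in PyTorch (the paper's dual scheme and privacy calibration are not reproduced here):

    import torch

    def prune_gradient(grad: torch.Tensor, keep_ratio: float = 0.1) -> torch.Tensor:
        # Keep only the largest-magnitude entries; zero out the rest.
        flat = grad.flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = flat.abs().topk(k).values.min()
        mask = flat.abs() >= threshold
        return (flat * mask).view_as(grad)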

AAAI Conference 2024 Conference Paper

Stable Unlearnable Example: Enhancing the Robustness of Unlearnable Examples via Stable Error-Minimizing Noise

  • Yixin Liu
  • Kaidi Xu
  • Xun Chen
  • Lichao Sun

The open sourcing of large amounts of image data promotes the development of deep learning techniques. Along with this comes the privacy risk of these image datasets being exploited by unauthorized third parties to train deep learning models for commercial or illegal purposes. To prevent such abuse of data, a poisoning-based technique, the "unlearnable example", has been proposed to significantly degrade the generalization performance of models by adding imperceptible noise to the data. To further enhance its robustness against adversarial training, existing works leverage iterative adversarial training on both the defensive noise and the surrogate model. However, it remains unknown whether the robustness of unlearnable examples primarily comes from the enhancement of the surrogate model or of the defensive noise. Observing that simply removing the adversarial perturbation from the training process of the defensive noise can improve the performance of robust unlearnable examples, we find that the surrogate model's robustness alone accounts for the performance. Furthermore, we find a negative correlation between the robustness of the defensive noise and the protection performance, indicating a stability issue in the defensive noise. Motivated by this, to further improve robust unlearnable examples, we introduce Stable Error-Minimizing noise (SEM), which trains the defensive noise against random perturbation instead of the time-consuming adversarial perturbation, improving the stability of the defensive noise. Through comprehensive experiments, we demonstrate that SEM achieves new state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet Subset in terms of both effectiveness and efficiency.
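
A rough sketch of the stability idea, training the defensive noise against a random weight perturbation rather than an inner adversarial loop; this is our reading of the abstract, not the authors' exact recipe:

    import torch

    def sem_noise_step(model, x, y, delta, loss_fn, lr=0.05, eps=8 / 255, sigma=0.01):
        # Randomly perturb the surrogate's weights (instead of adversarially).
        backup = [p.detach().clone() for p in model.parameters()]
        with torch.no_grad():
            for p in model.parameters():
                p.add_(sigma * torch.randn_like(p))
        delta = delta.detach().requires_grad_(True)
        loss = loss_fn(model(x + delta), y)   # error-minimizing objective
        loss.backward()
        with torch.no_grad():
            new_delta = (delta - lr * delta.grad.sign()).clamp_(-eps, eps)
            for p, b in zip(model.parameters(), backup):
                p.copy_(b)                    # restore the original weights
        return new_delta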

NeurIPS Conference 2023 Conference Paper

A Comprehensive Study on Text-attributed Graphs: Benchmarking and Rethinking

  • Hao Yan
  • Chaozhuo Li
  • Ruosong Long
  • Chao Yan
  • Jianan Zhao
  • Wenwen Zhuang
  • Jun Yin
  • Peiyan Zhang

Text-attributed graphs (TAGs) are prevalent in various real-world scenarios, where each node is associated with a text description. The cornerstone of representation learning on TAGs lies in the seamless integration of textual semantics within individual nodes and the topological connections across nodes. Recent advancements in pre-trained language models (PLMs) and graph neural networks (GNNs) have facilitated effective learning on TAGs, garnering increased research interest. However, the absence of meaningful benchmark datasets and standardized evaluation procedures for TAGs has impeded progress in this field. In this paper, we propose CS-TAG, a comprehensive and diverse collection of challenging benchmark datasets for TAGs. The CS-TAG datasets are notably large in scale and encompass a wide range of domains, spanning from citation networks to purchase graphs. In addition to building the datasets, we conduct extensive benchmark experiments over CS-TAG with various learning paradigms, including PLMs, GNNs, PLM-GNN co-training methods, and the proposed novel topological pre-training of language models. In a nutshell, we provide an overview of the CS-TAG datasets and standardized evaluation procedures, and present baseline experiments. The entire CS-TAG project is publicly accessible at https://github.com/sktsherlock/TAG-Benchmark.

NeurIPS Conference 2023 Conference Paper

Learning Generalizable Agents via Saliency-guided Features Decorrelation

  • Sili Huang
  • Yanchao Sun
  • Jifeng Hu
  • Siyuan Guo
  • Hechang Chen
  • Yi Chang
  • Lichao Sun
  • Bo Yang

In visual-based Reinforcement Learning (RL), agents often struggle to generalize well to environmental variations in the state space that were not observed during training. The variations can arise in both task-irrelevant features, such as background noise, and task-relevant features, such as robot configurations, that are related to the optimal decisions. To achieve generalization in both situations, agents are required to accurately understand the impact of changed features on the decisions, i.e., establishing the true associations between changed features and decisions in the policy model. However, due to the inherent correlations among features in the state space, the associations between features and decisions become entangled, making it difficult for the policy to distinguish them. To this end, we propose Saliency-Guided Features Decorrelation (SGFD) to eliminate these correlations through sample reweighting. Concretely, SGFD consists of two core techniques: Random Fourier Functions (RFF) and the saliency map. RFF is utilized to estimate the complex non-linear correlations in high-dimensional images, while the saliency map is designed to identify the changed features. Under the guidance of the saliency map, SGFD employs sample reweighting to minimize the estimated correlations related to changed features, thereby achieving decorrelation in visual RL tasks. Our experimental results demonstrate that SGFD can generalize well on a wide range of test environments and significantly outperforms state-of-the-art methods in handling both task-irrelevant variations and task-relevant variations.

AAAI Conference 2023 Conference Paper

PointCA: Evaluating the Robustness of 3D Point Cloud Completion Models against Adversarial Examples

  • Shengshan Hu
  • Junwei Zhang
  • Wei Liu
  • Junhui Hou
  • Minghui Li
  • Leo Yu Zhang
  • Hai Jin
  • Lichao Sun

Point cloud completion, as the upstream procedure of 3D recognition and segmentation, has become an essential part of many tasks such as navigation and scene understanding. While various point cloud completion models have demonstrated their powerful capabilities, their robustness against adversarial attacks, which have been proven to be fatally malicious towards deep neural networks, remains unknown. In addition, existing attack approaches towards point cloud classifiers cannot be applied to the completion models due to different output forms and attack purposes. In order to evaluate the robustness of the completion models, we propose PointCA, the first adversarial attack against 3D point cloud completion models. PointCA can generate adversarial point clouds that maintain high similarity with the original ones, while being completed as another object with totally different semantic information. Specifically, we minimize the representation discrepancy between the adversarial example and the target point set to jointly explore the adversarial point clouds in the geometry space and the feature space. Furthermore, to launch a stealthier attack, we innovatively employ the neighbourhood density information to tailor the perturbation constraint, leading to geometry-aware and distribution-adaptive modifications for each point. Extensive experiments against different premier point cloud completion networks show that PointCA can cause the performance degradation from 77.9% to 16.7%, with the structure chamfer distance kept below 0.01. We conclude that existing completion models are severely vulnerable to adversarial examples, and state-of-the-art defenses for point cloud classification will be partially invalid when applied to incomplete and uneven point cloud data.

IJCAI Conference 2023 Conference Paper

RAIN: RegulArization on Input and Network for Black-Box Domain Adaptation

  • Qucheng Peng
  • Zhengming Ding
  • Lingjuan Lyu
  • Lichao Sun
  • Chen Chen

Source-free domain adaptation transfers a source-trained model to the target domain without exposing the source data, aiming to dispel concerns about data privacy and security. However, this paradigm is still at risk of data leakage due to adversarial attacks on the source model. The black-box setting, which allows using only the outputs of the source model, avoids this risk but suffers even more severely from overfitting on the source domain because the source model's weights are unseen. In this paper, we propose a novel approach named RAIN (RegulArization on Input and Network) for black-box domain adaptation with both input-level and network-level regularization. At the input level, we design a new data augmentation technique called Phase MixUp, which highlights task-relevant objects in the interpolations, thus enhancing input-level regularization and class consistency for target models. At the network level, we develop a Subnetwork Distillation mechanism that transfers knowledge from the target subnetwork to the full target network via knowledge distillation, which alleviates overfitting on the source domain by learning diverse target representations. Extensive experiments show that our method achieves state-of-the-art performance on several cross-domain benchmarks under both single- and multi-source black-box domain adaptation.

NeurIPS Conference 2023 Conference Paper

Scalable Fair Influence Maximization

  • Xiaobin Rui
  • Zhixiao Wang
  • Jiayu Zhao
  • Lichao Sun
  • Wei Chen

Given a graph G, a community structure C, and a budget k, the fair influence maximization problem aims to select a seed set S (|S| ≤ k) that maximizes the influence spread while narrowing the influence gap between different communities. While various fairness notions exist, the welfare fairness notion, which balances fairness level and influence spread, has shown promising effectiveness. However, the lack of efficient algorithms for optimizing the welfare fairness objective function restricts its application to small-scale networks with only a few hundred nodes. In this paper, we adopt the objective function of welfare fairness to maximize the exponentially weighted summation over the influenced fraction of all communities. We first introduce an unbiased estimator for the fractional power of the arithmetic mean. Then, by adapting the reverse influence sampling (RIS) approach, we convert the optimization problem to a weighted maximum coverage problem. We also analyze the number of reverse reachable sets needed to approximate the fair influence with high probability. Further, we present an efficient algorithm that guarantees a 1 − 1/e − ε approximation.
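
The reduction to weighted maximum coverage admits the classic greedy algorithm with the stated 1 − 1/e guarantee; a self-contained Python sketch (RR-set weights derived from communities are taken as an input here, since the paper's estimator is not reproduced):

    def greedy_weighted_coverage(rr_sets, weights, k):
        """Greedy seed selection over reverse-reachable (RR) sets.

        rr_sets: list of sets of candidate node ids (one per RR set)
        weights: weight of each RR set
        k: seed budget
        """
        covered, seeds = set(), []
        for _ in range(k):
            gains = {}
            for i, s in enumerate(rr_sets):
                if i in covered:
                    continue
                for v in s:  # marginal gain of node v
                    gains[v] = gains.get(v, 0.0) + weights[i]
            if not gains:
                break
            best = max(gains, key=gains.get)
            seeds.append(best)
            covered.update(i for i, s in enumerate(rr_sets) if best in s)
        return seeds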

AAAI Conference 2023 Conference Paper

SEAT: Stable and Explainable Attention

  • Lijie Hu
  • Yixin Liu
  • Ninghao Liu
  • Mengdi Huai
  • Lichao Sun
  • Di Wang

Attention mechanism has become a standard fixture in many state-of-the-art natural language processing (NLP) models, not only due to its outstanding performance, but also because it provides plausible innate explanations for neural architectures. However, recent studies show that attention is unstable against randomness and perturbations during training or testing, such as random seeds and slight perturbation of embeddings, which impedes it from being a faithful explanation tool. Thus, a natural question is whether we can find an alternative to vanilla attention, which is more stable and could keep the key characteristics of the explanation. In this paper, we provide a rigorous definition of such an attention method named SEAT (Stable and Explainable ATtention). Specifically, SEAT has the following three properties: (1) Its prediction distribution is close to the prediction of the vanilla attention; (2) Its top-k indices largely overlap with those of the vanilla attention; (3) It is robust w.r.t. perturbations, i.e., any slight perturbation on SEAT will not change the attention and prediction distribution too much, which implicitly indicates that it is stable to randomness and perturbations. Furthermore, we propose an optimization method for obtaining SEAT, which could be considered as revising the vanilla attention. Finally, through intensive experiments on various datasets, we compare our SEAT with other baseline methods using RNN, BiLSTM and BERT architectures, with different evaluation metrics on model interpretation, stability and accuracy. Results show that, besides preserving the original explainability and model performance, SEAT is more stable against input perturbations and training randomness, which indicates it is a more faithful explanation.

AAAI Conference 2023 Conference Paper

Tackling Data Heterogeneity in Federated Learning with Class Prototypes

  • Yutong Dai
  • Zeyuan Chen
  • Junnan Li
  • Shelby Heinecke
  • Lichao Sun
  • Ran Xu

Data heterogeneity across clients in federated learning (FL) settings is a widely acknowledged challenge. In response, personalized federated learning (PFL) has emerged as a framework to curate local models for clients' tasks. In PFL, a common strategy is to develop local and global models jointly: the global model (for generalization) informs the local models, and the local models (for personalization) are aggregated to update the global model. A key observation is that improving the generalization ability of local models improves the generalization of the global model, which in turn builds better personalized models. In this work, we consider class imbalance, an overlooked type of data heterogeneity, in the classification setting. We propose FedNH, a novel method that improves the local models' performance for both personalization and generalization by combining the uniformity and semantics of class prototypes. FedNH initially distributes class prototypes uniformly in the latent space and smoothly infuses class semantics into them. We show that imposing uniformity helps to combat prototype collapse, while infusing class semantics improves local models. Extensive experiments were conducted on popular classification datasets under the cross-device setting. Our results demonstrate the effectiveness and stability of our method relative to recent works.
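
One simple way to realize the uniform-prototype idea is to spread class prototypes on the unit hypersphere by penalizing the closest pair; a hedged PyTorch sketch (FedNH's exact initialization may differ):

    import torch
    import torch.nn.functional as F

    def uniform_prototypes(num_classes: int, dim: int, steps: int = 500):
        protos = torch.randn(num_classes, dim, requires_grad=True)
        opt = torch.optim.SGD([protos], lr=0.1)
        for _ in range(steps):
            p = F.normalize(protos, dim=1)
            sim = p @ p.t() - torch.eye(num_classes)  # zero out self-similarity
            loss = sim.max(dim=1).values.mean()       # push apart closest pairs
            opt.zero_grad()
            loss.backward()
            opt.step()
        return F.normalize(protos.detach(), dim=1)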

TIST Journal 2022 Journal Article

A Survey on Text Classification: From Traditional to Deep Learning

  • Qian Li
  • Hao Peng
  • Jianxin Li
  • Congying Xia
  • Renyu Yang
  • Lichao Sun
  • Philip S. Yu
  • Lifang He

Text classification is the most fundamental and essential task in natural language processing. The last decade has seen a surge of research in this area due to the unprecedented success of deep learning. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, raising the need for a comprehensive and updated survey. This paper fills the gap by reviewing state-of-the-art approaches from 1961 to 2021, ranging from traditional models to deep learning. We create a taxonomy for text classification according to the text involved and the models used for feature extraction and classification. We then discuss each of these categories in detail, covering both the technical developments and the benchmark datasets that support tests of predictions. A comprehensive comparison of different techniques, as well as the pros and cons of various evaluation metrics, is also provided in this survey. Finally, we conclude by summarizing key implications, future research directions, and the challenges facing the research area.

NeurIPS Conference 2022 Conference Paper

BOND: Benchmarking Unsupervised Outlier Node Detection on Static Attributed Graphs

  • Kay Liu
  • Yingtong Dou
  • Yue Zhao
  • Xueying Ding
  • Xiyang Hu
  • Ruitong Zhang
  • Kaize Ding
  • Canyu Chen

Detecting which nodes in graphs are outliers is a relatively new machine learning task with numerous applications. Despite the proliferation of algorithms developed in recent years for this task, there has been no standard comprehensive setting for performance evaluation. Consequently, it has been difficult to understand which methods work well and when under a broad range of settings. To bridge this gap, we present—to the best of our knowledge—the first comprehensive benchmark for unsupervised outlier node detection on static attributed graphs called BOND, with the following highlights. (1) We benchmark the outlier detection performance of 14 methods ranging from classical matrix factorization to the latest graph neural networks. (2) Using nine real datasets, our benchmark assesses how the different detection methods respond to two major types of synthetic outliers and separately to “organic” (real non-synthetic) outliers. (3) Using an existing random graph generation technique, we produce a family of synthetically generated datasets of different graph sizes that enable us to compare the running time and memory usage of the different outlier detection algorithms. Based on our experimental results, we discuss the pros and cons of existing graph outlier detection algorithms, and we highlight opportunities for future research. Importantly, our code is freely available and meant to be easily extendable: https://github.com/pygod-team/pygod/tree/main/benchmark

IJCAI Conference 2022 Conference Paper

Data-Free Adversarial Knowledge Distillation for Graph Neural Networks

  • Yuanxin Zhuang
  • Lingjuan Lyu
  • Chuan Shi
  • Carl Yang
  • Lichao Sun

Graph neural networks (GNNs) have been widely used to model graph-structured data, owing to their impressive performance in a wide range of practical applications. Recently, knowledge distillation (KD) for GNNs has enabled remarkable progress in graph model compression and knowledge transfer. However, most existing KD methods require a large volume of real data, which is not readily available in practice, and this may preclude their applicability in scenarios where the teacher model is trained on rare or hard-to-acquire datasets. To address this problem, we propose the first end-to-end framework for data-free adversarial knowledge distillation on graph-structured data (DFAD-GNN). To be specific, DFAD-GNN employs a generative adversarial network consisting of three components: a pre-trained teacher model and a student model serve as two discriminators, and a generator is used to derive training graphs for distilling knowledge from the teacher into the student. Extensive experiments on various benchmark models and six representative datasets demonstrate that DFAD-GNN significantly surpasses state-of-the-art data-free baselines on the graph classification task.

NeurIPS Conference 2022 Conference Paper

Distributional Reward Estimation for Effective Multi-agent Deep Reinforcement Learning

  • Jifeng Hu
  • Yanchao Sun
  • Hechang Chen
  • Sili Huang
  • Haiyin Piao
  • Yi Chang
  • Lichao Sun

Multi-agent reinforcement learning has drawn increasing attention in practice, e.g., robotics and autonomous driving, as it can explore optimal policies using samples generated by interacting with the environment. However, high reward uncertainty remains a problem when training a satisfactory model, because obtaining high-quality reward feedback is usually expensive and even infeasible. To handle this issue, previous methods mainly focus on passive reward correction, while recent active reward estimation methods have proven to be a recipe for reducing the effect of reward uncertainty. In this paper, we propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL). Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training. Specifically, we design the multi-action-branch reward estimation to model reward distributions over all action branches. Then we use reward aggregation to obtain stable updating signals during training. Our intuition is that considering all possible consequences of actions can be useful for learning policies. The superiority of DRE-MARL is demonstrated on benchmark multi-agent scenarios against SOTA baselines in terms of both effectiveness and robustness.

TIST Journal 2022 Journal Article

FedBERT: When Federated Learning Meets Pre-training

  • Yuanyishu Tian
  • Yao Wan
  • Lingjuan Lyu
  • Dezhong Yao
  • Hai Jin
  • Lichao Sun

The fast growth of pre-trained models (PTMs) has brought natural language processing (NLP) to a new era, with PTMs becoming a dominant technique for various NLP applications. Any user can download the weights of a PTM and then fine-tune them for a task on the local side. However, pre-training a model relies heavily on access to large-scale training data and requires a vast amount of computing resources. These strict requirements make it impossible for any single client to pre-train such a model. To enable clients with limited computing capability to participate in pre-training a large model, we propose a new learning approach, FedBERT, that takes advantage of federated learning and split learning to pre-train BERT in a federated way. FedBERT avoids sharing raw data while obtaining excellent performance. Extensive experiments on seven GLUE tasks demonstrate that FedBERT maintains its effectiveness without accessing clients' sensitive local data.

TIST Journal 2022 Journal Article

Federated Multi-view Learning for Private Medical Data Integration and Analysis

  • Sicong Che
  • Zhaoming Kong
  • Hao Peng
  • Lichao Sun
  • Alex Leow
  • Yong Chen
  • Lifang He

Along with the rapid expansion of information technology and the digitalization of health data, there is increasing concern about maintaining data privacy while garnering its benefits in the medical field. Two critical challenges arise: first, medical data is naturally distributed across multiple local sites, making it difficult to collectively train machine learning models without data leakage; second, in medical applications, data are often collected from different sources and views, resulting in heterogeneity and complexity that require reconciliation. In this article, we present a generic Federated Multi-view Learning (FedMV) framework for multi-view data leakage prevention. Specifically, we apply this framework to two types of problems based on local data availability: Vertical Federated Multi-view Learning (V-FedMV) and Horizontal Federated Multi-view Learning (H-FedMV). We experimented with real-world keyboard data collected from the BiAffect study. Our results demonstrate that the proposed approach can make full use of multi-view data in a privacy-preserving way, and both V-FedMV and H-FedMV perform better than their single-view and pairwise counterparts. Moreover, the framework can be easily adapted to deal with multi-view sequential data: we have developed a sequential model (S-FedMV) that takes sequences of multi-view data as input and demonstrated it experimentally. To the best of our knowledge, this framework is the first to consider both vertical and horizontal diversification in the multi-view setting, as well as sequential federated learning of multi-view data.

IJCAI Conference 2022 Conference Paper

Membership Inference via Backdooring

  • Hongsheng Hu
  • Zoran Salčić
  • Gillian Dobbie
  • Jinjun Chen
  • Lichao Sun
  • Xuyun Zhang

Recently issued data privacy regulations like the GDPR (General Data Protection Regulation) grant individuals the right to be forgotten. In the context of machine learning, this requires a model to forget about a training data sample if requested by the data owner (i.e., machine unlearning). As an essential step prior to machine unlearning, it remains a challenge for a data owner to tell whether or not her data have been used by an unauthorized party to train a machine learning model. Membership inference is a recently emerging technique for identifying whether a data sample was used to train a target model, and it seems a promising solution to this challenge. However, straightforward adoption of existing membership inference approaches fails to address the challenge effectively, as they were originally designed for attacking membership privacy and suffer from several severe limitations, such as low inference accuracy on well-generalized models. In this paper, we propose a novel membership inference approach inspired by backdoor technology to address this challenge. Specifically, our approach, Membership Inference via Backdooring (MIB), leverages the key observation that a backdoored model behaves very differently from a clean model when predicting on deliberately marked samples created by a data owner. Appealingly, MIB requires data owners to mark only a small number of samples for membership inference and needs only black-box access to the target model, with theoretical guarantees for the inference results. We perform extensive experiments on various datasets and deep neural network architectures, and the results validate the efficacy of our approach: e.g., marking only 0.1% of the training dataset is practically sufficient for effective membership inference.
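
The core test is simple to sketch: mark released data with a trigger, then check whether a suspect model reacts to the trigger far more than a clean model should. The names and the patch trigger below are illustrative, not the paper's exact construction:

    import torch

    def add_trigger(x: torch.Tensor, value: float = 1.0) -> torch.Tensor:
        x = x.clone()
        x[..., -3:, -3:] = value          # small patch in the image corner
        return x

    def infer_membership(model, probe_x, target_label, threshold=0.5) -> bool:
        with torch.no_grad():
            preds = model(add_trigger(probe_x)).argmax(dim=1)
        hit_rate = (preds == target_label).float().mean().item()
        # A clean model rarely emits the target label on triggered probes;
        # a high hit rate suggests the marked data was used in training.
        return hit_rate > threshold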

AAAI Conference 2022 Conference Paper

SplitFed: When Federated Learning Meets Split Learning

  • Chandra Thapa
  • Pathum Chamikara Mahawaga Arachchige
  • Seyit Camtepe
  • Lichao Sun

Federated learning (FL) and split learning (SL) are two popular distributed machine learning approaches. Both follow a model-to-data scenario: clients train and test machine learning models without sharing raw data. SL provides better model privacy than FL because the machine learning model architecture is split between clients and the server. Moreover, the split model makes SL a better option for resource-constrained environments. However, SL performs slower than FL due to relay-based training across multiple clients. In this regard, this paper presents a novel approach, named splitfed learning (SFL), that amalgamates the two approaches, eliminating their inherent drawbacks, along with a refined architectural configuration incorporating differential privacy and PixelDP to enhance data privacy and model robustness. Our analysis and empirical results demonstrate that (pure) SFL provides test accuracy and communication efficiency similar to SL while significantly decreasing computation time per global epoch relative to SL for multiple clients. Furthermore, as in SL, its communication efficiency over FL improves with the number of clients. The performance of SFL with privacy and robustness measures is further evaluated under extended experimental settings.

IJCAI Conference 2021 Conference Paper

Federated Model Distillation with Noise-Free Differential Privacy

  • Lichao Sun
  • Lingjuan Lyu

Conventional federated learning directly averages model weights, which is only possible for collaboration between models with homogeneous architectures. Sharing predictions instead of weights removes this obstacle and eliminates the risk of white-box inference attacks in conventional federated learning. However, the predictions from local models are sensitive and would leak training data privacy to the public. To address this issue, one naive approach is to add differentially private random noise to the predictions, which, however, brings a substantial trade-off between privacy budget and model performance. In this paper, we propose a novel framework called FEDMD-NFDP, which applies a Noise-Free Differential Privacy (NFDP) mechanism to a federated model distillation framework. Our extensive experimental results on various datasets validate that FEDMD-NFDP can deliver not only comparable utility and communication efficiency but also a noise-free differential privacy guarantee. We also demonstrate the feasibility of FEDMD-NFDP by considering both IID and non-IID settings, heterogeneous model architectures, and unlabelled public datasets from a different distribution.

IJCAI Conference 2021 Conference Paper

Graph Entropy Guided Node Embedding Dimension Selection for Graph Neural Networks

  • Gongxu Luo
  • Jianxin Li
  • Hao Peng
  • Carl Yang
  • Lichao Sun
  • Philip S. Yu
  • Lifang He

Graph representation learning has achieved great success in many areas, including e-commerce, chemistry, biology, etc. However, the fundamental problem of choosing the appropriate dimension of node embedding for a given graph remains unsolved. Commonly used strategies for Node Embedding Dimension Selection (NEDS) based on grid search or empirical knowledge suffer from heavy computation and poor model performance. In this paper, we revisit NEDS from the perspective of the minimum entropy principle and propose a novel Minimum Graph Entropy (MinGE) algorithm for NEDS with graph data. To be specific, MinGE considers both feature entropy and structure entropy on graphs, which are carefully designed according to the characteristics of the rich information they carry. The feature entropy, which assumes the embeddings of adjacent nodes to be more similar, connects node features and link topology on graphs. The structure entropy takes the normalized degree as the basic unit to further measure the higher-order structure of graphs. Based on them, we design MinGE to directly calculate the ideal node embedding dimension for any graph. Finally, comprehensive experiments with popular Graph Neural Networks (GNNs) on benchmark datasets demonstrate the effectiveness and generalizability of our proposed MinGE.

IJCAI Conference 2021 Conference Paper

LDP-FL: Practical Private Aggregation in Federated Learning with Local Differential Privacy

  • Lichao Sun
  • Jianwei Qian
  • Xun Chen

Training deep learning models on sensitive user data has raised increasing privacy concerns in many areas. Federated learning is a popular approach to privacy protection that collects local gradient information instead of raw data. One way to achieve a strict privacy guarantee is to apply local differential privacy to federated learning. However, previous works do not give a practical solution due to two issues. First, the differing ranges of weights across deep learning model layers have not been explicitly considered when applying a local differential privacy mechanism. Second, the privacy budget explodes due to the high dimensionality of weights in deep learning models and the many query iterations of federated learning. In this paper, we propose a novel local differential privacy mechanism for federated learning that addresses these issues. It makes local weight updates differentially private by adapting to the varying ranges at different layers of a deep neural network, which introduces a smaller variance in the estimated model weights, especially for deeper models. Moreover, the proposed mechanism bypasses the curse of dimensionality through parameter shuffling aggregation. A series of empirical evaluations on three datasets commonly used in prior differential privacy works, MNIST, Fashion-MNIST, and CIFAR-10, demonstrates that our solution achieves superior deep learning performance while providing a strong privacy guarantee at the same time.
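
A hedged sketch of the two mechanisms the abstract names: per-layer range-adaptive randomization and parameter shuffling. The Laplace noise below stands in for the paper's actual randomizer, and the reporting scheme is illustrative:

    import random
    import torch

    def randomize_layer(w: torch.Tensor, epsilon: float) -> torch.Tensor:
        # Adapt the mechanism to this layer's own value range.
        center = w.mean()
        radius = (w.max() - w.min()) / 2 + 1e-12
        scaled = (w - center) / radius                 # roughly in [-1, 1]
        noise = torch.distributions.Laplace(0.0, 2.0 / epsilon).sample(w.shape)
        return (scaled + noise) * radius + center

    def shuffled_report(w: torch.Tensor, epsilon: float):
        # Report (index, value) pairs in random order so that, once mixed
        # with other clients' reports, values cannot be traced to a client.
        noisy = randomize_layer(w, epsilon).flatten()
        pairs = list(enumerate(noisy.tolist()))
        random.shuffle(pairs)
        return pairs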

IJCAI Conference 2021 Conference Paper

Secure Deep Graph Generation with Link Differential Privacy

  • Carl Yang
  • Haonan Wang
  • Ke Zhang
  • Liang Chen
  • Lichao Sun

Many data mining and analytical tasks rely on the abstraction of networks (graphs) to summarize relational structures among individuals (nodes). Since relational data are often sensitive, we seek effective approaches to generate utility-preserving yet privacy-protected structured data. In this paper, we leverage the differential privacy (DP) framework to formulate and enforce rigorous privacy constraints on deep graph generation models, with a focus on edge-DP to guarantee individual link privacy. In particular, we enforce edge-DP by injecting designated noise into the gradients of a link-reconstruction-based graph generation model, while ensuring data utility by improving structure learning with structure-oriented graph discrimination. Extensive experiments on two real-world network datasets show that our proposed DPGGAN model is able to generate graphs with effectively preserved global structure and rigorously protected individual link privacy.
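
The noise-injection step the abstract describes resembles DP-SGD-style training; a generic sketch of per-sample clipping plus Gaussian noise on aggregated gradients (the paper's edge-DP calibration is not reproduced):

    import torch

    def dp_gradient_step(params, per_sample_grads, clip_norm=1.0, sigma=1.0, lr=0.01):
        # per_sample_grads: one list of gradient tensors per training sample.
        clipped = []
        for grads in per_sample_grads:
            total = torch.sqrt(sum((g ** 2).sum() for g in grads))
            scale = min(1.0, (clip_norm / (total + 1e-12)).item())
            clipped.append([g * scale for g in grads])
        n = len(clipped)
        with torch.no_grad():
            for j, p in enumerate(params):
                noisy = sum(c[j] for c in clipped) / n
                noisy = noisy + sigma * clip_norm * torch.randn_like(p) / n
                p -= lr * noisy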

NeurIPS Conference 2021 Conference Paper

Subgraph Federated Learning with Missing Neighbor Generation

  • Ke Zhang
  • Carl Yang
  • Xiaoxiao Li
  • Lichao Sun
  • Siu Ming Yiu

Graphs have been widely used in data mining and machine learning due to their unique representation of real-world objects and their interactions. As graphs grow ever larger, it is common to see their subgraphs separately collected and stored in multiple local systems. It is therefore natural to consider the subgraph federated learning setting, where each local system holds a small subgraph that may be biased relative to the distribution of the whole graph. Subgraph federated learning aims to collaboratively train a powerful and generalizable graph mining model without directly sharing graph data. In this work, towards this novel yet realistic setting, we propose two major techniques: (1) FedSage, which trains a GraphSage model based on FedAvg to integrate node features, link structures, and task labels on multiple local subgraphs; and (2) FedSage+, which trains a missing neighbor generator alongside FedSage to deal with missing links across local subgraphs. Empirical results on four real-world graph datasets with synthesized subgraph federated learning settings demonstrate the effectiveness and efficiency of our proposed techniques. At the same time, consistent theoretical implications are made regarding their generalization ability on the global graphs.

AAAI Conference 2019 Conference Paper

Private Model Compression via Knowledge Distillation

  • Ji Wang
  • Weidong Bao
  • Lichao Sun
  • Xiaomin Zhu
  • Bokai Cao
  • Philip S. Yu

The soaring demand for intelligent mobile applications calls for deploying powerful deep neural networks (DNNs) on mobile devices. However, the outstanding performance of DNNs notoriously relies on increasingly complex models, which in turn is associated with an increase in computational expense far surpassing mobile devices' capacity. What is worse, app service providers need to collect and utilize a large volume of users' data, which contain sensitive information, to build the sophisticated DNN models. Directly deploying these models on public mobile devices presents prohibitive privacy risk. To benefit from on-device deep learning without the capacity and privacy concerns, we design a private model compression framework, RONA. Following the knowledge distillation paradigm, we jointly use hint learning, distillation learning, and self learning to train a compact and fast neural network. The knowledge distilled from the cumbersome model is adaptively bounded and carefully perturbed to enforce differential privacy. We further propose an elegant query sample selection method to reduce the number of queries and control the privacy loss. A series of empirical evaluations, as well as an implementation on an Android mobile device, show that RONA can not only compress cumbersome models efficiently but also provide a strong privacy guarantee. For example, on SVHN, when a meaningful (9.83, 10⁻⁶)-differential privacy is guaranteed, the compact model trained by RONA obtains a 20× compression ratio and a 19× speed-up with merely 0.97% accuracy loss.