Arrow Research search

Author name cluster

Li Sun

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

41 papers
2 author rows

Possible papers

41

AAAI Conference 2026 Conference Paper

D3-RSMDE: 40× Faster and High-Fidelity Remote Sensing Monocular Depth Estimation

  • Ruizhi Wang
  • Weihan Li
  • Zunlei Feng
  • Haofei Zhang
  • Mingli Song
  • Jiayu Wang
  • Jie Song
  • Li Sun

Real-time, high-fidelity monocular depth estimation from remote sensing imagery is crucial for numerous applications, yet existing methods face a stark trade-off between accuracy and efficiency. Methods built on Vision Transformer (ViT) backbones for dense prediction are fast but often exhibit poor perceptual quality. Conversely, diffusion models offer high fidelity but at a prohibitive computational cost. To overcome these limitations, we propose Depth Detail Diffusion for Remote Sensing Monocular Depth Estimation (D³-RSMDE), an efficient framework designed to achieve an optimal balance between speed and quality. Our framework first leverages a ViT-based module to rapidly generate a high-quality preliminary depth map, which serves as a structural prior, effectively replacing the time-consuming initial structure-generation stage of diffusion models. Based on this prior, we propose a Progressive Linear Blending Refinement (PLBR) strategy, which uses a lightweight U-Net to refine the details in only a few iterations. The entire refinement step operates efficiently in a compact latent space supported by a Variational Autoencoder (VAE). Extensive experiments demonstrate that D³-RSMDE achieves a notable 11.85% reduction in the Learned Perceptual Image Patch Similarity (LPIPS) perceptual metric over leading models such as Marigold, while also achieving over a 40× speedup in inference and maintaining VRAM usage comparable to lightweight ViT models.

AAAI Conference 2026 Conference Paper

Hyperbolic Continuous Structural Entropy for Hierarchical Clustering

  • Guangjie Zeng
  • Hao Peng
  • Angsheng Li
  • Li Sun
  • Chunyang Liu
  • Shengze Li
  • Yicheng Pan
  • Philip S. Yu

Hierarchical clustering is a fundamental machine-learning technique for grouping data points into dendrograms. However, existing hierarchical clustering methods encounter two primary challenges: 1) Most methods specify dendrograms without a global objective. 2) Graph-based methods often neglect the significance of graph structure, optimizing objectives on complete or static predefined graphs. In this work, we propose Hyperbolic Continuous Structural Entropy neural networks, namely HypCSE, for structure-enhanced continuous hierarchical clustering. Our key idea is to map data points into hyperbolic space and minimize the relaxed continuous structural entropy (SE) on structure-enhanced graphs. Specifically, we encode graph vertices in hyperbolic space using hyperbolic graph neural networks and minimize an approximate SE defined on graph embeddings. To make the SE objective differentiable for optimization, we reformulate it into a function using the lowest common ancestor (LCA) on trees and then relax it into continuous SE (CSE) via the analogy between hyperbolic graph embeddings and partitioning trees. To ensure a graph structure that effectively captures the hierarchy of data points for CSE calculation, we employ a graph structure learning (GSL) strategy that updates the graph structure during training. Extensive experiments on seven datasets demonstrate the superior performance of HypCSE.

AAAI Conference 2026 Conference Paper

Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection

  • Li Sun
  • Lanxu Yang
  • Jiayu Tian
  • Bowen Fang
  • Xiaoyan Yu
  • Junda Ye
  • Peng Tang
  • Hao Peng

Detecting Out-of-Distribution (OOD) graphs, i.e., graphs drawn from a distribution different from that of the training data, is a critical task for ensuring the safety and reliability of Graph Neural Networks. The main challenge in unsupervised graph-level OOD detection lies in its common reliance on purely in-distribution (ID) data. This ID-only training paradigm leads to an incomplete characterization of the feature space, resulting in decision boundaries that lack the robustness needed to effectively separate ID from OOD samples. While incorporating synthesized outliers into the training process is a promising direction, existing generation methods are limited by their dependence on pre-defined, non-adaptive sampling heuristics (e.g., distance- or density-based). Such fixed strategies lack the flexibility to systematically explore the most informative OOD regions for refining decision boundaries. To overcome this limitation, we propose a novel Policy-Guided Outlier Synthesis (PGOS) framework that replaces static heuristics with a learned, adaptive exploration policy. PGOS trains a reinforcement learning agent to autonomously navigate low-density regions within a structured latent space, sampling representations that are maximally effective for regularizing the OOD decision boundary. These sampled points are then decoded into high-quality pseudo-OOD graphs to enhance the detector's robustness. Extensive experiments demonstrate the strong performance of our method, which achieves state-of-the-art results on multiple graph OOD and anomaly detection benchmarks.

AAAI Conference 2026 Conference Paper

RPGen: Robust and Differentially Private Synthetic Image Generation

  • Zihao Wang
  • Hao Peng
  • Wei Dong
  • Yuecen Wei
  • Li Sun
  • Zhengtao Yu

Differentially private (DP) image synthesis enables the generation of realistic images while bounding privacy leakage, facilitating secure data sharing across organizations. However, the Gaussian noise injected during DP training, such as via DP-SGD, often severely degrades synthesis quality by disrupting model convergence. To address this, we introduce RPGen, a novel framework that enhances diffusion models' parameter robustness to mitigate DP noise effects without compromising privacy guarantees. At its core, RPGen employs adversarial model perturbation (AMP) during public pre-training to build resilience against perturbations, and we identify and tackle the critical issue of robustness transferability across domains. RPGen achieves this through a three-step process: (1) a pre-trained classifier infers labels for private images, which are aggregated into a class distribution noised with the Gaussian mechanism for DP, and public samples are selected to match this privatized distribution for domain alignment; (2) the diffusion model is pre-trained on this curated subset with adversarial model perturbation to foster robustness; (3) the model undergoes fine-tuning on private data using DP-SGD. This synergy of robustness augmentation and transferability optimization yields high-fidelity synthesis. Extensive evaluations using ImageNet for pre-training, with CelebA and CIFAR-10 for synthesis, show RPGen outperforming state-of-the-art baselines across privacy budgets ε ∈ {1, 5, 10}. On average, it achieves 20.18% lower FID and 5.45% higher classification accuracy. Ablations confirm the efficacy of domain curation and modest perturbations, establishing RPGen as a new benchmark for privacy-utility trade-offs in image generation.

AAAI Conference 2026 Conference Paper

Semi-supervised Latent Disentangled Diffusion Model for Textile Pattern Generation

  • Chenggong Hu
  • Yi Wang
  • Mengqi Xue
  • Haofei Zhang
  • Jie Song
  • Li Sun

Textile pattern generation (TPG) aims to synthesize fine-grained textile pattern images based on given clothing images. Although previous studies have not explicitly investigated TPG, existing image-to-image models appear to be natural candidates for this task. However, when applied directly, these methods often produce unfaithful results, failing to preserve fine-grained details due to feature confusion between complex textile patterns and the inherent non-rigid texture distortions in clothing images. In this paper, we propose a novel method, SLDDM-TPG, for faithful and high-fidelity TPG. Our method consists of two stages: (1) a latent disentangled network (LDN) that resolves feature confusion in clothing representations and constructs a multi-dimensional, independent clothing feature space; and (2) a semi-supervised latent diffusion model (S-LDM), which receives guidance signals from LDN and generates faithful results through semi-supervised diffusion training, combined with our designed fine-grained alignment strategy. Extensive evaluations show that SLDDM-TPG reduces FID by 4.1 and improves SSIM by up to 0.116 on our CTP-HD dataset, and also demonstrate good generalization on the VITON-HD dataset.

AAAI Conference 2026 Conference Paper

Structural Entropy Guided Incremental Learning for Open-World Multimodal Social Event Detection

  • Zhiwei Yang
  • Haimei Qin
  • Xiaoyan Yu
  • Hao Peng
  • Lei Jiang
  • Li Sun
  • Zhiqin Yang

With the explosive growth of multimodal data streams on social media, the timely detection of emerging social events has become increasingly important. As a result, Multimodal Social Event Detection in open-world settings is receiving growing attention. However, most existing methods face two major limitations: (1) They overlook the dynamic nature of open-world social media data and fail to design dedicated incremental learning frameworks. (2) They ignore the impact of noise in streaming data, leading to performance degradation over long-term detection. To overcome these limitations, we propose SeInEvent (Structural Entropy Guided Incremental Learning for Open-World Multimodal Social Event Detection). Our innovations are as follows. First, considering data dynamics, we design a self-supervised alternating incremental contrastive learning mechanism: through knowledge distillation, historical event clusters are reviewed and consolidated, and contrastive learning is combined to absorb knowledge of unknown events, ultimately achieving incremental learning without labels. Second, to address the impact of noise, we propose a Pointwise Structural Entropy-based noise filter, which quantifies each sample's informational contribution to the event clustering structure. It enables automatic removal of noisy data and supports robust long-term detection. Extensive experiments on two public datasets demonstrate that SeInEvent achieves superior performance.

NeurIPS Conference 2025 Conference Paper

HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data

  • Ruizhe Liu
  • Pei Zhou
  • Qian Luo
  • Li Sun
  • Jun CEN
  • Yibing Song
  • Yanchao Yang

Effective generalization in robotic manipulation requires representations that capture invariant patterns of interaction across environments and tasks. We present a self-supervised framework for learning hierarchical manipulation concepts that encode these invariant patterns through cross-modal sensory correlations and multi-level temporal abstractions without requiring human annotation. Our approach combines a cross-modal correlation network that identifies persistent patterns across sensory modalities with a multi-horizon predictor that organizes representations hierarchically across temporal scales. Manipulation concepts learned through this dual structure enable policies to focus on transferable relational patterns while maintaining awareness of both immediate actions and longer-term goals. Empirical evaluation across simulated benchmarks and real-world deployments demonstrates significant performance improvements with our concept-enhanced policies. Analysis reveals that the learned concepts resemble human-interpretable manipulation primitives despite receiving no semantic supervision. This work both advances the understanding of representation learning for manipulation and provides a practical approach to enhancing robotic performance in complex scenarios.

NeurIPS Conference 2025 Conference Paper

D²GS: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction

  • Kejing Xia
  • Jidong Jia
  • Ke Jin
  • Yucai BAI
  • Li Sun
  • Dacheng Tao
  • Youjian Zhang

Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in the field of autonomous driving. However, current urban scene reconstruction methods often depend on multimodal sensors as inputs, i.e., LiDAR and images. Though the geometry prior provided by LiDAR point clouds can largely mitigate ill-posedness in reconstruction, acquiring such accurate LiDAR data is still challenging in practice: i) precise spatiotemporal calibration between LiDAR and other sensors is required, as they may not capture data simultaneously; ii) reprojection errors arise from spatial misalignment when LiDAR and cameras are mounted at different locations. To avoid the difficulty of acquiring accurate LiDAR depth, we propose D²GS, a LiDAR-free urban scene reconstruction framework. In this work, we obtain geometry priors that are as effective as LiDAR while being denser and more accurate. First, we initialize a dense point cloud by back-projecting multi-view metric depth predictions. This point cloud is then optimized by a Progressive Pruning strategy to improve the global consistency. Second, we jointly refine Gaussian geometry and predicted dense metric depth via a Depth Enhancer. Specifically, we leverage diffusion priors from a depth foundation model to enhance the depth maps rendered by Gaussians. In turn, the enhanced depths provide stronger geometric constraints during Gaussian training. Finally, we improve the accuracy of ground geometry by constraining the shape and normal attributes of Gaussians within road regions. Extensive experiments on the Waymo dataset demonstrate that our method consistently outperforms state-of-the-art methods, producing more accurate geometry even when compared with those using ground-truth LiDAR data.

NeurIPS Conference 2025 Conference Paper

Deeper with Riemannian Geometry: Overcoming Oversmoothing and Oversquashing for Graph Foundation Models

  • Li Sun
  • Zhenhao Huang
  • Ming Zhang
  • Philip S. Yu

Message Passing Neural Networks (MPNNs) are the building block of graph foundation models, but fundamentally suffer from oversmoothing and oversquashing. There has recently been a surge of interest in fixing both issues. Existing efforts primarily adopt global approaches, which may be beneficial in some regions but detrimental in others, ultimately leading to suboptimal expressiveness. In this paper, we begin by revisiting oversquashing through a global measure -- the spectral gap $\lambda$ -- and prove that increasing $\lambda$ leads to gradient vanishing with respect to the input features, thereby undermining the effectiveness of message passing. Motivated by these theoretical insights, we propose a local approach that adaptively adjusts message passing based on local structures. To achieve this, we connect local Riemannian geometry with MPNNs, and establish a novel nonhomogeneous (Robin-type) boundary condition to address both oversquashing and oversmoothing. Building on the Robin condition, we design a GBN network with local bottleneck adjustment, coupled with theoretical guarantees. Extensive experiments on homophilic and heterophilic graphs show the expressiveness of GBN. Furthermore, GBN does not exhibit performance degradation even when the network depth exceeds $256$ layers.

ICLR Conference 2025 Conference Paper

HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation

  • Hanxiang Ren
  • Li Sun
  • Xulong Wang
  • Pei Zhou
  • Zewen Wu
  • Siyan Dong
  • Difan Zou
  • Youyi Zheng

Policy learning through behavior cloning poses significant challenges, particularly when demonstration data is limited. In this work, we present HyPoGen, a novel optimization-biased hypernetwork for policy generation. The proposed hypernetwork learns to synthesize optimal policy parameters solely from task specifications -- without accessing training data -- by modeling policy generation as an approximation of the optimization process executed over a finite number of steps and assuming these specifications serve as a sufficient representation of the demonstration data. By incorporating structural designs that bias the hypernetwork towards optimization, we can improve its generalization capability while only training on source task demonstrations. During the feed-forward prediction pass, the hypernetwork effectively performs an optimization in the latent (compressed) policy space, which is then decoded into policy parameters for action prediction. Experimental results on locomotion and manipulation benchmarks show that HyPoGen significantly outperforms state-of-the-art methods in generating policies for unseen target tasks without any demonstrations, achieving higher success rates and underscoring the potential of optimization-biased hypernetworks in advancing generalizable policy generation. Our code and data are available at: https://github.com/ReNginx/HyPoGen.

NeurIPS Conference 2025 Conference Paper

LIFEBENCH: Evaluating Length Instruction Following in Large Language Models

  • Wei Zhang
  • Zhenhong Zhou
  • Kun Wang
  • Junfeng Fang
  • Rongwu Xu
  • Yuanhe Zhang
  • Rui Wang
  • Ge Zhang

While large language models (LLMs) can solve PhD-level reasoning problems over long context inputs, they still struggle with a seemingly simpler task: following explicit length instructions, e.g., "write a 10,000-word novel." In practice, models often generate outputs that are far too short, terminate prematurely, or even refuse the request. Existing benchmarks focus primarily on evaluating generation quality, but often overlook whether the generations meet length constraints. To this end, we introduce the Length Instruction Following Evaluation Benchmark (LIFEBench) to comprehensively evaluate LLMs' ability to follow length instructions across diverse tasks and a wide range of specified lengths. LIFEBench consists of 10,800 instances across 4 task categories in both English and Chinese, covering length constraints ranging from 16 to 8192 words. We evaluate 26 widely used LLMs and find that most models reasonably follow short-length instructions but deteriorate sharply beyond a certain threshold. Surprisingly, almost all models fail to reach the vendor-claimed maximum output lengths in practice, as further confirmed by our evaluations extending up to 32K words. Even long-context LLMs, despite their extended input-output windows, counterintuitively fail to improve length-instruction following. Notably, reasoning LLMs outperform even specialized long-text generation models, achieving state-of-the-art length following. Overall, LIFEBench uncovers fundamental limitations in current LLMs' length-instruction-following ability, offering critical insights for future progress.

AAAI Conference 2025 Conference Paper

Pioneer: Physics-informed Riemannian Graph ODE for Entropy-increasing Dynamics

  • Li Sun
  • Ziheng Zhang
  • Zixi Wang
  • Yujie Wang
  • Qiqi Wan
  • Hao Li
  • Hao Peng
  • Philip S. Yu

Dynamic interacting system modeling is important for understanding and simulating real-world systems, e.g., meteorology and the spread of COVID-19. Such a system is typically described as a graph, where multiple objects dynamically interact with each other and evolve over time. In recent years, graph Ordinary Differential Equations (ODEs) have received increasing research attention. While achieving encouraging results, existing solutions prioritize the traditional Euclidean space and neglect the intrinsic geometry of the system and physical laws, e.g., the principle of increasing entropy. These limitations motivate us to rethink system dynamics from a fresh perspective of Riemannian geometry, and to pose a more realistic problem of physics-informed dynamic system modeling, considering the underlying geometry and physical laws for the first time. In this paper, we present a novel physics-informed Riemannian graph ODE for a wide range of entropy-increasing dynamic systems (termed Pioneer). In particular, we formulate a differential system on the Riemannian manifold, where a manifold-valued graph ODE is governed by the proposed constrained Ricci flow, together with a manifold-preserving Gyro-transform aware of the system geometry. Theoretically, we prove that the entropy of our formulation is non-decreasing, obeying the physical laws. Empirical results show the superiority of Pioneer on real datasets.

IROS Conference 2025 Conference Paper

SparseMeXt: Unlocking the Potential of Sparse Representations for HD Map Construction

  • Anqing Jiang
  • Jinhao Chai
  • Yu Gao 0042
  • Yiru Wang 0001
  • Yuwen Heng
  • Zhigang Sun 0001
  • Hao Sun
  • Zezhong Zhao

Recent advancements in high-definition (HD) map construction have demonstrated the effectiveness of dense representations, which heavily rely on computationally intensive bird’s-eye view (BEV) features. While sparse representations offer a more efficient alternative by avoiding dense BEV processing, existing methods often lag behind due to the lack of tailored designs. These limitations have hindered the competitiveness of sparse representations in online HD map construction. In this work, we systematically revisit and enhance sparse representation techniques, identifying key architectural and algorithmic improvements that bridge the gap with—and ultimately surpass—dense approaches. We introduce a dedicated network architecture optimized for sparse map feature extraction, a sparse-dense segmentation auxiliary task to better leverage geometric and semantic cues, and a denoising module guided by physical priors to refine predictions. Through these enhancements, our method achieves state-of-the-art performance on the nuScenes dataset, significantly advancing HD map construction and centerline detection. Specifically, SparseMeXt-Tiny reaches a mean average precision (mAP) of 55.5% at 32 frames per second (fps), while SparseMeXt-Base attains 65.2% mAP. Scaling the backbone and decoder further, SparseMeXt-Large achieves an mAP of 68.9% at over 20 fps, establishing a new benchmark for sparse representations in HD map construction. These results underscore the untapped potential of sparse methods, challenging the conventional reliance on dense representations and redefining efficiency-performance trade-offs in the field.

AAAI Conference 2025 Conference Paper

Structural Entropy Guided Probabilistic Coding

  • Xiang Huang
  • Hao Peng
  • Li Sun
  • Hui Lin
  • Chunyang Liu
  • Jiang Cao
  • Philip S. Yu

Probabilistic embeddings have several advantages over deterministic embeddings, as they map each data point to a distribution, which better describes the uncertainty and complexity of data. Many works focus on adjusting the distribution constraint under the Information Bottleneck (IB) principle to enhance representation learning. However, the proposed regularization terms only consider the constraint on each latent variable, omitting the structural information between latent variables. In this paper, we propose a novel structural entropy-guided probabilistic coding model, named SEPC. Specifically, we incorporate the relationships between latent variables into the optimization by proposing a structural entropy regularization loss. In addition, as traditional structural information theory is not well-suited to regression tasks, we propose a probabilistic encoding tree that converts regression tasks into classification tasks while diminishing the influence of the transformation. Experimental results across 12 natural language understanding tasks, including both classification and regression tasks, demonstrate the superior performance of SEPC compared to other state-of-the-art models in terms of effectiveness, generalization capability, and robustness to label noise.

AAAI Conference 2025 Conference Paper

Towards Effective, Efficient and Unsupervised Social Event Detection in the Hyperbolic Space

  • Xiaoyan Yu
  • Yifan Wei
  • Shuaishuai Zhou
  • Zhiwei Yang
  • Li Sun
  • Hao Peng
  • Liehuang Zhu
  • Philip S. Yu

The vast, complex, and dynamic nature of social message data has posed challenges to social event detection (SED). Despite considerable effort, these challenges persist, often resulting in inadequately expressive message representations (ineffective) and prolonged learning durations (inefficient). In response to these challenges, this work introduces an unsupervised framework, HyperSED (Hyperbolic SED). Specifically, the proposed framework first models social messages into semantic-based message anchors, and then leverages the structure of the anchor graph and the expressiveness of the hyperbolic space to acquire structure- and geometry-aware anchor representations. Finally, HyperSED builds the partitioning tree of the anchor message graph by incorporating differentiable structural information as the reflection of the detected events. Extensive experiments on public datasets demonstrate HyperSED's competitive performance, along with a substantial improvement in efficiency compared to the current state-of-the-art unsupervised paradigm. Statistically, HyperSED boosts incremental SED by an average of 2%, 2%, and 25% in NMI, AMI, and ARI, respectively, while enhancing efficiency by up to 37.41× and at least 12.10×, illustrating the advancement of the proposed framework.

IJCAI Conference 2025 Conference Paper

Trace: Structural Riemannian Bridge Matching for Transferable Source Localization in Information Propagation

  • Li Sun
  • Suyang Zhou
  • Bowen Fang
  • Hechuan Zhang
  • Junda Ye
  • Yutong Ye
  • Philip S. Yu

Source localization, the inverse problem of information diffusion, is of fundamental importance for understanding social dynamics. While achieving notable progress, existing solutions are typically exposed to the risk of error accumulation and require a large number of observations for effective inference. However, it is often impractical to obtain sufficient observations in real scenarios, highlighting the need for a transferable model with broad applicability. Recently, Riemannian geometry has demonstrated its effectiveness in information diffusion and offers guidance in knowledge transfer, but has yet to be explored in source localization. In light of the issues above, we propose to study transferable source localization from a fresh geometric perspective, and present a novel approach (Trace) on the Riemannian manifold. Concretely, we establish a structural Schrödinger bridge to directly model the map between source and final distributions, where a functional curvature, encapsulating the graph structure, is formulated to govern the Schrödinger bridge and facilitate domain adaptation. Furthermore, we design a simple yet effective learning algorithm for Riemannian Schrödinger bridges (geodesic bridge matching), in which we prove that the optimal projection holds for the Riemannian measure, so that the expensive iterative procedure is avoided. Extensive experiments demonstrate the effectiveness and transferability of Trace on both synthetic and real datasets.

AAAI Conference 2024 Conference Paper

Angle Robustness Unmanned Aerial Vehicle Navigation in GNSS-Denied Scenarios

  • Yuxin Wang
  • Zunlei Feng
  • Haofei Zhang
  • Yang Gao
  • Jie Lei
  • Li Sun
  • Mingli Song

Due to the inability to receive signals from the Global Navigation Satellite System (GNSS) in extreme conditions, achieving accurate and robust navigation for Unmanned Aerial Vehicles (UAVs) is a challenging task. Recently, vision-based navigation has emerged as a promising and feasible alternative to GNSS-based navigation. However, existing vision-based techniques are inadequate in addressing flight deviation caused by environmental disturbances and inaccurate position predictions in practical settings. In this paper, we present a novel angle robustness navigation paradigm to deal with flight deviation in point-to-point navigation tasks. Additionally, we propose a model that includes the Adaptive Feature Enhance Module, Cross-knowledge Attention-guided Module, and Robust Task-oriented Head Module to accurately predict direction angles for high-precision navigation. To evaluate vision-based navigation methods, we collect a new dataset termed UAV_AR368. Furthermore, we design the Simulation Flight Testing Instrument (SFTI) using Google Earth to simulate different flight environments, thereby reducing the expenses associated with real flight testing. Experimental results demonstrate that the proposed model outperforms the state-of-the-art by achieving improvements of 26.0% and 45.6% in the success rate of arrival under ideal and disturbed circumstances, respectively.

AAAI Conference 2024 Conference Paper

Motif-Aware Riemannian Graph Neural Network with Generative-Contrastive Learning

  • Li Sun
  • Zhenhao Huang
  • Zixi Wang
  • Feiyang Wang
  • Hao Peng
  • Philip S. Yu

Graphs are typical non-Euclidean data with complex structures. In recent years, Riemannian graph representation learning has emerged as an exciting alternative to Euclidean approaches. However, Riemannian methods are still at an early stage: most of them assume a single curvature (radius) regardless of structural complexity, suffer from numerical instability due to the exponential/logarithmic map, and lack the ability to capture motif regularity. In light of the issues above, we propose the problem of Motif-aware Riemannian Graph Representation Learning, seeking a numerically stable encoder to capture motif regularity in a diverse-curvature manifold without labels. To this end, we present a novel Motif-aware Riemannian model with Generative-Contrastive learning (MotifRGC), which conducts a min-max game in the Riemannian manifold in a self-supervised manner. First, we propose a new type of Riemannian GCN (D-GCN), in which we construct a diverse-curvature manifold by a product layer with a diversified factor, and replace the exponential/logarithmic map with a stable kernel layer. Second, we introduce motif-aware Riemannian generative-contrastive learning to capture motif regularity in the constructed manifold and learn motif-aware node representations without external labels. Empirical results show the superiority of MotifRGC.

IJCAI Conference 2024 Conference Paper

OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer

  • Shengjian Wu
  • Li Sun
  • Qingli Li

DEtection TRansformer (DETR) has become a dominant paradigm, mainly due to its common architecture with high accuracy and no post-processing. However, DETR suffers from unstable training dynamics: it consumes more data and epochs to converge than CNN-based detectors. This paper aims to stabilize DETR training through online distillation. It utilizes a teacher model, accumulated by Exponential Moving Average (EMA), and distills its knowledge into the online model in the following three aspects. First, the matching relation between object queries and ground-truth (GT) boxes in the teacher is employed to guide the student, so queries within the student are not only assigned labels based on their own predictions but also refer to the matching results from the teacher. Second, the teacher's initial query is given to the online student, and its prediction is directly constrained by the corresponding output from the teacher. Finally, the object queries from the teacher's different decoding stages are used to build auxiliary groups to accelerate convergence. For each GT box, the two queries with the least matching costs are selected into this extra group, and they predict the GT box and participate in the optimization. Extensive experiments show that the proposed OD-DETR successfully stabilizes training and significantly increases performance without introducing more parameters.

NeurIPS Conference 2024 Conference Paper

Spiking Graph Neural Network on Riemannian Manifolds

  • Li Sun
  • Zhenhao Huang
  • Qiqi Wan
  • Hao Peng
  • Philip S. Yu

Graph neural networks (GNNs) have become the dominant solution for learning on graphs, the typical non-Euclidean structures. Conventional GNNs, constructed with Artificial Neural Networks (ANNs), have achieved impressive performance at the cost of high computation and energy consumption. In parallel, spiking GNNs with brain-like spiking neurons are drawing increasing research attention owing to their energy efficiency. So far, existing spiking GNNs consider graphs in Euclidean space, ignoring the structural geometry, and suffer from high latency due to Back-Propagation-Through-Time (BPTT) with the surrogate gradient. In light of the aforementioned issues, we explore spiking GNNs on Riemannian manifolds and present a Manifold-valued Spiking GNN (MSG). In particular, we design a new spiking neuron on geodesically complete manifolds with the diffeomorphism, so that BPTT over the spikes is replaced by the proposed differentiation via manifold. Theoretically, we show that MSG approximates a solver of the manifold ordinary differential equation. Extensive experiments on common graphs show that the proposed MSG achieves superior performance to previous spiking GNNs and better energy efficiency than conventional GNNs.

AAAI Conference 2023 Conference Paper

CLIP-ReID: Exploiting Vision-Language Model for Image Re-identification without Concrete Text Labels

  • Siyuan Li
  • Li Sun
  • Qingli Li

Pre-trained vision-language models like CLIP have recently shown superior performance on various downstream tasks, including image classification and segmentation. However, in fine-grained image re-identification (ReID), the labels are indexes lacking concrete text descriptions, so it remains to be determined how such models can be applied to these tasks. This paper first finds that simply fine-tuning the visual model initialized with the image encoder in CLIP already obtains competitive performance on various ReID tasks. We then propose a two-stage strategy to facilitate a better visual representation. The key idea is to fully exploit the cross-modal description ability in CLIP through a set of learnable text tokens for each ID, which are given to the text encoder to form ambiguous descriptions. In the first training stage, the image and text encoders from CLIP are kept fixed, and only the text tokens are optimized from scratch by the contrastive loss computed within a batch. In the second stage, the ID-specific text tokens and their encoder become static, providing constraints for fine-tuning the image encoder. With the help of the designed loss in the downstream task, the image encoder is able to accurately represent data as vectors in the feature embedding space. The effectiveness of the proposed strategy is validated on several datasets for person and vehicle ReID tasks. Code is available at https://github.com/Syliz517/CLIP-ReID.
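The stage-one objective is only described at a high level above; a generic InfoNCE-style sketch conveys the shape of an image-to-ID-text contrastive loss computed within a batch. Here `txt_feats` stands in for the encoded learnable per-ID text tokens (the only trainable part in stage one); all names and dimensions are illustrative, not the paper's exact loss:

```python
import numpy as np

def id_contrastive_loss(img_feats, txt_feats, ids, temperature=0.07):
    """Cross-entropy over cosine similarities between each image feature and
    one text feature per identity; the target column is the image's own ID."""
    img = img_feats / np.linalg.norm(img_feats, axis=1, keepdims=True)
    txt = txt_feats / np.linalg.norm(txt_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature            # shape: (batch, num_ids)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.mean(np.log(probs[np.arange(len(ids)), ids])))

rng = np.random.default_rng(0)
img_feats = rng.normal(size=(8, 16))   # 8 images in the batch
txt_feats = rng.normal(size=(4, 16))   # 4 identities, one encoded token set each
ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
loss = id_contrastive_loss(img_feats, txt_feats, ids)
```

In stage one the gradient of such a loss would flow only into the text tokens behind `txt_feats`, since both CLIP encoders are frozen.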

IJCAI Conference 2023 Conference Paper

CONGREGATE: Contrastive Graph Clustering in Curvature Spaces

  • Li Sun
  • Feiyang Wang
  • Junda Ye
  • Hao Peng
  • Philip S. Yu

Graph clustering is a longstanding research topic, and has achieved remarkable success with deep learning methods in recent years. Nevertheless, we observe that several important issues largely remain open. On the one hand, graph clustering from the geometric perspective is appealing but has rarely been touched before, as it lacks a promising space for geometric clustering. On the other hand, contrastive learning boosts deep graph clustering but usually struggles in either graph augmentation or hard sample mining. To bridge this gap, we rethink the problem of graph clustering from a geometric perspective and, to the best of our knowledge, make the first attempt to introduce a heterogeneous curvature space to the graph clustering problem. Correspondingly, we present a novel end-to-end contrastive graph clustering model named CONGREGATE, addressing geometric graph clustering with Ricci curvatures. To support geometric clustering, we construct a theoretically grounded Heterogeneous Curvature Space where deep representations are generated via the product of the proposed fully Riemannian graph convolutional nets. Thereafter, we train the graph clusters by an augmentation-free reweighted contrastive approach in which we pay more attention to both hard negatives and hard positives in our curvature space. Empirical results on real-world graphs show that our model outperforms the state-of-the-art competitors.

NeurIPS Conference 2023 Conference Paper

Global-correlated 3D-decoupling Transformer for Clothed Avatar Reconstruction

  • Zechuan Zhang
  • Li Sun
  • Zongxin Yang
  • Ling Chen
  • Yi Yang

Reconstructing 3D clothed human avatars from single images is a challenging task, especially when encountering complex poses and loose clothing. Current methods exhibit limitations in performance, largely attributable to their dependence on insufficient 2D image features and inconsistent query methods. To address this, we present the Global-correlated 3D-decoupling Transformer for clothed Avatar reconstruction (GTA), a novel transformer-based architecture that reconstructs clothed human avatars from monocular images. Our approach leverages transformer architectures by utilizing a Vision Transformer model as an encoder for capturing global-correlated image features. Subsequently, our innovative 3D-decoupling decoder employs cross-attention to decouple tri-plane features, using learnable embeddings as queries for cross-plane generation. To effectively enhance feature fusion with the tri-plane 3D feature and human body prior, we propose a hybrid prior fusion strategy combining spatial and prior-enhanced queries, leveraging the benefits of spatial localization and human body prior knowledge. Comprehensive experiments on the CAPE and THuman2.0 datasets illustrate that our method outperforms state-of-the-art approaches in both geometry and texture reconstruction, exhibits high robustness to challenging poses and loose clothing, and produces higher-resolution textures. Code is available at https://github.com/River-Zhang/GTA.

TIST Journal 2023 Journal Article

MC²: Unsupervised Multiple Social Network Alignment

  • Li Sun
  • Zhongbao Zhang
  • Gen Li
  • Pengxin Ji
  • Sen Su
  • Philip S. Yu

Social network alignment, identifying social accounts of the same individual across different social networks, is of fundamental importance in a wide spectrum of applications, such as link prediction and information diffusion. Individuals more often than not join multiple social networks, and it is in fact much too expensive or even impossible to acquire supervision for guiding the alignment. To the best of our knowledge, few methods in the literature can align multiple social networks without supervision. In this article, we propose to study the problem of unsupervised multiple social network alignment. To address this problem, we propose a novel unsupervised model of joint Matrix factorization with a diagonal Cone under orthogonal Constraint, referred to as MC². Its core idea is to embed and align multiple social networks in a common subspace via an unsupervised approach. Specifically, in the MC² model, we first design a matrix optimization to infer the common subspace from different social networks. To address the nonconvex optimization, we then design an efficient alternating algorithm by leveraging its inherent functional property. Through extensive experiments on real-world datasets, we demonstrate that the proposed MC² model significantly outperforms the state-of-the-art methods.

AAAI Conference 2023 Conference Paper

Self-Supervised Continual Graph Learning in Adaptive Riemannian Spaces

  • Li Sun
  • Junda Ye
  • Hao Peng
  • Feiyang Wang
  • Philip S. Yu

Continual graph learning routinely finds its role in a variety of real-world applications where graph data with different tasks arrive sequentially. Despite the success of prior works, it still faces great challenges. On the one hand, existing methods work with the zero-curvature Euclidean space, and largely ignore the fact that curvature varies over the coming graph sequence. On the other hand, continual learners in the literature rely on abundant labels, but labeling graphs in practice is particularly hard, especially for the continuously emerging graphs on-the-fly. To address the aforementioned challenges, we propose to explore a challenging yet practical problem: self-supervised continual graph learning in adaptive Riemannian spaces. In this paper, we propose a novel self-supervised Riemannian Graph Continual Learner (RieGrace). In RieGrace, we first design an Adaptive Riemannian GCN (AdaRGCN), a unified GCN coupled with a neural curvature adapter, so that the Riemannian space is shaped by the learnt curvature adaptive to each graph. Then, we present a Label-free Lorentz Distillation approach, in which we create teacher-student AdaRGCNs for the graph sequence. The student successively performs intra-distillation from itself and inter-distillation from the teacher so as to consolidate knowledge without catastrophic forgetting. In particular, we propose a theoretically grounded Generalized Lorentz Projection for the contrastive distillation in Riemannian space. Extensive experiments on benchmark datasets show the superiority of RieGrace, and we additionally investigate how curvature changes over the graph sequence.

AAAI Conference 2022 Conference Paper

A Self-Supervised Mixed-Curvature Graph Neural Network

  • Li Sun
  • Zhongbao Zhang
  • Junda Ye
  • Hao Peng
  • Jiawei Zhang
  • Sen Su
  • Philip S Yu

Graph representation learning has received increasing attention in recent years. Most of the existing methods ignore the complexity of graph structures and restrict graphs to a single constant-curvature representation space, which is in fact only suitable for particular kinds of graph structures. Additionally, these methods follow the supervised or semi-supervised learning paradigm, which notably limits their deployment on the unlabeled graphs found in real applications. To address these limitations, we make the first attempt to study self-supervised graph representation learning in mixed-curvature spaces. In this paper, we present a novel Self-Supervised Mixed-Curvature Graph Neural Network (SELFMGNN). To capture the complex graph structures, we construct a mixed-curvature space via the Cartesian product of multiple Riemannian component spaces, and design hierarchical attention mechanisms for learning and fusing graph representations across these component spaces. To enable the self-supervised learning, we propose a novel dual contrastive approach. The constructed mixed-curvature space actually provides multiple Riemannian views for the contrastive learning. We introduce a Riemannian projector to reveal these views, and utilize a well-designed Riemannian discriminator for single-view and cross-view contrastive learning within and across the Riemannian views. Finally, extensive experiments show that SELFMGNN captures the complex graph structures and outperforms state-of-the-art baselines.
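The Cartesian product construction above can be illustrated with per-component geodesic distances, under the common convention that the product metric combines component distances in l2 fashion. This is a textbook sketch of constant-curvature geometry, not the paper's model; the specific point values are illustrative:

```python
import numpy as np

def poincare_dist(x, y, c=1.0):
    """Geodesic distance in the Poincaré ball of curvature -c (points with norm < 1/sqrt(c))."""
    sq = lambda v: float(np.sum(v * v))
    arg = 1.0 + 2.0 * c * sq(x - y) / ((1.0 - c * sq(x)) * (1.0 - c * sq(y)))
    return float(np.arccosh(arg) / np.sqrt(c))

def spherical_dist(x, y, c=1.0):
    """Geodesic distance on the sphere of curvature +c (radius 1/sqrt(c))."""
    r = 1.0 / np.sqrt(c)
    return float(r * np.arccos(np.clip(np.dot(x, y) / (r * r), -1.0, 1.0)))

euclidean_dist = lambda x, y: float(np.linalg.norm(x - y))  # zero curvature

def product_dist(xs, ys, dists):
    """Distance in a Cartesian product manifold: l2 combination of component distances."""
    return float(np.sqrt(sum(d(x, y) ** 2 for d, x, y in zip(dists, xs, ys))))

# A point in the product space is a tuple with one factor per component space:
# here hyperbolic (2D) x spherical (unit vectors in R^3) x Euclidean (2D).
x = (np.array([0.1, 0.0]), np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0]))
y = (np.array([0.0, 0.2]), np.array([0.0, 1.0, 0.0]), np.array([3.0, 4.0]))
d = product_dist(x, y, [poincare_dist, spherical_dist, euclidean_dist])
```

Each factor contributes its own geometry: tree-like structure fits the hyperbolic component, cyclic structure the spherical one, and grid-like structure the Euclidean one, which is the intuition behind mixing curvatures.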

JBHI Journal 2022 Journal Article

Hierarchical Amortized GAN for 3D High Resolution Medical Image Synthesis

  • Li Sun
  • Junxiang Chen
  • Yanwu Xu
  • Mingming Gong
  • Ke Yu
  • Kayhan Batmanghelich

Generative Adversarial Networks (GANs) have many potential medical imaging applications, including data augmentation, domain adaptation, and model explanation. Due to the limited memory of Graphics Processing Units (GPUs), most current 3D GAN models are trained on low-resolution medical images; these models either cannot scale to high resolution or are prone to patchy artifacts. In this work, we propose a novel end-to-end GAN architecture that can generate high-resolution 3D images. We achieve this goal by using different configurations between training and inference. During training, we adopt a hierarchical structure that simultaneously generates a low-resolution version of the image and a randomly selected sub-volume of the high-resolution image. The hierarchical design has two advantages: first, the memory demand for training on high-resolution images is amortized among sub-volumes; furthermore, anchoring the high-resolution sub-volumes to a single low-resolution image ensures anatomical consistency between sub-volumes. During inference, our model can directly generate full high-resolution images. We also incorporate an encoder with a similar hierarchical structure into the model to extract features from the images. Experiments on 3D thorax CT and brain MRI demonstrate that our approach outperforms the state of the art in image generation. We also demonstrate clinical applications of the proposed model in data augmentation and clinically relevant feature extraction.
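The amortization idea can be illustrated from the data side: each training step consumes a cheap low-resolution proxy of the scan plus one random high-resolution sub-volume, so peak memory scales with the sub-volume rather than the full scan. A hypothetical sketch (shapes, `scale`, and `sub` are illustrative choices, not the authors' configuration):

```python
import numpy as np

def sample_training_pair(volume, scale=4, sub=16, rng=None):
    """From a full-resolution 3D scan, return (low-res proxy, random axial
    high-res sub-volume, its start index). The start index lets the model
    anchor the sub-volume to the low-res image for anatomical consistency."""
    rng = rng if rng is not None else np.random.default_rng()
    low = volume[::scale, ::scale, ::scale]          # crude low-res proxy by striding
    z0 = int(rng.integers(0, volume.shape[0] - sub + 1))
    return low, volume[z0:z0 + sub], z0

volume = np.zeros((64, 64, 64), dtype=np.float32)    # stand-in for a CT scan
low, patch, z0 = sample_training_pair(volume, scale=4, sub=16,
                                      rng=np.random.default_rng(0))
```

Over many steps the random starts cover the whole scan, which is what amortizes the high-resolution memory cost across sub-volumes.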

AAAI Conference 2022 Conference Paper

Knowledge Distillation via Constrained Variational Inference

  • Ardavan Saeedi
  • Yuria Utsumi
  • Li Sun
  • Kayhan Batmanghelich
  • Li-wei Lehman

Knowledge distillation has been used to capture the knowledge of a teacher model and distill it into a student model with some desirable characteristics, such as being smaller, more efficient, or more generalizable. In this paper, we propose a framework for distilling the knowledge of a powerful discriminative model, such as a neural network, into commonly used graphical models known to be more interpretable (e.g., topic models, autoregressive Hidden Markov Models). The posterior of latent variables in these graphical models (e.g., topic proportions in topic models) is often used as a feature representation for predictive tasks. However, these posterior-derived features are known to have poor predictive performance compared to features learned via purely discriminative approaches. Our framework constrains variational inference for posterior variables in graphical models with a similarity-preserving constraint. This constraint distills the knowledge of the discriminative model into the graphical model by ensuring that input pairs with (dis)similar representations in the teacher model also have (dis)similar representations in the student model. By adding this constraint to the variational inference scheme, we guide the graphical model to be a reasonable density model for the data while having predictive features that are as close as possible to those of a discriminative model. To make our framework applicable to a wide range of graphical models, we build upon Automatic Differentiation Variational Inference (ADVI), a black-box inference framework for graphical models. We demonstrate the effectiveness of our framework on two real-world tasks: disease subtyping and disease trajectory modeling.
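A similarity-preserving constraint of this kind is commonly realized by matching the pairwise-similarity structure of teacher and student features. The following is a generic sketch of such a penalty, not the paper's exact variational term; all names and the toy data are illustrative:

```python
import numpy as np

def similarity_preserving_penalty(teacher_feats, student_feats):
    """Mean squared difference between row-normalized pairwise-similarity
    (Gram) matrices of teacher and student batch features: pairs that are
    (dis)similar under the teacher are pushed to be (dis)similar under the
    student, without requiring the feature spaces to match."""
    def sim_matrix(f):
        g = f @ f.T                                          # pairwise inner products
        return g / np.linalg.norm(g, axis=1, keepdims=True)  # row-normalize
    diff = sim_matrix(teacher_feats) - sim_matrix(student_feats)
    return float(np.sum(diff ** 2) / teacher_feats.shape[0] ** 2)

rng = np.random.default_rng(0)
t = rng.normal(size=(6, 10))   # teacher batch features (e.g., network activations)
s = rng.normal(size=(6, 8))    # student batch features (e.g., posterior means)
penalty = similarity_preserving_penalty(t, s)
```

Note the two feature spaces may have different dimensions; only the batch-level similarity pattern is compared, and the row normalization makes the penalty invariant to a uniform rescaling of either representation.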

NeurIPS Conference 2021 Conference Paper

Can contrastive learning avoid shortcut solutions?

  • Joshua Robinson
  • Li Sun
  • Ke Yu
  • Kayhan Batmanghelich
  • Stefanie Jegelka
  • Suvrit Sra

The generalization of representations learned via contrastive learning depends crucially on what features of the data are extracted. However, we observe that the contrastive loss does not always sufficiently guide which features are extracted, a behavior that can negatively impact performance on downstream tasks via "shortcuts", i.e., by inadvertently suppressing important predictive features. We find that feature extraction is influenced by the difficulty of the so-called instance discrimination task (i.e., the task of discriminating pairs of similar points from pairs of dissimilar ones). Although harder pairs improve the representation of some features, the improvement comes at the cost of suppressing previously well-represented features. In response, we propose implicit feature modification (IFM), a method for altering positive and negative samples in order to guide contrastive models towards capturing a wider variety of predictive features. Empirically, we observe that IFM reduces feature suppression and, as a result, improves performance on vision and medical imaging tasks.
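One common way to realize the sample-altering idea is to perturb the InfoNCE similarities directly: lowering the positive similarity and raising the negative similarities by a budget makes the instance-discrimination task uniformly harder. The sketch below follows that reading; the budget `eps`, temperature, and toy features are illustrative, and this is a hedged sketch of the idea rather than the authors' implementation:

```python
import numpy as np

def ifm_infonce(anchor, positive, negatives, eps=0.1, temperature=0.5):
    """InfoNCE loss with an IFM-style perturbation: the positive logit is
    reduced by eps and each negative logit raised by eps before the softmax,
    equivalent to adversarially shifting the (normalized) features along the
    anchor direction."""
    norm = lambda v: v / np.linalg.norm(v, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    pos = float(np.dot(a, p)) - eps        # perturbed positive similarity
    neg = n @ a + eps                      # perturbed negative similarities
    logits = np.concatenate(([pos], neg)) / temperature
    logits = logits - logits.max()         # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

rng = np.random.default_rng(0)
anchor, positive = rng.normal(size=16), rng.normal(size=16)
negatives = rng.normal(size=(8, 16))
hard = ifm_infonce(anchor, positive, negatives, eps=0.1)  # perturbed task
easy = ifm_infonce(anchor, positive, negatives, eps=0.0)  # plain InfoNCE
```

Because every pair is perturbed by the same budget, the extra difficulty is spread across all features rather than concentrating on the already-hardest pairs.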

AAAI Conference 2021 Conference Paper

Context Matters: Graph-based Self-supervised Representation Learning for Medical Images

  • Li Sun
  • Ke Yu
  • Kayhan Batmanghelich

Supervised learning methods require a large volume of annotated data. Collecting such datasets is time-consuming and expensive. Until now, very few annotated COVID-19 imaging datasets have been available. Although self-supervised learning enables us to bootstrap the training by exploiting unlabeled data, the generic self-supervised methods for natural images do not sufficiently incorporate context. For medical images, a desirable method should be sensitive enough to detect deviation from the normal-appearing tissue of each anatomical region; here, anatomy is the context. We introduce a novel approach with two levels of self-supervised representation learning objectives: one at the regional anatomical level and another at the patient level. We use graph neural networks to incorporate the relationship between different anatomical regions. The structure of the graph is informed by anatomical correspondences between each patient and an anatomical atlas. In addition, the graph representation has the advantage of handling arbitrarily sized images in full resolution. Experiments on large-scale Computed Tomography (CT) datasets of lung images show that our approach compares favorably to baseline methods that do not account for context. We use the learnt embedding to quantify the clinical progression of COVID-19 and show that our method generalizes well to COVID-19 patients from different hospitals. Qualitative results suggest that our model can identify clinically relevant regions in the images.

AAAI Conference 2021 Conference Paper

Hyperbolic Variational Graph Neural Network for Modeling Dynamic Graphs

  • Li Sun
  • Zhongbao Zhang
  • Jiawei Zhang
  • Feiyang Wang
  • Hao Peng
  • Sen Su
  • Philip S. Yu

Learning representations for graphs plays a critical role in a wide spectrum of downstream applications. In this paper, we summarize the limitations of prior works in three aspects: representation space, modeling dynamics, and modeling uncertainty. To bridge this gap, we propose, for the first time, to learn dynamic graph representations in hyperbolic space, aiming to infer stochastic node representations. Working with hyperbolic space, we present a novel Hyperbolic Variational Graph Neural Network, referred to as HVGNN. In particular, to model the dynamics, we introduce a Temporal GNN (TGNN) based on a theoretically grounded time encoding approach. To model the uncertainty, we devise a hyperbolic graph variational autoencoder built upon the proposed TGNN to generate stochastic node representations of hyperbolic normal distributions. Furthermore, we introduce a reparameterisable sampling algorithm for the hyperbolic normal distribution to enable the gradient-based learning of HVGNN. Extensive experiments show that HVGNN outperforms state-of-the-art baselines on real-world datasets.

IJCAI Conference 2021 Conference Paper

Online Credit Payment Fraud Detection via Structure-Aware Hierarchical Recurrent Neural Network

  • Wangli Lin
  • Li Sun
  • Qiwei Zhong
  • Can Liu
  • Jinghua Feng
  • Xiang Ao
  • Hao Yang

Online credit payment fraud detection plays a critical role in financial institutions due to the growing volume of fraudulent transactions. Recently, researchers have shown an increased interest in capturing users' dynamic and evolving fraudulent tendencies from their behavior sequences. However, most existing methodologies for sequential modeling overlook the intrinsic structure information of web pages. In this paper, we adopt multi-scale behavior sequences generated from different granularities of web page structures and propose a model named SAH-RNN to consume the multi-scale behavior sequences for online payment fraud detection. The SAH-RNN has stacked RNN layers in which upper layers, modeling compendious behaviors, are updated less frequently and receive summarized representations from lower layers. A dual attention mechanism is devised to capture both sequential information within the same sequence and structural information among different granularities of web pages. Experimental results on a large-scale real-world transaction dataset from Alibaba show that our proposed model outperforms state-of-the-art models. The code is available at https://github.com/WangliLin/SAH-RNN.

IJCAI Conference 2020 Conference Paper

BANANA: when Behavior ANAlysis meets social Network Alignment

  • Fuxin Ren
  • Zhongbao Zhang
  • Jiawei Zhang
  • Sen Su
  • Li Sun
  • Guozhen Zhu
  • Congying Guo

Recently, aligning users among different social networks has received significant attention. However, most of the existing studies do not consider users' behavior information during the aligning procedure and thus still suffer from poor learning performance. In fact, we observe that social network alignment and behavior analysis can benefit from each other. Motivated by this observation, we propose to jointly study the social network alignment problem and the user behavior analysis problem. We design a novel end-to-end framework named BANANA. In this framework, to leverage behavior analysis for social network alignment at the distribution level, we design an earth mover's distance based alignment model to fuse users' behavior information for more comprehensive user representations. To further leverage social network alignment for behavior analysis, in turn, we design a temporal graph neural network model to fuse behavior information in different social networks based on the alignment result. The two models above work together in an end-to-end manner. Through extensive experiments on real-world datasets, we demonstrate that our proposed approach outperforms the state-of-the-art methods in the social network alignment task and the user behavior analysis task, respectively.

AAAI Conference 2019 Conference Paper

Amalgamating Knowledge towards Comprehensive Classification

  • Chengchao Shen
  • Xinchao Wang
  • Jie Song
  • Li Sun
  • Mingli Song

With the rapid development of deep learning, there has been an unprecedentedly large number of trained deep network models available online. Reusing such trained models can significantly reduce the cost of training new models from scratch, which may not even be feasible, as the annotations used for training the original networks are often unavailable to the public. We propose in this paper to study a new model-reusing task, which we term knowledge amalgamation. Given multiple trained teacher networks, each of which specializes in a different classification problem, the goal of knowledge amalgamation is to learn a lightweight student model capable of handling the comprehensive classification. We assume no annotations other than the outputs from the teacher models are available, and thus focus on extracting and amalgamating knowledge from the multiple teachers. To this end, we propose a pilot two-step strategy to tackle the knowledge amalgamation task: learning first the compact feature representations from the teachers, and then the network parameters in a layer-wise manner, so as to build the student model. We apply this approach to four public datasets and obtain very encouraging results: even without any human annotation, the obtained student model is competent to handle the comprehensive classification task and in most cases outperforms the teachers in individual sub-tasks.

IJCAI Conference 2018 Conference Paper

MASTER: across Multiple social networks, integrate Attribute and STructure Embedding for Reconciliation

  • Sen Su
  • Li Sun
  • Zhongbao Zhang
  • Gen Li
  • Jielun Qu

Recently, reconciling social networks has received significant attention. Most of the existing studies have limitations in the following three aspects: multiplicity, comprehensiveness and robustness. To address these three limitations, we rethink this problem and propose the MASTER framework, i.e., across Multiple social networks, integrate Attribute and STructure Embedding for Reconciliation. In this framework, we first design a novel Constrained Dual Embedding model that simultaneously embeds and reconciles multiple social networks, formulating our problem as a unified optimization. To address this optimization, we then design an effective algorithm called NS-Alternating. We also prove that this algorithm converges to KKT points. Through extensive experiments on real-world datasets, we demonstrate that MASTER outperforms the state-of-the-art approaches.