Author name cluster

Nian Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

AAAI Conference 2026 Conference Paper

AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation

Ziyang Luo
Nian Liu
Fahad Shahbaz Khan
Junwei Han

Reference Audio-Visual Segmentation (Ref-AVS) tasks challenge models to precisely locate sounding objects by integrating visual, auditory, and textual cues. Existing methods often lack genuine semantic understanding, tending to memorize fixed reasoning patterns. Furthermore, jointly training for reasoning and segmentation can compromise pixel-level precision. To address these issues, we introduce AURORA, a novel framework designed to enhance genuine reasoning and language comprehension in reference audio-visual segmentation. We employ a structured Chain-of-Thought (CoT) prompting mechanism to guide the model through a step-by-step reasoning process and introduce a novel segmentation feature distillation loss to effectively integrate these reasoning abilities without sacrificing segmentation performance. To further cultivate the model's genuine reasoning capabilities, we devise a further two-stage training strategy: first, a ``corrective reflective-style training" stage utilizes self-correction to enhance the quality of reasoning paths, followed by reinforcement learning via Group Reward Policy Optimization (GRPO) to bolster robustness in challenging scenarios. Experiments demonstrate that AURORA achieves state-of-the-art performance on Ref-AVS benchmarks and generalizes effectively to unreferenced segmentation.

PDF Details DOI

AAAI Conference 2026 Conference Paper

Video Echoed in Music: Semantic, Temporal, and Rhythmic Alignment for Video-to-Music Generation

Xinyi Tong
Yiran Zhu
Jishang Chen
Chunru Zhan
Tianle Wang
Sirui Zhang
Nian Liu
Tiezheng Ge

Video-to-Music generation seeks to generate musically appropriate background music that enhances audiovisual immersion for videos. However, current approaches suffer from two critical limitations: 1) incomplete representation of video details, leading to weak alignment, and 2) inadequate temporal and rhythmic correspondence, particularly in achieving precise beat synchronization. To address the challenges, we propose Video Echoed in Music (VeM), a latent music diffusion that generates high-quality soundtracks with semantic, temporal, and rhythmic alignment for input videos. To capture video details comprehensively, VeM employs a hierarchical video parsing that acts as a music conductor, orchestrating multi-level information across modalities. Modality-specific encoders, coupled with a storyboard-guided cross-attention mechanism (SG-CAtt), integrate semantic cues while maintaining temporal coherence through position and duration encoding. For rhythmic precision, the frame-level transition-beat aligner and adapter (TB-As) dynamically synchronize visual scene transitions with music beats. We further contribute a novel video-music paired dataset sourced from e-commerce advertisements and video-sharing platforms, which imposes stricter transition-beat synchronization requirements. Meanwhile, we introduce novel metrics tailored to the task. Experimental results demonstrate superiority, particularly in semantic relevance and rhythmic precision.

PDF Details DOI

EAAI Journal 2025 Journal Article

Backtracing Byzantine attacks in distributed average consensus networks: A gated graph neural network approach with graph reconstruction

Xinliang Wang
Shaolin Tan
Ye Tao
Nian Liu
Bing Li
Suixiang Gao

Byzantine attackers disguise themselves as normal nodes yet propagate incorrect information in the network to disrupt the function of distributed systems. Current Byzantine detecting approaches commonly rely on the delivered data among agents, where an agent is supposed to be a Byzantine attacker if its delivered data significantly differs with others. In this paper, for the first time we suppose that the transmission data within the network is not known due to privacy consideration and propose the problem of backtracing Byzantine attackers through observed states. To address this problem, we adopt the idea of backtracing patient in disease spreading networks and propose a gated graph neural network with graph reconstruction to effectively localize the Byzantine attackers in multi-agent networks. This approach first develops a data-driven method to learn the networking structure of the multi-agent systems and then builds a gated graph neural network based on the reconstructed graph to classify the nodes into normal ones and Byzantine ones. Compared with previous Byzantine detecting methods, the proposed approach is fully driven by the measured output state of each agent and localizes the attacker from external perspective. Extensive experimental results are conducted to confirm that the proposed Byzantine attack detector performs well across various testing parameters.

Details DOI

ICML Conference 2025 Conference Paper

Unsupervised Learning for Class Distribution Mismatch

Pan Du 0002
Wangbo Zhao
Xinai Lu
Nian Liu
Zhikai Li
Chaoyu Gong
Suyun Zhao
Hong Chen 0001

Class distribution mismatch (CDM) refers to the discrepancy between class distributions in training data and target tasks. Previous methods address this by designing classifiers to categorize classes known during training, while grouping unknown or new classes into an "other" category. However, they focus on semi-supervised scenarios and heavily rely on labeled data, limiting their applicability and performance. To address this, we propose Unsupervised Learning for Class Distribution Mismatch (UCDM), which constructs positive-negative pairs from unlabeled data for classifier training. Our approach randomly samples images and uses a diffusion model to add or erase semantic classes, synthesizing diverse training pairs. Additionally, we introduce a confidence-based labeling mechanism that iteratively assigns pseudo-labels to valuable real-world data and incorporates them into the training process. Extensive experiments on three datasets demonstrate UCDM’s superiority over previous semi-supervised methods. Specifically, with a 60% mismatch proportion on Tiny-ImageNet dataset, our approach, without relying on labeled data, surpasses OpenMatch (with 40 labels per class) by 35. 1%, 63. 7%, and 72. 5% in classifying known, unknown, and new classes.

Details

AAAI Conference 2024 Conference Paper

TransGOP: Transformer-Based Gaze Object Prediction

Binglu Wang
Chenxi Guo
Yang Jin
Haisheng Xia
Nian Liu

Gaze object prediction aims to predict the location and category of the object that is watched by a human. Previous gaze object prediction works use CNN-based object detectors to predict the object's location. However, we find that Transformer-based object detectors can predict more accurate object location for dense objects in retail scenarios. Moreover, the long-distance modeling capability of the Transformer can help to build relationships between the human head and the gaze object, which is important for the GOP task. To this end, this paper introduces Transformer into the fields of gaze object prediction and proposes an end-to-end Transformer-based gaze object prediction method named TransGOP. Specifically, TransGOP uses an off-the-shelf Transformer-based object detector to detect the location of objects and designs a Transformer-based gaze autoencoder in the gaze regressor to establish long-distance gaze relationships. Moreover, to improve gaze heatmap regression, we propose an object-to-gaze cross-attention mechanism to let the queries of the gaze autoencoder learn the global-memory position knowledge from the object detector. Finally, to make the whole framework end-to-end trained, we propose a Gaze Box loss to jointly optimize the object detector and gaze regressor by enhancing the gaze heatmap energy in the box of the gaze object. Extensive experiments on the GOO-Synth and GOO-Real datasets demonstrate that our TransGOP achieves state-of-the-art performance on all tracks, i.e., object detection, gaze estimation, and gaze object prediction. Our code will be available at https://github.com/chenxi-Guo/TransGOP.git.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Uncovering the Redundancy in Graph Self-supervised Learning Models

Zhibiao Wang
Xiao Wang
Haoyue Deng
Nian Liu
Shirui Pan
Chunming Hu

Graph self-supervised learning, as a powerful pre-training paradigm for Graph Neural Networks (GNNs) without labels, has received considerable attention. We have witnessed the success of graph self-supervised learning on pre-training the parameters of GNNs, leading many not to doubt that whether the learned GNNs parameters are all useful. In this paper, by presenting the experimental evidence and analysis, we surprisingly discover that the graph self-supervised learning models are highly redundant at both of neuron and layer levels, e. g. , even randomly removing 51. 6\% of parameters, the performance of graph self-supervised learning models still retains at least 96. 2\%. This discovery implies that the parameters of graph self-supervised models can be largely reduced, making simultaneously fine-tuning both graph self-supervised learning models and prediction layers more feasible. Therefore, we further design a novel graph pre-training and fine-tuning paradigm called SLImming DE-correlation Fine-tuning (SLIDE). The effectiveness of SLIDE is verified through extensive experiments on various benchmarks, and the performance can be even improved with fewer parameters of models in most cases. For example, in comparison with full fine-tuning GraphMAE on Amazon-Computers dataset, even randomly reducing 40\% of parameters, we can still achieve the improvement of 0. 24\% and 0. 27\% for Micro-F1 and Macro-F1 scores respectively.

PDF Details DOI

NeurIPS Conference 2023 Conference Paper

Learning Invariant Representations of Graph Neural Networks via Cluster Generalization

Donglin Xia
Xiao Wang
Nian Liu
Chuan Shi

Graph neural networks (GNNs) have become increasingly popular in modeling graph-structured data due to their ability to learn node representations by aggregating local structure information. However, it is widely acknowledged that the test graph structure may differ from the training graph structure, resulting in a structure shift. In this paper, we experimentally find that the performance of GNNs drops significantly when the structure shift happens, suggesting that the learned models may be biased towards specific structure patterns. To address this challenge, we propose the Cluster Information Transfer (\textbf{CIT}) mechanism, which can learn invariant representations for GNNs, thereby improving their generalization ability to various and unknown test graphs with structure shift. The CIT mechanism achieves this by combining different cluster information with the nodes while preserving their cluster-independent information. By generating nodes across different clusters, the mechanism significantly enhances the diversity of the nodes and helps GNNs learn the invariant representations. We provide a theoretical analysis of the CIT mechanism, showing that the impact of changing clusters during structure shift can be mitigated after transfer. Additionally, the proposed mechanism is a plug-in that can be easily used to improve existing GNNs. We comprehensively evaluate our proposed method on three typical structure shift scenarios, demonstrating its effectiveness in enhancing GNNs' performance.

PDF Details

NeurIPS Conference 2023 Conference Paper

Provable Training for Graph Contrastive Learning

Yue Yu
Xiao Wang
Mengmei Zhang
Nian Liu
Chuan Shi

Graph Contrastive Learning (GCL) has emerged as a popular training approach for learning node embeddings from augmented graphs without labels. Despite the key principle that maximizing the similarity between positive node pairs while minimizing it between negative node pairs is well established, some fundamental problems are still unclear. Considering the complex graph structure, are some nodes consistently well-trained and following this principle even with different graph augmentations? Or are there some nodes more likely to be untrained across graph augmentations and violate the principle? How to distinguish these nodes and further guide the training of GCL? To answer these questions, we first present experimental evidence showing that the training of GCL is indeed imbalanced across all nodes. To address this problem, we propose the metric "node compactness", which is the lower bound of how a node follows the GCL principle related to the range of augmentations. We further derive the form of node compactness theoretically through bound propagation, which can be integrated into binary cross-entropy as a regularization. To this end, we propose the PrOvable Training (POT) for GCL, which regularizes the training of GCL to encode node embeddings that follows the GCL principle better. Through extensive experiments on various benchmarks, POT consistently improves the existing GCL approaches, serving as a friendly plugin.

PDF Details

NeurIPS Conference 2022 Conference Paper

Intermediate Prototype Mining Transformer for Few-Shot Semantic Segmentation

Yuanwei Liu
Nian Liu
Xiwen Yao
Junwei Han

Few-shot semantic segmentation aims to segment the target objects in query under the condition of a few annotated support images. Most previous works strive to mine more effective category information from the support to match with the corresponding objects in query. However, they all ignored the category information gap between query and support images. If the objects in them show large intra-class diversity, forcibly migrating the category information from the support to the query is ineffective. To solve this problem, we are the first to introduce an intermediate prototype for mining both deterministic category information from the support and adaptive category knowledge from the query. Specifically, we design an Intermediate Prototype Mining Transformer (IPMT) to learn the prototype in an iterative way. In each IPMT layer, we propagate the object information in both support and query features to the prototype and then use it to activate the query feature map. By conducting this process iteratively, both the intermediate prototype and the query feature can be progressively improved. At last, the final query feature is used to yield precise segmentation prediction. Extensive experiments on both PASCAL-5i and COCO-20i datasets clearly verify the effectiveness of our IPMT and show that it outperforms previous state-of-the-art methods by a large margin. Code is available at https: //github. com/LIUYUANWEI98/IPMT

PDF Details

NeurIPS Conference 2022 Conference Paper

Revisiting Graph Contrastive Learning from the Perspective of Graph Spectrum

Nian Liu
Xiao Wang
Deyu Bo
Chuan Shi
Jian Pei

Graph Contrastive Learning (GCL), learning the node representations by augmenting graphs, has attracted considerable attentions. Despite the proliferation of various graph augmentation strategies, there are still some fundamental questions unclear: what information is essentially learned by GCL? Are there some general augmentation rules behind different augmentations? If so, what are they and what insights can they bring? In this paper, we answer these questions by establishing the connection between GCL and graph spectrum. By an experimental investigation in spectral domain, we firstly find the General grAph augMEntation (GAME) rule for GCL, i. e. , the difference of the high-frequency parts between two augmented graphs should be larger than that of low-frequency parts. This rule reveals the fundamental principle to revisit the current graph augmentations and design new effective graph augmentations. Then we theoretically prove that GCL is able to learn the invariance information by contrastive invariance theorem, together with our GAME rule, for the first time, we uncover that the learned representations by GCL essentially encode the low-frequency information, which explains why GCL works. Guided by this rule, we propose a spectral graph contrastive learning module (SpCo), which is a general and GCL-friendly plug-in. We combine it with different existing GCL models, and extensive experiments well demonstrate that it can further improve the performances of a wide variety of different GCL methods.

PDF Details

IJCAI Conference 2021 Conference Paper

Context-aware Cross-level Fusion Network for Camouflaged Object Detection

Yujia Sun
Geng Chen
Tao Zhou
Yi Zhang
Nian Liu

Camouflaged object detection (COD) is a challenging task due to the low boundary contrast between the object and its surroundings. In addition, the appearance of camouflaged objects varies significantly, e. g. , object size and shape, aggravating the difficulties of accurate COD. In this paper, we propose a novel Context-aware Crosslevel Fusion Network (C2F-Net) to address the challenging COD task. Specifically, we propose an Attention-induced Cross-level Fusion Module (ACFM) to integrate the multi-level features with informative attention coefficients. The fused features are then fed to the proposed Dual-branch Global Context Module (DGCM), which yields multi-scale feature representations for exploiting rich global context information. In C2F-Net, the two modules are conducted on high-level features using a cascaded manner. Extensive experiments on three widely used benchmark datasets demonstrate that our C2F-Net is an effective COD model and outperforms state-of-the-art models remarkably. Our code is publicly available at: https: //github. com/thograce/C2FNet.

PDF Details DOI

AAAI Conference 2014 Conference Paper

Identifying Domain-Dependent Influential Microblog Users: A Post-Feature Based Approach

Nian Liu
Lin Li
Guandong Xu
Zhenglu Yang

Users of a social network like to follow the posts published by influential users. Such posts usually are delivered quickly and thus will produce a strong influence on public opinions. In this paper, we focus on the problem of identifying domaindependent influential users(or topic experts). Some of traditional approaches are based on the post contents of users users to identify influential users, which may be biased by spammers who try to make posts related to some topics through a simple copy and paste. Others make use of user authentication information given by a service platform or user self description (introduction or label) in finding influential users. However, what users have published is not necessarily related to what they have registed and described. In addition, if there is no comments from other users, its less objective to assess a users post quality. To improve effectiveness of recognizing influential users in a topic of microblogs, we propose a post-feature based approach which is supplementary to postcontent based approaches. Our experimental results show that the post-feature based approach produces relatively higher precision than that of the content based approach.

PDF Details