Author name cluster

Chunyan Xu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

13 papers

2 author rows

IJCAI Conference 2025 Conference Paper

Dual-Perspective United Transformer for Object Segmentation in Optical Remote Sensing Images

Yanguang Sun
Jiexi Yan
Jianjun Qian
Chunyan Xu
Jian Yang
Lei Luo

Automatically segmenting objects from optical remote sensing images (ORSIs) is an important task. Most existing models are primarily based on either convolutional or Transformer features, each offering distinct advantages. Exploiting both advantages is valuable research, but it presents several challenges, including the heterogeneity between the two types of features, high complexity, and large parameters of the model. However, these issues are often overlooked in existing the ORSIs methods, causing sub-optimal segmentation. For that, we propose a novel Dual-Perspective United Transformer (DPU-Former) with a unique structure designed to simultaneously integrate long-range dependencies and spatial details. In particular, we design the global-local mixed attention, which captures diverse information through two perspectives and introduces a Fourier-space merging strategy to obviate deviations for efficient fusion. Furthermore, we present a gated linear feed-forward network to increase the expressive ability. Additionally, we construct a DPU-Former decoder to aggregate and strength features at different layers. Consequently, the DPU-Former model outperforms the state-of-the-art methods on multiple datasets. Code: https: //github. com/CSYSI/DPU-Former.

PDF Details DOI

NeurIPS Conference 2025 Conference Paper

Learn and Ensemble Bridge Adapters for Multi-domain Task Incremental Learning

Ziqi Gu
Chunyan Xu
Wenxuan Fang
Xin Liu
Yide Qiu
Zhen Cui

Multi-domain task incremental learning (MTIL) demands models to master domain-specific expertise while preserving generalization capabilities. Inspired by human lifelong learning, which relies on revisiting, aligning, and integrating past experiences, we propose a Learning and Ensembling Bridge Adapters (LEBA) framework. To facilitate cohesive knowledge transfer across domains, specifically, we propose a continuous-domain bridge adaptation module, leveraging the distribution transfer capabilities of Schrödinger bridge for stable progressive learning. To strengthen memory consolidation, we further propose a progressive knowledge ensemble strategy that revisits past task representations via a diffusion model and dynamically integrates historical adapters. For efficiency, LEBA maintains a compact adapter pool through similarity-based selection and employs learnable weights to align replayed samples with current task semantics. Together, these components effectively mitigate catastrophic forgetting and enhance generalization across tasks. Extensive experiments across multiple benchmarks validate the effectiveness and superiority of LEBA over state-of-the-art methods.

PDF Details

AAAI Conference 2025 Conference Paper

Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection

Chenxu Wang
Chunyan Xu
Xiang Li
Yuxuan Li
Xu Guo
Ziqi Gu
Zhen Cui

While existing semi-supervised object detection (SSOD) methods perform well in general scenes, they encounter challenges in handling oriented objects in aerial images. We experimentally find three gaps between general and oriented object detection in semi-supervised learning: 1) Sampling inconsistency: the common center sampling is not suitable for oriented objects with larger aspect ratios when selecting positive labels from labeled data. 2) Assignment inconsistency: balancing the precision and localization quality of oriented pseudo-boxes poses greater challenges which introduces more noise when selecting positive labels from unlabeled data. 3) Confidence inconsistency: there exists more mismatch between the predicted classification and localization qualities when considering oriented objects, affecting the selection of pseudo-labels. Therefore, we propose a Multi-clue Consistency Learning (MCL) framework to bridge gaps between general and oriented objects in semi-supervised detection. Specifically, considering various shapes of rotated objects, the Gaussian Center Assignment is specially designed to select the pixel-level positive labels from labeled data. We then introduce the Scale-aware Label Assignment to select pixel-level pseudo-labels instead of unreliable pseudo-boxes, which is a divide-and-rule strategy suited for objects with various scales. The Consistent Confidence Soft Label is adopted to further boost the detector by maintaining the alignment of the predicted results. Comprehensive experiments on DOTA-v1.5 and DOTA-v1.0 benchmarks demonstrate that our proposed MCL can achieve state-of-the-art performance in the semi-supervised oriented object detection task.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

MMM-RS: A Multi-modal, Multi-GSD, Multi-scene Remote Sensing Dataset and Benchmark for Text-to-Image Generation

Jialin Luo
Yuanzhi Wang
Ziqi Gu
Yide Qiu
Shuaizhen Yao
Fuyun Wang
Chunyan Xu
Wenhua Zhang

Recently, the diffusion-based generative paradigm has achieved impressive general image generation capabilities with text prompts due to its accurate distribution modeling and stable training process. However, generating diverse remote sensing (RS) images that are tremendously different from general images in terms of scale and perspective remains a formidable challenge due to the lack of a comprehensive remote sensing image generation dataset with various modalities, ground sample distances (GSD), and scenes. In this paper, we propose a Multi-modal, Multi-GSD, Multi-scene Remote Sensing (MMM-RS) dataset and benchmark for text-to-image generation in diverse remote sensing scenarios. Specifically, we first collect nine publicly available RS datasets and conduct standardization for all samples. To bridge RS images to textual semantic information, we utilize a large-scale pretrained vision-language model to automatically output text prompts and perform hand-crafted rectification, resulting in information-rich text-image pairs (including multi-modal images). In particular, we design some methods to obtain the images with different GSD and various environments (e. g. , low-light, foggy) in a single sample. With extensive manual screening and refining annotations, we ultimately obtain a MMM-RS dataset that comprises approximately 2. 1 million text-image pairs. Extensive experimental results verify that our proposed MMM-RS dataset allows off-the-shelf diffusion models to generate diverse RS images across various modalities, scenes, weather conditions, and GSD. The dataset is available at https: //github. com/ljl5261/MMM-RS.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Progressive Exploration-Conformal Learning for Sparsely Annotated Object Detection in Aerial Images

Zihan Lu
Chenxu Wang
Chunyan Xu
Xiangwei Zheng
Zhen Cui

The ability to detect aerial objects with limited annotation is pivotal to the development of real-world aerial intelligence systems. In this work, we focus on a demanding but practical sparsely annotated object detection (SAOD) in aerial images, which encompasses a wider variety of aerial scenes with the same number of annotated objects. Although most existing SAOD methods rely on fixed thresholding to filter pseudo-labels for enhancing detector performance, adapting to aerial objects proves challenging due to the imbalanced probabilities/confidences associated with predicted aerial objects. To address this problem, we propose a novel Progressive Exploration-Conformal Learning (PECL) framework to address the SAOD task, which can adaptively perform the selection of high-quality pseudo-labels in aerial images. Specifically, the pseudo-label exploration can be formulated as a decision-making paradigm by adopting a conformal pseudo-label explorer and a multi-clue selection evaluator. The conformal pseudo-label explorer learns an adaptive policy by maximizing the cumulative reward, which can decide how to select these high-quality candidates by leveraging their essential characteristics and inter-instance contextual information. The multi-clue selection evaluator is designed to evaluate the explorer-guided pseudo-label selections by providing an instructive feedback for policy optimization. Finally, the explored pseudo-labels can be adopted to guide the optimization of aerial object detector in a closed-looping progressive fashion. Comprehensive evaluations on two public datasets demonstrate the superiority of our PECL when compared with other state-of-the-art methods in the sparsely annotated aerial object detection task.

PDF Details DOI

AAAI Conference 2023 Conference Paper

Exploratory Inference Learning for Scribble Supervised Semantic Segmentation

Chuanwei Zhou
Zhen Cui
Chunyan Xu
Cao Han
Jian Yang

Scribble supervised semantic segmentation has achieved great advances in pseudo label exploitation, yet suffers insufficient label exploration for the mass of unannotated regions. In this work, we propose a novel exploratory inference learning (EIL) framework, which facilitates efficient probing on unlabeled pixels and promotes selecting confident candidates for boosting the evolved segmentation. The exploration of unannotated regions is formulated as an iterative decision-making process, where a policy searcher learns to infer in the unknown space and the reward to the exploratory policy is based on a contrastive measurement of candidates. In particular, we devise the contrastive reward with the intra-class attraction and the inter-class repulsion in the feature space w.r.t the pseudo labels. The unlabeled exploration and the labeled exploitation are jointly balanced to improve the segmentation, and framed in a close-looping end-to-end network. Comprehensive evaluations on the benchmark datasets (PASCAL VOC 2012 and PASCAL Context) demonstrate the superiority of our proposed EIL when compared with other state-of-the-art methods for the scribble-supervised semantic segmentation problem.

PDF Details DOI

AAAI Conference 2023 Conference Paper

Progressive Bayesian Inference for Scribble-Supervised Semantic Segmentation

Chuanwei Zhou
Chunyan Xu
Zhen Cui

The scribble-supervised semantic segmentation is an important yet challenging task in the field of computer vision. To deal with the pixel-wise sparse annotation problem, we propose a Progressive Bayesian Inference (PBI) framework to boost the performance of the scribble-supervised semantic segmentation, which can effectively infer the semantic distribution of these unlabeled pixels to guide the optimization of the segmentation network. The PBI dynamically improves the model learning from two aspects: the Bayesian inference module (i.e., semantic distribution learning) and the pixel-wise segmenter (i.e., model updating). Specifically, we effectively infer the semantic probability distribution of these unlabeled pixels with our designed Bayesian inference module, where its guidance is estimated through the Bayesian expectation maximization under the situation of partially observed data. The segmenter can be progressively improved under the joint guidance of the original scribble information and the learned semantic distribution. The segmenter optimization and semantic distribution promotion are encapsulated into a unified architecture where they could improve each other with mutual evolution in a progressive fashion. Comprehensive evaluations of several benchmark datasets demonstrate the effectiveness and superiority of our proposed PBI when compared with other state-of-the-art methods applied to the scribble-supervised semantic segmentation task.

PDF Details DOI

ICLR Conference 2020 Conference Paper

Graph inference learning for semi-supervised classification

Chunyan Xu
Zhen Cui 0001
Xiaobin Hong 0002
Tong Zhang 0021
Jian Yang 0003
Wei Liu 0005

In this work, we address the semi-supervised classification of graph data, where the categories of those unlabeled nodes are inferred from labeled nodes as well as graph structures. Recent works often solve this problem with the advanced graph convolution in a conventional supervised manner, but the performance could be heavily affected when labeled data is scarce. Here we propose a Graph Inference Learning (GIL) framework to boost the performance of node classification by learning the inference of node labels on graph topology. To bridge the connection of two nodes, we formally define a structure relation by encapsulating node attributes, between-node paths and local topological structures together, which can make inference conveniently deduced from one node to another node. For learning the inference process, we further introduce meta-optimization on structure relations from training nodes to validation nodes, such that the learnt graph inference capability can be better self-adapted into test nodes. Comprehensive evaluations on four benchmark datasets (including Cora, Citeseer, Pubmed and NELL) demonstrate the superiority of our GIL when compared with other state-of-the-art methods in the semi-supervised node classification task.

Details

AAAI Conference 2020 Conference Paper

Variational Pathway Reasoning for EEG Emotion Recognition

Tong Zhang
Zhen Cui
Chunyan Xu
Wenming Zheng
Jian Yang

Research on human emotion cognition revealed that connections and pathways exist between spatially-adjacent and functional-related areas during emotion expression (Adolphs 2002a; Bullmore and Sporns 2009). Deeply inspired by this mechanism, we propose a heuristic Variational Pathway Reasoning (VPR) method to deal with EEG-based emotion recognition. We introduce random walk to generate a large number of candidate pathways along electrodes. To encode each pathway, the dynamic sequence model is further used to learn between-electrode dependencies. The encoded pathways around each electrode are aggregated to produce a pseudo maximum-energy pathway, which consists of the most important pair-wise connections. To ﬁnd those most salient connections, we propose a sparse variational scaling (SVS) module to learn scaling factors of pseudo pathways by using the Bayesian probabilistic process and sparsity constraint, where the former endows good generalization ability while the latter favors adaptive pathway selection. Finally, the salient pathways from those candidates are jointly decided by the pseudo pathways and scaling factors. Extensive experiments on EEG emotion recognition demonstrate that the proposed VPR is superior to those state-of-the-art methods, and could ﬁnd some interesting pathways w. r. t. different emotions.

PDF Details

AAAI Conference 2019 Conference Paper

Gaussian-Induced Convolution for Graphs

Jiatao Jiang
Zhen Cui
Chunyan Xu
Jian Yang

Learning representation on graph plays a crucial role in numerous tasks of pattern recognition. Different from gridshaped images/videos, on which local convolution kernels can be lattices, however, graphs are fully coordinate-free on vertices and edges. In this work, we propose a Gaussianinduced convolution (GIC) framework to conduct local convolution filtering on irregular graphs. Specifically, an edgeinduced Gaussian mixture model is designed to encode variations of subgraph region by integrating edge information into weighted Gaussian models, each of which implicitly characterizes one component of subgraph variations. In order to coarsen a graph, we derive a vertex-induced Gaussian mixture model to cluster vertices dynamically according to the connection of edges, which is approximately equivalent to the weighted graph cut. We conduct our multi-layer graph convolution network on several public datasets of graph classification. The extensive experiments demonstrate that our GIC is effective and can achieve the state-of-the-art results.

PDF Details

AAAI Conference 2018 Conference Paper

Spatio-Temporal Graph Convolution for Skeleton Based Action Recognition

Chaolong Li
Zhen Cui
Wenming Zheng
Chunyan Xu
Jian Yang

Variations of human body skeletons may be considered as dynamic graphs, which are generic data representation for numerous real-world applications. In this paper, we propose a spatio-temporal graph convolution (STGC) approach for assembling the successes of local convolutional ﬁltering and sequence learning ability of autoregressive moving average. To encode dynamic graphs, the constructed multi-scale local graph convolution ﬁlters, consisting of matrices of local receptive ﬁelds and signal mappings, are recursively performed on structured graph data of temporal and spatial domain. The proposed model is generic and principled as it can be generalized into other dynamic models. We theoretically prove the stability of STGC and provide an upper-bound of the signal transformation to be learnt. Further, the proposed recursive model can be stacked into a multi-layer architecture. To evaluate our model, we conduct extensive experiments on four benchmark skeleton-based action datasets, including the large-scale challenging NTU RGB+D. The experimental results demonstrate the effectiveness of our proposed model and the improvement over the state-of-the-art.

PDF Details

AAAI Conference 2016 Conference Paper

Deep Learning with S-Shaped Rectified Linear Activation Units

Xiaojie Jin
Chunyan Xu
Jiashi Feng
Yunchao Wei
Junjun Xiong
Shuicheng Yan

Rectiﬁed linear activation units are important components for state-of-the-art deep convolutional networks. In this paper, we propose a novel S-shaped rectiﬁed linear activation unit (SReLU) to learn both convex and non-convex functions, imitating the multiple function forms given by the two fundamental laws, namely the Webner-Fechner law and the Stevens law, in psychophysics and neural sciences. Speciﬁcally, SReLU consists of three piecewise linear functions, which are formulated by four learnable parameters. The SReLU is learned jointly with the training of the whole deep network through back propagation. During the training phase, to initialize SReLU in different layers, we propose a “freezing” method to degenerate SReLU into a predeﬁned leaky rectiﬁed linear unit in the initial several training epochs and then adaptively learn the good initial values. SReLU can be universally used in the existing deep networks with negligible additional parameters and computation cost. Experiments with two popular CNN architectures, Network in Network and GoogLeNet on scale-various benchmarks including CI- FAR10, CIFAR100, MNIST and ImageNet demonstrate that SReLU achieves remarkable improvement compared to other activation functions.

PDF Details

AAAI Conference 2015 Conference Paper

Generalized Singular Value Thresholding

Canyi Lu
Changbo Zhu
Chunyan Xu
Shuicheng Yan
Zhouchen Lin

This work studies the Generalized Singular Value Thresholding (GSVT) operator Proxσ g (·), Proxσ g (B) = arg min X m X i=1 g(σi(X)) + 1 2 ||X − B||2 F, associated with a nonconvex function g defined on the singular values of X. We prove that GSVT can be obtained by performing the proximal operator of g (denoted as Proxg(·)) on the singular values since Proxg(·) is monotone when g is lower bounded. If the nonconvex g satisfies some conditions (many popular nonconvex surrogate functions, e. g. , `p-norm, 0 < p < 1, of `0-norm are special cases), a general solver to find Proxg(b) is proposed for any b ≥ 0. GSVT greatly generalizes the known Singular Value Thresholding (SVT) which is a basic subroutine in many convex low rank minimization methods. We are able to solve the nonconvex low rank minimization problem by using GSVT in place of SVT.

PDF Details