Author name cluster

Congyan Lang

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers

2 author rows

NeurIPS Conference 2024 Conference Paper

DFA-GNN: Forward Learning of Graph Neural Networks by Direct Feedback Alignment

Gongpei Zhao
Tao Wang
Congyan Lang
Yi Jin
Yidong Li
Haibin Ling

Graph neural networks (GNNs) are recognized for their strong performance across various applications, with the backpropagation (BP) algorithm playing a central role in the development of most GNN models. However, despite its effectiveness, BP has limitations that challenge its biological plausibility and affect the efficiency, scalability and parallelism of training neural networks for graph-based tasks. While several non-backpropagation (non-BP) training algorithms, such as the direct feedback alignment (DFA), have been successfully applied to fully-connected and convolutional network components for handling Euclidean data, directly adapting these non-BP frameworks to manage non-Euclidean graph data in GNN models presents significant challenges. These challenges primarily arise from the violation of the independent and identically distributed (i. i. d. ) assumption in graph data and the difficulty in accessing prediction errors for all samples (nodes) within the graph. To overcome these obstacles, in this paper we propose DFA-GNN, a novel forward learning framework tailored for GNNs with a case study of semi-supervised learning. The proposed method breaks the limitations of BP by using a dedicated forward training mechanism. Specifically, DFA-GNN extends the principles of DFA to adapt to graph data and unique architecture of GNNs, which incorporates the information of graph topology into the feedback links to accommodate the non-Euclidean characteristics of graph data. Additionally, for semi-supervised graph learning tasks, we developed a pseudo error generator that spreads residual errors from training data to create a pseudo error for each unlabeled node. These pseudo errors are then utilized to train GNNs using DFA. Extensive experiments on 10 public benchmarks reveal that our learning framework outperforms not only previous non-BP methods but also the standard BP methods, and it exhibits excellent robustness against various types of noise and attacks.

PDF Details DOI

NeurIPS Conference 2024 Conference Paper

Generated and Pseudo Content guided Prototype Refinement for Few-shot Point Cloud Segmentation

Lili Wei
Congyan Lang
Ziyi Chen
Tao Wang
Yidong Li
Jun Liu

Few-shot 3D point cloud semantic segmentation aims to segment query point clouds with only a few annotated support point clouds. Existing prototype-based methods learn prototypes from the 3D support set to guide the segmentation of query point clouds. However, they encounter the challenge of low prototype quality due to constrained semantic information in the 3D support set and class information bias between support and query sets. To address these issues, in this paper, we propose a novel framework called Generated and Pseudo Content guided Prototype Refinement (GPCPR), which explicitly leverages LLM-generated content and reliable query context to enhance prototype quality. GPCPR achieves prototype refinement through two core components: LLM-driven Generated Content-guided Prototype Refinement (GCPR) and Pseudo Query Context-guided Prototype Refinement (PCPR). Specifically, GCPR integrates diverse and differentiated class descriptions generated by large language models to enrich prototypes with comprehensive semantic knowledge. PCPR further aggregates reliable class-specific pseudo-query context to mitigate class information bias and generate more suitable query-specific prototypes. Furthermore, we introduce a dual-distillation regularization term, enabling knowledge transfer between early-stage entities (prototypes or pseudo predictions) and their deeper counterparts to enhance refinement. Extensive experiments demonstrate the superiority of our method, surpassing the state-of-the-art methods by up to 12. 10% and 13. 75% mIoU on S3DIS and ScanNet, respectively.

PDF Details DOI

AAAI Conference 2024 Conference Paper

RL-SeqISP: Reinforcement Learning-Based Sequential Optimization for Image Signal Processing

Xinyu Sun
Zhikun Zhao
Lili Wei
Congyan Lang
Mingxuan Cai
Longfei Han
Juan Wang
Bing Li

Hardware image signal processing (ISP), aiming at converting RAW inputs to RGB images, consists of a series of processing blocks, each with multiple parameters. Traditionally, ISP parameters are manually tuned in isolation by imaging experts according to application-specific quality and performance metrics, which is time-consuming and biased towards human perception due to complex interaction with the output image. Since the relationship between any single parameter’s variation and the output performance metric is a complex, non-linear function, optimizing such a large number of ISP parameters is challenging. To address this challenge, we propose a novel Sequential ISP parameter optimization model, called the RL-SeqISP model, which utilizes deep reinforcement learning to jointly optimize all ISP parameters for a variety of imaging applications. Concretely, inspired by the sequential tuning process of human experts, the proposed model can progressively enhance image quality by seamlessly integrating information from both the image feature space and the parameter space. Furthermore, a dynamic parameter optimization module is introduced to avoid ISP parameters getting stuck into local optima, which is able to more effectively guarantee the optimal parameters resulting from the sequential learning strategy. These merits of the RL-SeqISP model as well as its high efficiency are substantiated by comprehensive experiments on a wide range of downstream tasks, including two visual analysis tasks (instance segmentation and object detection), and image quality assessment (IQA), as compared with representative methods both quantitatively and qualitatively. In particular, even using only 10% of the training data, our model outperforms other SOTA methods by an average of 7% mAP on two visual analysis tasks.

PDF Details DOI

TIST Journal 2022 Journal Article

Redundant Label Learning via Subspace Representation and Global Disambiguation

Gengyu Lyu
Songhe Feng
Wei Liu
Shuoyan Liu
Congyan Lang

Redundant Label Learning (RLL) aims at inducing a robust model from training data, where each example is associated with a set of candidate labels, among which some of them are incorrect. Most existing approaches deal with such problem by disambiguating the candidate labels first and then inducing the predictive model from the disambiguated data. However, these approaches only focus on disambiguation for each instance’ candidate label set, while the global label context tends to be ignored. Meanwhile, these approaches usually induce the objective model by directly utilizing the original feature information, which may lead to the model overfitting due to high-dimensional redundant features. To tackle the above issues, we propose a novel feature S ubspac E R epresentation and label G lobal Disambiguat IO n ( SERGIO ) approach, which improves the generalization ability of the learning system from the perspective of both feature space and label space. Specifically, we project the original high-dimensional feature space into a low-dimensional subspace, where the projection matrix is regularized with an orthogonality constraint to make the subspace more compact. Meanwhile, we introduce a label confidence matrix and constrain it with ℓ 1 -norm and trace-norm regularization simultaneously, which are utilized to explore global label correlations and further well in accordance with the nature of single-label classification and multi-label classification problem, respectively. Extensive experiments on both single-label and multi-label RLL datasets demonstrate that our proposed method achieves competitive performance against state-of-the-art approaches.

Details DOI

TIST Journal 2022 Journal Article

Weakly Supervised Video Object Segmentation via Dual-attention Cross-branch Fusion

Lili Wei
Congyan Lang
Liqian Liang
Songhe Feng
Tao Wang
Shidi Chen

Recently, concerning the challenge of collecting large-scale explicitly annotated videos, weakly supervised video object segmentation (WSVOS) using video tags has attracted much attention. Existing WSVOS approaches follow a general pipeline including two phases, i.e., a pseudo masks generation phase and a refinement phase. To explore the intrinsic property and correlation buried in the video frames, most of them focus on the later phase by introducing optical flow as temporal information to provide more supervision. However, these optical flow-based studies are greatly affected by illumination and distortion and lack consideration of the discriminative capacity of multi-level deep features. In this article, with the goal of capturing more effective temporal information and investigating a temporal information fusion strategy accordingly, we propose a unified WSVOS model by adopting a two-branch architecture with a multi-level cross-branch fusion strategy, named as dual-attention cross-branch fusion network (DACF-Net). Concretely, the two branches of DACF-Net, i.e., a temporal prediction subnetwork (TPN) and a spatial segmentation subnetwork (SSN), are used for extracting temporal information and generating predicted segmentation masks, respectively. To perform the cross-branch fusion between TPN and SSN, we propose a dual-attention fusion module that can be plugged into the SSN flexibly. We also pose a cross-frame coherence loss (CFCL) to achieve smooth segmentation results by exploiting the coherence of masks produced by TPN and SSN. Extensive experiments demonstrate the effectiveness of proposed approach compared with the state-of-the-arts on two challenging datasets, i.e., Davis-2016 and YouTube-Objects.

Details DOI

TIST Journal 2021 Journal Article

Fine-Grained Semantic Image Synthesis with Object-Attention Generative Adversarial Network

Min Wang
Congyan Lang
Liqian Liang
Songhe Feng
Tao Wang
Yutong Gao

Semantic image synthesis is a new rising and challenging vision problem accompanied by the recent promising advances in generative adversarial networks. The existing semantic image synthesis methods only consider the global information provided by the semantic segmentation mask, such as class label, global layout, and location, so the generative models cannot capture the rich local fine-grained information of the images (e.g., object structure, contour, and texture). To address this issue, we adopt a multi-scale feature fusion algorithm to refine the generated images by learning the fine-grained information of the local objects. We propose OA-GAN, a novel object-attention generative adversarial network that allows attention-driven, multi-fusion refinement for fine-grained semantic image synthesis. Specifically, the proposed model first generates multi-scale global image features and local object features, respectively, then the local object features are fused into the global image features to improve the correlation between the local and the global. In the process of feature fusion, the global image features and the local object features are fused through the channel-spatial-wise fusion block to learn ‘what’ and ‘where’ to attend in the channel and spatial axes, respectively. The fused features are used to construct correlation filters to obtain feature response maps to determine the locations, contours, and textures of the objects. Extensive quantitative and qualitative experiments on COCO-Stuff, ADE20K and Cityscapes datasets demonstrate that our OA-GAN significantly outperforms the state-of-the-art methods.

Details DOI

TIST Journal 2020 Journal Article

End-to-End Text-to-Image Synthesis with Spatial Constrains

Min Wang
Congyan Lang
Liqian Liang
Songhe Feng
Tao Wang
Yutong Gao

Although the performance of automatically generating high-resolution realistic images from text descriptions has been significantly boosted, many challenging issues in image synthesis have not been fully investigated, due to shapes variations, viewpoint changes, pose changes, and the relations of multiple objects. In this article, we propose a novel end-to-end approach for text-to-image synthesis with spatial constraints by mining object spatial location and shape information. Instead of learning a hierarchical mapping from text to image, our algorithm directly generates multi-object fine-grained images through the guidance of the generated semantic layouts. By fusing text semantic and spatial information into a synthesis module and jointly fine-tuning them with multi-scale semantic layouts generated, the proposed networks show impressive performance in text-to-image synthesis for complex scenes. We evaluate our method both on single-object CUB dataset and multi-object MS-COCO dataset. Comprehensive experimental results demonstrate that our method significantly outperforms the state-of-the-art approaches consistently across different evaluation metrics.

Details DOI

TIST Journal 2020 Journal Article

HERA

Gengyu Lyu
Songhe Feng
Yidong Li
Yi Jin
Guojun Dai
Congyan Lang

Partial label learning (PLL) aims to learn from the data where each training instance is associated with a set of candidate labels, among which only one is correct. Most existing methods deal with this type of problem by either treating each candidate label equally or identifying the ground-truth label iteratively. In this article, we propose a novel PLL approach named HERA, which simultaneously incorporates the HeterogEneous Loss and the SpaRse and Low-rAnk procedure to estimate the labeling confidence for each instance while training the desired model. Specifically, the heterogeneous loss integrates the strengths of both the pairwise ranking loss and the pointwise reconstruction loss to provide informative label ranking and reconstruction information for label identification, whereas the embedded sparse and low-rank scheme constrains the sparsity of ground-truth label matrix and the low rank of noise label matrix to explore the global label relevance among the whole training data, for improving the learning model. Comprehensive ablation study demonstrates the effectiveness of our employed heterogeneous loss, and extensive experiments on both artificial and real-world datasets demonstrate that our method achieves superior or comparable performance against state-of-the-art methods.

Details DOI

TIST Journal 2019 Journal Article

Co-saliency Detection with Graph Matching

Zun Li
Congyan Lang
Jiashi Feng
Yidong Li
Tao Wang
Songhe Feng

Recently, co-saliency detection, which aims to automatically discover common and salient objects appeared in several relevant images, has attracted increased interest in the computer vision community. In this article, we present a novel graph-matching based model for co-saliency detection in image pairs. A solution of graph matching is proposed to integrate the visual appearance, saliency coherence, and spatial structural continuity for detecting co-saliency collaboratively. Since the saliency and the visual similarity have been seamlessly integrated, such a joint inference schema is able to produce more accurate and reliable results. More concretely, the proposed model first computes the intra-saliency for each image by aggregating multiple saliency cues. The common and salient regions across multiple images are thus discovered via a graph matching procedure. Then, a graph reconstruction scheme is proposed to refine the intra-saliency iteratively. Compared to existing co-saliency detection methods that only utilize visual appearance cues, our proposed model can effectively exploit both visual appearance and structure information to better guide co-saliency detection. Extensive experiments on several challenging image pair databases demonstrate that our model outperforms state-of-the-art baselines significantly.

Details DOI

AAAI Conference 2019 Conference Paper

Partial Multi-Label Learning by Low-Rank and Sparse Decomposition

Lijuan Sun
Songhe Feng
Tao Wang
Congyan Lang
Yi Jin

Multi-Label Learning (MLL) aims to learn from the training data where each example is represented by a single instance while associated with a set of candidate labels. Most existing MLL methods are typically designed to handle the problem of missing labels. However, in many real-world scenarios, the labeling information for multi-label data is always redundant, which can not be solved by classical MLL methods, thus a novel Partial Multi-label Learning (PML) framework is proposed to cope with such problem, i. e. removing the the noisy labels from the multi-label sets. In this paper, in order to further improve the denoising capability of PML framework, we utilize the low-rank and sparse decomposition scheme and propose a novel Partial Multi-label Learning by Low-Rank and Sparse decomposition (PML-LRS) approach. Specifically, we first reformulate the observed label set into a label matrix, and then decompose it into a groundtruth label matrix and an irrelevant label matrix, where the former is constrained to be low rank and the latter is assumed to be sparse. Next, we utilize the feature mapping matrix to explore the label correlations and meanwhile constrain the feature mapping matrix to be low rank to prevent the proposed method from being overfitting. Finally, we obtain the ground-truth labels via minimizing the label loss, where the Augmented Lagrange Multiplier (ALM) algorithm is incorporated to solve the optimization problem. Enormous experimental results demonstrate that PML-LRS can achieve superior or competitive performance against other state-of-the-art methods.

PDF Details

ICRA Conference 2018 Conference Paper

Constrained Confidence Matching for Planar Object Tracking

Tao Wang 0011
Haibin Ling
Congyan Lang
Songhe Feng
Yi Jin 0001
Yidong Li

Tracking planar objects has a wide range of applications in robotics. Conventional template tracking algorithms, however, often fail to observe fast object motion or drift significantly after a period of time, due to drastic object appearance change. To address such challenges, we propose a novel constrained confidence matching algorithm for motion estimation and a robust Kalman filter for template updating. Integrated with an accurate occlusion detector, our approach achieves accurate motion estimation in presence of partial occlusion, by excluding occluded pixels from computation of motion parameters. Furthermore, the proposed Kalman filter employs a novel control-input model to handle the object appearance change, which brings our tracker high robustness against sudden illumination change and heavy motion blur. For evaluation, we compare the proposed tracker with several state-of-the-art planar object trackers on two public benchmark datasets. Experimental results show that our algorithm achieves robust tracking results against various environmental variations, and outperforms baseline algorithms remarkably on both datasets.

Details

AAAI Conference 2013 Conference Paper

Salient Object Detection via Low-Rank and Structured Sparse Matrix Decomposition

Houwen Peng
Bing Li
Rongrong Ji
Weiming Hu
Weihua Xiong
Congyan Lang

Salient object detection provides an alternative solution to various image semantic understanding tasks such as object recognition, adaptive compression and image retrieval. Recently, low-rank matrix recovery (LR) theory has been introduced into saliency detection, and achieves impressed results. However, the existing LR-based models neglect the underlying structure of images, and inevitably degrade the associated performance. In this paper, we propose a Low-rank and Structured sparse Matrix Decomposition (LSMD) model for salient object detection. In the model, a tree-structured sparsity-inducing norm regularization is firstly introduced to provide a hierarchical description of the image structure to ensure the completeness of the extracted salient object. The similarity of saliency values within the salient object is then guaranteed by the `∞-norm. Finally, high-level priors are integrated to guide the matrix decomposition and enhance the saliency detection. Experimental results on the largest public benchmark database show that our model outperforms existing LRbased approaches and other state-of-the-art methods, which verifies the effectiveness and robustness of the structure cues in our model.

PDF Details