Arrow Research search

Author name cluster

Shan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

AAAI Conference 2025 Conference Paper

Point Cloud Semantic Segmentation with Sparse and Inhomogeneous Annotations

  • Zhiyi Pan
  • Nan Zhang
  • Wei Gao
  • Shan Liu
  • Ge Li

Utilizing uniformly distributed sparse annotations, weakly supervised learning alleviates the heavy reliance on fine-grained annotations in point cloud semantic segmentation tasks. However, few works discuss the inhomogeneity of sparse annotations, even though it is common in real-world scenarios. Therefore, this work introduces the probability density function into the gradient sampling approximation method to qualitatively analyze the impact of annotation sparsity and inhomogeneity under weakly supervised learning. Based on our analysis, we propose an Adaptive Annotation Distribution Network (AADNet) capable of robust learning on arbitrarily distributed sparse annotations. Specifically, we propose a label-aware point cloud downsampling strategy to increase the proportion of annotations involved in the training stage. Furthermore, we design the multiplicative dynamic entropy as the gradient calibration function to mitigate the gradient bias caused by non-uniformly distributed sparse annotations and explicitly reduce the epistemic uncertainty. Without any prior restrictions or additional information, our proposed method achieves comprehensive performance improvements at multiple label rates and under different annotation distributions.
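
The gradient-calibration idea can be pictured as a per-point cross-entropy whose weight depends on predictive entropy. The sketch below assumes PyTorch; the function name entropy_calibrated_loss and the simple 1 + entropy weight are illustrative assumptions, not the paper's exact multiplicative dynamic entropy.

```python
import math

import torch
import torch.nn.functional as F

def entropy_calibrated_loss(logits, labels, labeled_mask):
    """logits: (N, C) per-point predictions; labels: (N,) int64 class indices
    (arbitrary where unlabeled); labeled_mask: (N,) bool marking annotated points."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Normalized predictive entropy per point, in [0, 1].
    entropy = -(probs * log_probs).sum(dim=-1) / math.log(logits.size(-1))
    ce = F.nll_loss(log_probs, labels, reduction="none")
    # Multiplicative, entropy-dependent weight on each labeled point's gradient;
    # the exact calibration function used by AADNet may differ.
    weight = 1.0 + entropy
    return (weight * ce)[labeled_mask].mean()
```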

NeurIPS Conference 2024 Conference Paper

Distribution Guidance Network for Weakly Supervised Point Cloud Semantic Segmentation

  • Zhiyi Pan
  • Wei Gao
  • Shan Liu
  • Ge Li

Despite alleviating the dependence on dense annotations inherent to fully supervised methods, weakly supervised point cloud semantic segmentation suffers from inadequate supervision signals. In response to this challenge, we introduce a novel perspective that imparts auxiliary constraints by regulating the feature space under weak supervision. Our initial investigation identifies which distributions accurately characterize the feature space, subsequently leveraging this prior to guide the alignment of the weakly supervised embeddings. Specifically, we analyze the superiority of the mixture of von Mises-Fisher distributions (moVMF) among several common distribution candidates. Accordingly, we develop a Distribution Guidance Network (DGNet), which comprises a weakly supervised learning branch and a distribution alignment branch. Leveraging reliable clustering initialization derived from the weakly supervised learning branch, the distribution alignment branch alternately updates the parameters of the moVMF and the network, ensuring alignment with the moVMF-defined latent space. Extensive experiments validate the rationality and effectiveness of our distribution choice and network design. Consequently, DGNet achieves state-of-the-art performance on multiple datasets and various weakly supervised settings.
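
To make the moVMF choice concrete, here is a minimal sketch of computing mixture-of-vMF responsibilities for unit-norm embeddings, assuming numpy and scipy; the function names are hypothetical, and DGNet's actual alternating update of the moVMF parameters and the network is more involved.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel function, for stability

def vmf_log_density(x, mu, kappa):
    """x: (N, D) unit vectors; mu: (D,) unit mean direction; kappa: concentration > 0."""
    d = x.shape[1]
    # log C_d(kappa), using ive(v, k) = iv(v, k) * exp(-k) to avoid overflow.
    log_norm = (d / 2 - 1) * np.log(kappa) - (d / 2) * np.log(2 * np.pi) \
               - (np.log(ive(d / 2 - 1, kappa)) + kappa)
    return log_norm + kappa * (x @ mu)

def movmf_responsibilities(x, mus, kappas, weights):
    """E-step: posterior probability of each mixture component for each embedding."""
    log_p = np.stack([np.log(w) + vmf_log_density(x, m, k)
                      for m, k, w in zip(mus, kappas, weights)], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```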

ECAI Conference 2024 Conference Paper

Improving Non-Autoregressive Sign Language Translation with Random Ordering Progressive Prediction Pretraining

  • Pei Yu
  • Changhao Lai
  • Cong Hu
  • Shan Liu
  • Liang Zhang
  • Biao Fu
  • Yidong Chen 0001

Recently, the Non-AutoRegressive (NAR) decoding mechanism, effectively reducing the inference latency of text generation, has been applied to Sign Language Translation (SLT). Notably, the current best NAR SLT model using a Curriculum-based Non-autoregressive Decoder (CND) outperforms AutoRegressive (AR) baselines in speed and performance. Although it has been proven that AutoRegressive Pre-trained Language Models (AR-PLMs) further boost the performance of AR SLT models, combining NAR Pre-trained Language Models (NAR-PLMs) with NAR SLT models remains challenging due to (1) existing NAR-PLMs’ inability to model token dependencies between decoder layers, which is crucial for NAR SLT models using CND; (2) the modality gap between the decoder’s inputs of the NAR-PLMs and NAR SLT models. To address these issues, we propose a Random Ordering Progressive Prediction Pre-training task for NAR SLT models using CND, enabling the decoder to predict target sequences in diverse orderings and enhancing the modeling of target token dependencies between layers. Moreover, we propose a CTC-enhanced Soft Copy method to incorporate target-side information in the decoder’s inputs, alleviating the modality gap. Experimental results on PHOENIX-2014T and CSL-Daily demonstrate that our model consistently outperforms all strong baselines and achieves competitive performance with AR SLT models equipped with AR-PLMs.
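
The Soft Copy idea can be sketched as initializing decoder inputs from encoder states weighted by relative positional distance. The snippet below assumes PyTorch and omits the CTC-derived alignment of the paper's CTC-enhanced variant; soft_copy and its temperature tau are illustrative names and values.

```python
import torch

def soft_copy(enc_states, tgt_len, tau=0.3):
    """enc_states: (S, D) encoder outputs; returns (tgt_len, D) decoder inputs."""
    S = enc_states.size(0)
    src_pos = torch.arange(S, dtype=torch.float32) / max(S - 1, 1)
    tgt_pos = torch.arange(tgt_len, dtype=torch.float32) / max(tgt_len - 1, 1)
    # Soft alignment: source positions closer (in relative terms) to a target
    # position receive larger copy weights.
    dist = (tgt_pos[:, None] - src_pos[None, :]).abs()
    weights = torch.softmax(-dist / tau, dim=-1)   # (T, S)
    return weights @ enc_states                    # (T, D)
```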

AAAI Conference 2024 Conference Paper

Layer-Wise Representation Fusion for Compositional Generalization

  • Yafang Zheng
  • Lei Lin
  • Shuangtao Li
  • Yuxuan Yuan
  • Zhaohong Lai
  • Shan Liu
  • Biao Fu
  • Yidong Chen

Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics instead of exploring the reasons behind the representation entanglement (RE) problem to solve it. We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process effectively by introducing a fuse-attention module at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal. Code is available at https://github.com/thinkaboutzero/LRF.
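
A fuse-attention step of this kind can be sketched as each position attending over the stack of all previous layers' representations. The module below assumes PyTorch; FuseAttention is a hypothetical name, and LRF's actual module design and placement may differ.

```python
import torch
import torch.nn as nn

class FuseAttention(nn.Module):
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, current, history):
        """current: (B, T, D); history: list of (B, T, D) outputs of previous layers."""
        B, T, D = current.shape
        # Stack the layer history per position: (B*T, L, D), queried by the current state.
        mem = torch.stack(history, dim=2).reshape(B * T, len(history), D)
        q = current.reshape(B * T, 1, D)
        fused, _ = self.attn(q, mem, mem)
        # Residual fusion: mix the attended history back into the current representation.
        return self.norm(current + fused.reshape(B, T, D))
```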

AAAI Conference 2024 Conference Paper

Less Is More: Label Recommendation for Weakly Supervised Point Cloud Semantic Segmentation

  • Zhiyi Pan
  • Nan Zhang
  • Wei Gao
  • Shan Liu
  • Ge Li

Weak supervision has proven to be an effective strategy for reducing the burden of annotating semantic segmentation tasks in 3D space. However, unconstrained or heuristic weakly supervised annotation forms may lead to suboptimal label efficiency. To address this issue, we propose a novel label recommendation framework for weakly supervised point cloud semantic segmentation. Distinct from pre-training and active learning, the label recommendation framework consists of three stages: inductive bias learning, recommendations for points to be labeled, and point cloud semantic segmentation learning. In practice, we first introduce the point cloud upsampling task to induce an inductive bias from structural information. During the recommendation stage, we present a cross-scene clustering strategy to generate clustering centers as recommended points. Then we introduce LabelAttention, an attention module over recommended point positions, to model long-range dependencies under sparse annotations. Additionally, we employ position encoding to enhance the spatial awareness of semantic features. Throughout the framework, the useful information obtained from inductive bias learning is propagated to subsequent semantic segmentation networks in the form of label positions. Experimental results demonstrate that our framework outperforms weakly supervised point cloud semantic segmentation methods and other label-efficient methods on S3DIS and ScanNetV2, even at an extremely low label rate.
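
The recommendation stage can be pictured as clustering features across scenes and labeling the point nearest to each cluster center. The sketch below assumes numpy and scikit-learn; recommend_points is a hypothetical helper, and the paper additionally relies on features induced by the upsampling pretext task and on the LabelAttention module.

```python
import numpy as np
from sklearn.cluster import KMeans

def recommend_points(features_per_scene, budget_per_scene):
    """features_per_scene: list of (N_i, D) arrays; returns point indices to label per scene."""
    all_feats = np.concatenate(features_per_scene, axis=0)
    # Cross-scene clustering: centers summarize structure shared across scenes.
    km = KMeans(n_clusters=budget_per_scene, n_init=10).fit(all_feats)
    picks = []
    for feats in features_per_scene:
        # For each cluster center, recommend the nearest point in this scene.
        d = np.linalg.norm(feats[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
        picks.append(np.unique(d.argmin(axis=0)))
    return picks
```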

AAAI Conference 2022 Conference Paper

OctAttention: Octree-Based Large-Scale Contexts Model for Point Cloud Compression

  • Chunyang Fu
  • Ge Li
  • Rui Song
  • Wei Gao
  • Shan Liu

In point cloud compression, sufficient contexts are important for modeling the point cloud distribution. However, the contexts gathered by previous voxel-based methods decrease when handling sparse point clouds. To address this problem, we propose a multiple-contexts deep learning framework called OctAttention employing the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences in a lossless way by gathering the information of sibling and ancestor nodes. Specifically, we first represent point clouds with an octree to reduce spatial redundancy, which is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among the neighboring nodes and employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to make a trade-off between encoding time and performance. Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g., SemanticKITTI) and object point cloud datasets (e.g., MPEG 8i, MVUB), and saves 95% coding time compared to the voxel-based baseline. The code is available at https://github.com/zb12138/OctAttention.
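
The context model can be sketched as a small causal Transformer over a window of preceding octree occupancy symbols, predicting the next symbol's distribution for entropy coding. The code below assumes PyTorch; OctAttention additionally conditions on ancestor features, level, and octant information, which are omitted here, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class OctreeContextModel(nn.Module):
    def __init__(self, n_symbols=256, d_model=128, n_heads=4, window=1024):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, d_model)
        self.pos = nn.Embedding(window, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_symbols)

    def forward(self, symbols):
        """symbols: (B, T) preceding occupancy codes; returns (B, T, n_symbols) logits."""
        T = symbols.size(1)
        x = self.embed(symbols) + self.pos(torch.arange(T, device=symbols.device))
        # Causal mask: each node's prediction only sees earlier nodes in the sequence.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=symbols.device), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.head(h)
```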

AAAI Conference 2021 Conference Paper

SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains

  • Yuanqi Chen
  • Ge Li
  • Cece Jin
  • Shan Liu
  • Thomas Li

This paper observes that high frequencies are missing in the discriminator of a standard GAN and reveals that this stems from the downsampling layers employed in the network architecture. This issue makes the generator lack the incentive from the discriminator to learn the high-frequency content of the data, resulting in a significant spectrum discrepancy between generated images and real images. Since the Fourier transform is a bijective mapping, we argue that reducing this spectrum discrepancy would boost the performance of GANs. To this end, we introduce SSD-GAN, an enhancement of GANs to alleviate the spectral information loss in the discriminator. Specifically, we propose to embed a frequency-aware classifier into the discriminator to measure the realness of the input in both the spatial and spectral domains. With the enhanced discriminator, the generator of SSD-GAN is encouraged to learn the high-frequency content of real data and generate exact details. The proposed method is general and can be easily integrated into most existing GAN frameworks without excessive cost. The effectiveness of SSD-GAN is validated on various network architectures, objective functions, and datasets. Code is available at https://github.com/cyq373/SSD-GAN.
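
The enhanced discriminator can be sketched as combining a spatial realness score with a spectral one computed from the image's Fourier magnitude. The snippet below assumes PyTorch; the 1-D pooling used here is a crude stand-in for the paper's reduced spectrum profile, and SpectralRealness / overall_realness are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralRealness(nn.Module):
    def __init__(self, n_bins=64):
        super().__init__()
        self.n_bins = n_bins
        self.classifier = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, images):
        """images: (B, C, H, W); returns (B, 1) spectral realness logits."""
        mag = torch.log(torch.fft.fft2(images.mean(dim=1)).abs() + 1e-8)   # (B, H, W)
        # Crude 1-D spectrum profile; the paper uses a more principled reduction.
        profile = F.adaptive_avg_pool1d(mag.mean(dim=1).unsqueeze(1), self.n_bins).squeeze(1)
        return self.classifier(profile)

def overall_realness(spatial_logit, spectral_logit, lam=0.5):
    # Overall realness as a convex combination of spatial and spectral scores.
    return lam * spatial_logit + (1.0 - lam) * spectral_logit
```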

TIST Journal 2019 Journal Article

Exploiting the Value of the Center-dark Channel Prior for Salient Object Detection

  • Chunbiao Zhu
  • Wenhao Zhang
  • Thomas H. Li
  • Shan Liu
  • Ge Li

Saliency detection aims to detect the most attractive objects in images and is widely used as a foundation for various applications. In this article, we propose a novel salient object detection algorithm for RGB-D images using center-dark channel priors. First, we generate an initial saliency map based on a color saliency map and a depth saliency map of a given RGB-D image. Then, we generate a center-dark channel map based on center saliency and dark channel priors. Finally, we fuse the initial saliency map with the center-dark channel map to generate the final saliency map. Extensive evaluations over four benchmark datasets demonstrate that our proposed method performs favorably against most of the state-of-the-art approaches. In addition, we discuss the application of the proposed algorithm in small target detection and demonstrate the universal value of center-dark channel priors in the field of object detection.
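
The center-dark channel cue can be sketched as an inverted dark channel modulated by a Gaussian center prior, then fused with an initial saliency map. The sketch below assumes numpy and scipy; the specific combination and fusion rules are illustrative, not the paper's exact formulas.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def center_dark_channel(image, patch=15, sigma=0.5):
    """image: (H, W, 3) float in [0, 1]; returns a (H, W) map normalized to [0, 1]."""
    dark = minimum_filter(image.min(axis=2), size=patch)        # dark channel prior
    h, w = dark.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-(((yy - h / 2) / (sigma * h)) ** 2 +
                      ((xx - w / 2) / (sigma * w)) ** 2))       # center prior
    cdc = (1.0 - dark) * center                                 # one plausible combination
    return (cdc - cdc.min()) / (cdc.max() - cdc.min() + 1e-8)

def fuse(initial_saliency, cdc_map):
    # A simple element-wise average; the paper's fusion rule may differ.
    return 0.5 * (initial_saliency + cdc_map)
```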

AAAI Conference 2019 Conference Paper

Meta Learning for Image Captioning

  • Nannan Li
  • Zhenzhong Chen
  • Shan Liu

Reinforcement learning (RL) has shown its advantages in image captioning by optimizing the non-differentiable metric directly in the reward learning process. However, due to the reward hacking problem in RL, maximizing the reward may not lead to better caption quality, especially in terms of propositional content and distinctiveness. In this work, we propose to use a new learning method, meta learning, to utilize supervision from the ground truth whilst optimizing the reward function in RL. To improve the propositional content and the distinctiveness of the generated captions, the proposed model provides a globally optimal solution by simultaneously taking gradient steps towards the supervision task and the reinforcement task. Experimental results on MS COCO validate the effectiveness of our approach when compared with the state-of-the-art methods.
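
The meta-update can be pictured as moving the parameters along a combination of a supervised (ground-truth) gradient and a reinforcement (reward) gradient rather than the reward gradient alone. The sketch below assumes PyTorch; meta_step and the simple summed update are illustrative of the idea, not the paper's exact meta-objective.

```python
import torch

def meta_step(model, mle_loss_fn, rl_loss_fn, batch, meta_lr=1e-4):
    """mle_loss_fn / rl_loss_fn: callables returning scalar losses for the batch."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_mle = torch.autograd.grad(mle_loss_fn(model, batch), params)
    g_rl = torch.autograd.grad(rl_loss_fn(model, batch), params)
    with torch.no_grad():
        for p, gm, gr in zip(params, g_mle, g_rl):
            # Step towards both the supervision task and the reinforcement task,
            # keeping captions anchored to the ground truth while optimizing reward.
            p -= meta_lr * (gm + gr)
    return model
```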

NeurIPS Conference 2019 Conference Paper

Multi-mapping Image-to-Image Translation via Learning Disentanglement

  • Xiaoming Yu
  • Yuanqi Chen
  • Shan Liu
  • Thomas Li
  • Ge Li

Recent advances in image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, existing methods only consider one of the two perspectives, leaving each unable to address the other's problem. To address this issue, we propose a novel unified model that bridges these two objectives. First, we disentangle the input images into latent representations with an encoder-decoder architecture trained adversarially on conditions in the feature space. Then, we encourage the generator to learn multi-mappings through random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.