Arrow Research search

Author name cluster

Shan Liu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

12 papers
2 author rows

Possible papers

12

AAAI Conference 2025 Conference Paper

Point Cloud Semantic Segmentation with Sparse and Inhomogeneous Annotations

  • Zhiyi Pan
  • Nan Zhang
  • Wei Gao
  • Shan Liu
  • Ge Li

Utilizing uniformly distributed sparse annotations, weakly supervised learning alleviates the heavy reliance on fine-grained annotations in point cloud semantic segmentation tasks. However, few works discuss the inhomogeneity of sparse annotations, even though it is common in real-world scenarios. Therefore, this work introduces the probability density function into the gradient sampling approximation method to qualitatively analyze the impact of annotation sparsity and inhomogeneity under weakly supervised learning. Based on our analysis, we propose an Adaptive Annotation Distribution Network (AADNet) capable of robust learning on arbitrarily distributed sparse annotations. Specifically, we propose a label-aware point cloud downsampling strategy to increase the proportion of annotations involved in the training stage. Furthermore, we design the multiplicative dynamic entropy as the gradient calibration function to mitigate the gradient bias caused by non-uniformly distributed sparse annotations and explicitly reduce the epistemic uncertainty. Without any prior restrictions or additional information, our proposed method achieves comprehensive performance improvements at multiple label rates and under different annotation distributions.
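
The gradient-calibration idea can be pictured as a per-point cross-entropy whose weight depends on predictive entropy. The sketch below assumes PyTorch; the function name entropy_calibrated_loss and the simple 1 + entropy weight are illustrative assumptions, not the paper's exact multiplicative dynamic entropy.

```python
import math

import torch
import torch.nn.functional as F

def entropy_calibrated_loss(logits, labels, labeled_mask):
    """logits: (N, C) per-point predictions; labels: (N,) int64 class indices
    (arbitrary where unlabeled); labeled_mask: (N,) bool marking annotated points."""
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Normalized predictive entropy per point, in [0, 1].
    entropy = -(probs * log_probs).sum(dim=-1) / math.log(logits.size(-1))
    ce = F.nll_loss(log_probs, labels, reduction="none")
    # Multiplicative, entropy-dependent weight on each labeled point's gradient;
    # the exact calibration function used by AADNet may differ.
    weight = 1.0 + entropy
    return (weight * ce)[labeled_mask].mean()
```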

NeurIPS Conference 2024 Conference Paper

Distribution Guidance Network for Weakly Supervised Point Cloud Semantic Segmentation

  • Zhiyi Pan
  • Wei Gao
  • Shan Liu
  • Ge Li

Despite alleviating the dependence on dense annotations inherent to fully supervised methods, weakly supervised point cloud semantic segmentation suffers from inadequate supervision signals. In response to this challenge, we introduce a novel perspective that imparts auxiliary constraints by regulating the feature space under weak supervision. Our initial investigation identifies which distributions accurately characterize the feature space, subsequently leveraging this prior to guide the alignment of the weakly supervised embeddings. Specifically, we analyze the superiority of the mixture of von Mises-Fisher distributions (moVMF) among several common distribution candidates. Accordingly, we develop a Distribution Guidance Network (DGNet), which comprises a weakly supervised learning branch and a distribution alignment branch. Leveraging reliable clustering initialization derived from the weakly supervised learning branch, the distribution alignment branch alternately updates the parameters of the moVMF and the network, ensuring alignment with the moVMF-defined latent space. Extensive experiments validate the rationality and effectiveness of our distribution choice and network design. Consequently, DGNet achieves state-of-the-art performance on multiple datasets and various weakly supervised settings.
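
To make the moVMF choice concrete, here is a minimal sketch of computing mixture-of-vMF responsibilities for unit-norm embeddings, assuming numpy and scipy; the function names are hypothetical, and DGNet's actual alternating update of the moVMF parameters and the network is more involved.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel function, for stability

def vmf_log_density(x, mu, kappa):
    """x: (N, D) unit vectors; mu: (D,) unit mean direction; kappa: concentration > 0."""
    d = x.shape[1]
    # log C_d(kappa), using ive(v, k) = iv(v, k) * exp(-k) to avoid overflow.
    log_norm = (d / 2 - 1) * np.log(kappa) - (d / 2) * np.log(2 * np.pi) \
               - (np.log(ive(d / 2 - 1, kappa)) + kappa)
    return log_norm + kappa * (x @ mu)

def movmf_responsibilities(x, mus, kappas, weights):
    """E-step: posterior probability of each mixture component for each embedding."""
    log_p = np.stack([np.log(w) + vmf_log_density(x, m, k)
                      for m, k, w in zip(mus, kappas, weights)], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```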

ECAI Conference 2024 Conference Paper

Improving Non-Autoregressive Sign Language Translation with Random Ordering Progressive Prediction Pretraining

  • Pei Yu
  • Changhao Lai
  • Cong Hu
  • Shan Liu
  • Liang Zhang
  • Biao Fu
  • Yidong Chen 0001

Recently, the Non-AutoRegressive (NAR) decoding mechanism, effectively reducing the inference latency of text generation, has been applied to Sign Language Translation (SLT). Notably, the current best NAR SLT model using a Curriculum-based Non-autoregressive Decoder (CND) outperforms AutoRegressive (AR) baselines in speed and performance. Although it has been proven that AutoRegressive Pre-trained Language Models (AR-PLMs) further boost the performance of AR SLT models, combining NAR Pre-trained Language Models (NAR-PLMs) with NAR SLT models remains challenging due to (1) existing NAR-PLMs’ inability to model token dependencies between decoder layers, which is crucial for NAR SLT models using CND; (2) the modality gap between the decoder’s inputs of the NAR-PLMs and NAR SLT models. To address these issues, we propose a Random Ordering Progressive Prediction Pre-training task for NAR SLT models using CND, enabling the decoder to predict target sequences in diverse orderings and enhancing the modeling of target token dependencies between layers. Moreover, we propose a CTC-enhanced Soft Copy method to incorporate target-side information in the decoder’s inputs, alleviating the modality gap. Experimental results on PHOENIX-2014T and CSL-Daily demonstrate that our model consistently outperforms all strong baselines and achieves competitive performance with AR SLT models equipped with AR-PLMs.
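
The Soft Copy idea can be sketched as initializing decoder inputs from encoder states weighted by relative positional distance. The snippet below assumes PyTorch and omits the CTC-derived alignment of the paper's CTC-enhanced variant; soft_copy and its temperature tau are illustrative names and values.

```python
import torch

def soft_copy(enc_states, tgt_len, tau=0.3):
    """enc_states: (S, D) encoder outputs; returns (tgt_len, D) decoder inputs."""
    S = enc_states.size(0)
    src_pos = torch.arange(S, dtype=torch.float32) / max(S - 1, 1)
    tgt_pos = torch.arange(tgt_len, dtype=torch.float32) / max(tgt_len - 1, 1)
    # Soft alignment: source positions closer (in relative terms) to a target
    # position receive larger copy weights.
    dist = (tgt_pos[:, None] - src_pos[None, :]).abs()
    weights = torch.softmax(-dist / tau, dim=-1)   # (T, S)
    return weights @ enc_states                    # (T, D)
```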

AAAI Conference 2024 Conference Paper

Layer-Wise Representation Fusion for Compositional Generalization

  • Yafang Zheng
  • Lei Lin
  • Shuangtao Li
  • Yuxuan Yuan
  • Zhaohong Lai
  • Shan Liu
  • Biao Fu
  • Yidong Chen

Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences in the uppermost layers of both the encoder and decoder are entangled. However, previous work concentrates on separating the learning of syntax and semantics instead of exploring the reasons behind the representation entanglement (RE) problem to solve it. We explain why it exists by analyzing the representation evolving mechanism from the bottom to the top of the Transformer layers. We find that the "shallow" residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting between layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel Layer-wise Representation Fusion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process effectively by introducing a fuse-attention module at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal. Code is available at https://github.com/thinkaboutzero/LRF.
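
A fuse-attention step of this kind can be sketched as each position attending over the stack of all previous layers' representations. The module below assumes PyTorch; FuseAttention is a hypothetical name, and LRF's actual module design and placement may differ.

```python
import torch
import torch.nn as nn

class FuseAttention(nn.Module):
    def __init__(self, d_model, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, current, history):
        """current: (B, T, D); history: list of (B, T, D) outputs of previous layers."""
        B, T, D = current.shape
        # Stack the layer history per position: (B*T, L, D), queried by the current state.
        mem = torch.stack(history, dim=2).reshape(B * T, len(history), D)
        q = current.reshape(B * T, 1, D)
        fused, _ = self.attn(q, mem, mem)
        # Residual fusion: mix the attended history back into the current representation.
        return self.norm(current + fused.reshape(B, T, D))
```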

AAAI Conference 2024 Conference Paper

Less Is More: Label Recommendation for Weakly Supervised Point Cloud Semantic Segmentation

  • Zhiyi Pan
  • Nan Zhang
  • Wei Gao
  • Shan Liu
  • Ge Li

Weak supervision has proven to be an effective strategy for reducing the burden of annotating semantic segmentation tasks in 3D space. However, unconstrained or heuristic weakly supervised annotation forms may lead to suboptimal label efficiency. To address this issue, we propose a novel label recommendation framework for weakly supervised point cloud semantic segmentation. Distinct from pre-training and active learning, the label recommendation framework consists of three stages: inductive bias learning, recommendations for points to be labeled, and point cloud semantic segmentation learning. In practice, we first introduce the point cloud upsampling task to induce an inductive bias from structural information. During the recommendation stage, we present a cross-scene clustering strategy to generate clustering centers as recommended points. Then we introduce LabelAttention, an attention module over recommended point positions, to model long-range dependencies under sparse annotations. Additionally, we employ position encoding to enhance the spatial awareness of semantic features. Throughout the framework, the useful information obtained from inductive bias learning is propagated to subsequent semantic segmentation networks in the form of label positions. Experimental results demonstrate that our framework outperforms weakly supervised point cloud semantic segmentation methods and other label-efficient methods on S3DIS and ScanNetV2, even at an extremely low label rate.
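
The recommendation stage can be pictured as clustering features across scenes and labeling the point nearest to each cluster center. The sketch below assumes numpy and scikit-learn; recommend_points is a hypothetical helper, and the paper additionally relies on features induced by the upsampling pretext task and on the LabelAttention module.

```python
import numpy as np
from sklearn.cluster import KMeans

def recommend_points(features_per_scene, budget_per_scene):
    """features_per_scene: list of (N_i, D) arrays; returns point indices to label per scene."""
    all_feats = np.concatenate(features_per_scene, axis=0)
    # Cross-scene clustering: centers summarize structure shared across scenes.
    km = KMeans(n_clusters=budget_per_scene, n_init=10).fit(all_feats)
    picks = []
    for feats in features_per_scene:
        # For each cluster center, recommend the nearest point in this scene.
        d = np.linalg.norm(feats[:, None, :] - km.cluster_centers_[None, :, :], axis=-1)
        picks.append(np.unique(d.argmin(axis=0)))
    return picks
```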

AAAI Conference 2022 Conference Paper

OctAttention: Octree-Based Large-Scale Contexts Model for Point Cloud Compression

  • Chunyang Fu
  • Ge Li
  • Rui Song
  • Wei Gao
  • Shan Liu

In point cloud compression, sufficient contexts are important for modeling the point cloud distribution. However, the contexts gathered by previous voxel-based methods decrease when handling sparse point clouds. To address this problem, we propose a multiple-contexts deep learning framework called OctAttention employing the octree structure, a memory-efficient representation for point clouds. Our approach encodes octree symbol sequences in a lossless way by gathering the information of sibling and ancestor nodes. Specifically, we first represent point clouds with an octree to reduce spatial redundancy, which is robust for point clouds with different resolutions. We then design a conditional entropy model with a large receptive field that models the sibling and ancestor contexts to exploit the strong dependency among the neighboring nodes and employ an attention mechanism to emphasize the correlated nodes in the context. Furthermore, we introduce a mask operation during training and testing to make a trade-off between encoding time and performance. Compared to the previous state-of-the-art works, our approach obtains a 10%-35% BD-Rate gain on the LiDAR benchmark (e.g., SemanticKITTI) and object point cloud datasets (e.g., MPEG 8i, MVUB), and saves 95% coding time compared to the voxel-based baseline. The code is available at https://github.com/zb12138/OctAttention.
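
The context model can be sketched as a small causal Transformer over a window of preceding octree occupancy symbols, predicting the next symbol's distribution for entropy coding. The code below assumes PyTorch; OctAttention additionally conditions on ancestor features, level, and octant information, which are omitted here, and the hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class OctreeContextModel(nn.Module):
    def __init__(self, n_symbols=256, d_model=128, n_heads=4, window=1024):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, d_model)
        self.pos = nn.Embedding(window, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_symbols)

    def forward(self, symbols):
        """symbols: (B, T) preceding occupancy codes; returns (B, T, n_symbols) logits."""
        T = symbols.size(1)
        x = self.embed(symbols) + self.pos(torch.arange(T, device=symbols.device))
        # Causal mask: each node's prediction only sees earlier nodes in the sequence.
        mask = torch.triu(torch.full((T, T), float("-inf"), device=symbols.device), diagonal=1)
        h = self.encoder(x, mask=mask)
        return self.head(h)
```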

AAAI Conference 2021 Conference Paper

SSD-GAN: Measuring the Realness in the Spatial and Spectral Domains

  • Yuanqi Chen
  • Ge Li
  • Cece Jin
  • Shan Liu
  • Thomas Li

This paper observes that high frequencies are missing in the discriminator of a standard GAN and reveals that this stems from the downsampling layers employed in the network architecture. This issue makes the generator lack the incentive from the discriminator to learn the high-frequency content of the data, resulting in a significant spectrum discrepancy between generated images and real images. Since the Fourier transform is a bijective mapping, we argue that reducing this spectrum discrepancy would boost the performance of GANs. To this end, we introduce SSD-GAN, an enhancement of GANs to alleviate the spectral information loss in the discriminator. Specifically, we propose to embed a frequency-aware classifier into the discriminator to measure the realness of the input in both the spatial and spectral domains. With the enhanced discriminator, the generator of SSD-GAN is encouraged to learn the high-frequency content of real data and generate exact details. The proposed method is general and can be easily integrated into most existing GAN frameworks without excessive cost. The effectiveness of SSD-GAN is validated on various network architectures, objective functions, and datasets. Code is available at https://github.com/cyq373/SSD-GAN.
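
The enhanced discriminator can be sketched as combining a spatial realness score with a spectral one computed from the image's Fourier magnitude. The snippet below assumes PyTorch; the 1-D pooling used here is a crude stand-in for the paper's reduced spectrum profile, and SpectralRealness / overall_realness are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralRealness(nn.Module):
    def __init__(self, n_bins=64):
        super().__init__()
        self.n_bins = n_bins
        self.classifier = nn.Sequential(nn.Linear(n_bins, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, images):
        """images: (B, C, H, W); returns (B, 1) spectral realness logits."""
        mag = torch.log(torch.fft.fft2(images.mean(dim=1)).abs() + 1e-8)   # (B, H, W)
        # Crude 1-D spectrum profile; the paper uses a more principled reduction.
        profile = F.adaptive_avg_pool1d(mag.mean(dim=1).unsqueeze(1), self.n_bins).squeeze(1)
        return self.classifier(profile)

def overall_realness(spatial_logit, spectral_logit, lam=0.5):
    # Overall realness as a convex combination of spatial and spectral scores.
    return lam * spatial_logit + (1.0 - lam) * spectral_logit
```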

TIST Journal 2019 Journal Article

Exploiting the Value of the Center-dark Channel Prior for Salient Object Detection

  • Chunbiao Zhu
  • Wenhao Zhang
  • Thomas H. Li
  • Shan Liu
  • Ge Li

Saliency detection aims to detect the most attractive objects in images and is widely used as a foundation for various applications. In this article, we propose a novel salient object detection algorithm for RGB-D images using center-dark channel priors. First, we generate an initial saliency map based on a color saliency map and a depth saliency map of a given RGB-D image. Then, we generate a center-dark channel map based on center saliency and dark channel priors. Finally, we fuse the initial saliency map with the center-dark channel map to generate the final saliency map. Extensive evaluations over four benchmark datasets demonstrate that our proposed method performs favorably against most of the state-of-the-art approaches. In addition, we discuss the application of the proposed algorithm in small target detection and demonstrate the universal value of center-dark channel priors in the field of object detection.
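
The center-dark channel cue can be sketched as an inverted dark channel modulated by a Gaussian center prior, then fused with an initial saliency map. The sketch below assumes numpy and scipy; the specific combination and fusion rules are illustrative, not the paper's exact formulas.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def center_dark_channel(image, patch=15, sigma=0.5):
    """image: (H, W, 3) float in [0, 1]; returns a (H, W) map normalized to [0, 1]."""
    dark = minimum_filter(image.min(axis=2), size=patch)        # dark channel prior
    h, w = dark.shape
    yy, xx = np.mgrid[0:h, 0:w]
    center = np.exp(-(((yy - h / 2) / (sigma * h)) ** 2 +
                      ((xx - w / 2) / (sigma * w)) ** 2))       # center prior
    cdc = (1.0 - dark) * center                                 # one plausible combination
    return (cdc - cdc.min()) / (cdc.max() - cdc.min() + 1e-8)

def fuse(initial_saliency, cdc_map):
    # A simple element-wise average; the paper's fusion rule may differ.
    return 0.5 * (initial_saliency + cdc_map)
```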

AAAI Conference 2019 Conference Paper

Meta Learning for Image Captioning

  • Nannan Li
  • Zhenzhong Chen
  • Shan Liu

Reinforcement learning (RL) has shown its advantages in image captioning by optimizing the non-differentiable metric directly in the reward learning process. However, due to the reward hacking problem in RL, maximizing the reward may not lead to better caption quality, especially in terms of propositional content and distinctiveness. In this work, we propose to use a new learning method, meta learning, to utilize supervision from the ground truth whilst optimizing the reward function in RL. To improve the propositional content and the distinctiveness of the generated captions, the proposed model provides a globally optimal solution by simultaneously taking gradient steps towards the supervision task and the reinforcement task. Experimental results on MS COCO validate the effectiveness of our approach when compared with the state-of-the-art methods.
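
The meta-update can be pictured as moving the parameters along a combination of a supervised (ground-truth) gradient and a reinforcement (reward) gradient rather than the reward gradient alone. The sketch below assumes PyTorch; meta_step and the simple summed update are illustrative of the idea, not the paper's exact meta-objective.

```python
import torch

def meta_step(model, mle_loss_fn, rl_loss_fn, batch, meta_lr=1e-4):
    """mle_loss_fn / rl_loss_fn: callables returning scalar losses for the batch."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_mle = torch.autograd.grad(mle_loss_fn(model, batch), params)
    g_rl = torch.autograd.grad(rl_loss_fn(model, batch), params)
    with torch.no_grad():
        for p, gm, gr in zip(params, g_mle, g_rl):
            # Step towards both the supervision task and the reinforcement task,
            # keeping captions anchored to the ground truth while optimizing reward.
            p -= meta_lr * (gm + gr)
    return model
```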

NeurIPS Conference 2019 Conference Paper

Multi-mapping Image-to-Image Translation via Learning Disentanglement

  • Xiaoming Yu
  • Yuanqi Chen
  • Shan Liu
  • Thomas Li
  • Ge Li

Recent advances in image-to-image translation focus on learning the one-to-many mapping from two aspects: multi-modal translation and multi-domain translation. However, existing methods only consider one of the two perspectives, leaving each unable to address the other's problem. To address this issue, we propose a novel unified model that bridges these two objectives. First, we disentangle the input images into latent representations with an encoder-decoder architecture trained adversarially on conditions in the feature space. Then, we encourage the generator to learn multi-mappings through random cross-domain translation. As a result, we can manipulate different parts of the latent representations to perform multi-modal and multi-domain translations simultaneously. Experiments demonstrate that our method outperforms state-of-the-art methods.