Arrow Research

Author name cluster

Shiming Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

7 papers
2 author rows

Possible papers (7)

AAAI Conference 2025 Conference Paper

FSL-Rectifier: Rectify Outliers in Few-Shot Learning via Test-Time Augmentation

  • Yunwei Bai
  • Ying Kiat Tan
  • Shiming Chen
  • Yao Shu
  • Tsuhan Chen

Few-shot learning (FSL) commonly requires a model to identify images (queries) that belong to classes unseen during training, based on a few labelled samples of the new classes (support set) as reference. Many existing algorithms rely on training-time data augmentation to improve the generalization capability of FSL models, but outlier queries or support images at inference can still pose serious generalization challenges. In this work, to reduce the bias caused by outlier samples, we generate additional test-class samples by combining original samples with suitable train-class samples via a generative image combiner. We then obtain averaged features via an augmentor, which yields more typical representations through the averaging. We demonstrate the effectiveness of our method both experimentally and theoretically, obtaining a relative test-accuracy improvement of around 10% (e.g., from 46.86% to 53.28%) for trained FSL models. Importantly, given a pretrained image combiner, our method is training-free for off-the-shelf FSL models, whose performance can be improved without extra datasets or further training of the models themselves.
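
A minimal PyTorch sketch of the test-time rectification idea above, assuming a frozen feature extractor `encoder` and a pretrained generative `combiner(test_image, train_image)`; both callables, the pool of candidate train-class images, and the number of augmentations k are illustrative stand-ins, not the paper's actual interfaces.

    import torch

    @torch.no_grad()
    def rectified_embedding(encoder, combiner, image, train_pool, k=4):
        """Average the encoder features of `image` and k combiner-generated variants."""
        variants = [image]
        # Pair the test image with k train-class images drawn from a candidate pool.
        for i in torch.randperm(len(train_pool))[:k].tolist():
            variants.append(combiner(image, train_pool[i]))
        feats = encoder(torch.stack(variants))   # (k+1, d) feature matrix
        return feats.mean(dim=0)                 # averaged, more "typical" embedding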

AAAI Conference 2025 Conference Paper

ZeroMamba: Exploring Visual State Space Model for Zero-Shot Learning

  • Wenjin Hou
  • Dingjie Fu
  • Kun Li
  • Shiming Chen
  • Hehe Fan
  • Yi Yang

Zero-shot learning (ZSL) aims to recognize unseen classes by transferring semantic knowledge from seen classes to unseen ones, guided by semantic information. To this end, existing works have demonstrated remarkable performance by utilizing global visual features from Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) for visual-semantic interactions. Due to the limited receptive fields of CNNs and the quadratic complexity of ViTs, however, these visual backbones achieve suboptimal visual-semantic interactions. In this paper, motivated by the visual state space model (i.e., Vision Mamba), which is capable of capturing long-range dependencies and modeling complex visual dynamics, we propose a parameter-efficient ZSL framework called ZeroMamba to advance ZSL. Our ZeroMamba comprises three key components: Semantic-aware Local Projection (SLP), Global Representation Learning (GRL), and Semantic Fusion (SeF). Specifically, SLP integrates semantic embeddings to map visual features to local semantic-related representations, while GRL encourages the model to learn global semantic representations. SeF combines these two semantic representations to enhance the discriminability of semantic features. We incorporate these designs into Vision Mamba, forming an end-to-end ZSL framework. As a result, the learned semantic representations are better suited for classification. Through extensive experiments on four prominent ZSL benchmarks, ZeroMamba demonstrates superior performance, significantly outperforming the state-of-the-art (i.e., CNN-based and ViT-based) methods under both conventional ZSL (CZSL) and generalized ZSL (GZSL) settings.
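
A rough sketch of how the SLP, GRL, and SeF heads described above might sit on top of a backbone that emits patch features of shape (B, N, D); the layer sizes, the pooling, and the simple learned fusion weight are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ZeroMambaHead(nn.Module):
        def __init__(self, dim, attr_dim):
            super().__init__()
            self.slp = nn.Linear(dim, attr_dim)          # semantic-aware local projection
            self.grl = nn.Linear(dim, attr_dim)          # global representation learning
            self.sef = nn.Parameter(torch.tensor(0.5))   # simple learned fusion weight

        def forward(self, patch_feats, class_attrs):
            # patch_feats: (B, N, D) backbone features; class_attrs: (C, A) attribute matrix.
            local_sem = self.slp(patch_feats).mean(dim=1)    # (B, A) pooled local semantics
            global_sem = self.grl(patch_feats.mean(dim=1))   # (B, A) from pooled features
            fused = self.sef * local_sem + (1 - self.sef) * global_sem
            return fused @ class_attrs.t()                   # (B, C) class compatibility scores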

IJCAI Conference 2022 Conference Paper

Semantic Compression Embedding for Generative Zero-Shot Learning

  • Ziming Hong
  • Shiming Chen
  • Guo-Sen Xie
  • Wenhan Yang
  • Jian Zhao
  • Yuanjie Shao
  • Qinmu Peng
  • Xinge You

Generative methods have been successfully applied in zero-shot learning (ZSL) by learning an implicit mapping to alleviate the visual-semantic domain gaps and synthesizing unseen samples to handle the data imbalance between seen and unseen classes. However, existing generative methods simply use visual features extracted by a pre-trained CNN backbone. These visual features lack attribute-level semantic information. Consequently, seen classes are indistinguishable, and the knowledge transfer from seen to unseen classes is limited. To tackle this issue, we propose a novel Semantic Compression Embedding Guided Generation (SC-EGG) model, which cascades a semantic compression embedding network (SCEN) and an embedding guided generative network (EGGN). The SCEN extracts a group of attribute-level local features for each sample and further compresses them into a new low-dimensional visual feature, thus obtaining a dense-semantic visual space. The EGGN learns a mapping from the class-level semantic space to the dense-semantic visual space, thereby improving the discriminability of the synthesized dense-semantic unseen visual features. Extensive experiments on three benchmark datasets, i.e., CUB, SUN and AWA2, demonstrate the significant performance gains of SC-EGG over current state-of-the-art methods and its baselines.
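
The generative recipe can be pictured with a small conditional generator that maps class-level semantics plus noise into the compact dense-semantic visual space and then synthesizes features for unseen classes; the dimensions, MLP layout, and sampling routine below are illustrative assumptions rather than the SC-EGG implementation.

    import torch
    import torch.nn as nn

    class EmbeddingGuidedGenerator(nn.Module):
        def __init__(self, attr_dim, noise_dim, feat_dim):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(attr_dim + noise_dim, 1024), nn.LeakyReLU(0.2),
                nn.Linear(1024, feat_dim),
            )

        def forward(self, attrs, noise):
            return self.net(torch.cat([attrs, noise], dim=-1))

    def synthesize_unseen(gen, unseen_attrs, per_class=100, noise_dim=64):
        """Generate `per_class` dense-semantic features for every unseen class."""
        feats, labels = [], []
        for c, a in enumerate(unseen_attrs):                 # unseen_attrs: (C_u, attr_dim)
            z = torch.randn(per_class, noise_dim)
            feats.append(gen(a.expand(per_class, -1), z))
            labels.append(torch.full((per_class,), c))
        return torch.cat(feats), torch.cat(labels)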

AAAI Conference 2022 Conference Paper

TransZero: Attribute-Guided Transformer for Zero-Shot Learning

  • Shiming Chen
  • Ziming Hong
  • Yang Liu
  • Guo-Sen Xie
  • Baigui Sun
  • Hao Li
  • Qinmu Peng
  • Ke Lu

Zero-shot learning (ZSL) aims to recognize novel classes by transferring semantic knowledge from seen classes to unseen ones. Semantic knowledge is learned from attribute descriptions shared between different classes, which act as strong priors for localizing object attributes that represent discriminative region features, enabling significant visual-semantic interaction. Although some attention-based models have attempted to learn such region features in a single image, the transferability and discriminative attribute localization of visual features are typically neglected. In this paper, we propose an attribute-guided Transformer network, termed TransZero, to refine visual features and learn attribute localization for discriminative visual embedding representations in ZSL. Specifically, TransZero uses a feature augmentation encoder to alleviate the cross-dataset bias between ImageNet and the ZSL benchmarks, and improves the transferability of visual features by reducing the entangled relative geometry relationships among region features. To learn locality-augmented visual features, TransZero employs a visual-semantic decoder to localize the image regions most relevant to each attribute in a given image, under the guidance of semantic attribute information. Then, the locality-augmented visual features and semantic vectors are used to conduct effective visual-semantic interaction in a visual-semantic embedding network. Extensive experiments show that TransZero achieves a new state of the art on three ZSL benchmarks. The code is available at: https://github.com/shiming-chen/TransZero.
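
One way to picture the attribute-guided decoding step is a cross-attention block in which semantic attribute embeddings query the grid of region features, so that each attribute localizes its most relevant regions; the module below is a simplified sketch with assumed shapes, not the released TransZero code.

    import torch.nn as nn

    class AttributeGuidedDecoder(nn.Module):
        def __init__(self, dim, num_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            self.score = nn.Linear(dim, 1)

        def forward(self, attr_embed, region_feats):
            # attr_embed: (B, A, D) attribute queries; region_feats: (B, N, D) image regions.
            localized, _ = self.attn(attr_embed, region_feats, region_feats)
            attr_scores = self.score(localized).squeeze(-1)   # (B, A) per-attribute score
            return attr_scores                                # compared against class attribute vectors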

NeurIPS Conference 2021 Conference Paper

HSVA: Hierarchical Semantic-Visual Adaptation for Zero-Shot Learning

  • Shiming Chen
  • Guosen Xie
  • Yang Liu
  • Qinmu Peng
  • Baigui Sun
  • Hao Li
  • Xinge You
  • Ling Shao

Zero-shot learning (ZSL) tackles the unseen class recognition problem, transferring semantic knowledge from seen classes to unseen ones. Typically, to guarantee desirable knowledge transfer, a common (latent) space is adopted for associating the visual and semantic domains in ZSL. However, existing common space learning methods align the semantic and visual domains by merely mitigating distribution disagreement through one-step adaptation. This strategy is usually ineffective due to the heterogeneous nature of the feature representations in the two domains, which intrinsically contain both distribution and structure variations. To address this and advance ZSL, we propose a novel hierarchical semantic-visual adaptation (HSVA) framework. Specifically, HSVA aligns the semantic and visual domains by adopting a hierarchical two-step adaptation, i.e., structure adaptation and distribution adaptation. In the structure adaptation step, we take two task-specific encoders to encode the source data (visual domain) and the target data (semantic domain) into a structure-aligned common space. To this end, a supervised adversarial discrepancy (SAD) module is proposed to adversarially minimize the discrepancy between the predictions of two task-specific classifiers, thus making the visual and semantic feature manifolds more closely aligned. In the distribution adaptation step, we directly minimize the Wasserstein distance between the latent multivariate Gaussian distributions to align the visual and semantic distributions using a common encoder. Finally, the structure and distribution adaptation are derived in a unified framework under two partially-aligned variational autoencoders. Extensive experiments on four benchmark datasets demonstrate that HSVA achieves superior performance on both conventional and generalized ZSL. The code is available at https://github.com/shiming-chen/HSVA.
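
The distribution adaptation step rests on the closed-form 2-Wasserstein distance between the two latent Gaussians; a short sketch of that term for diagonal-covariance encoders is given below, with the variable names (visual vs. semantic means and log-variances) as assumptions about the encoder outputs.

    import torch

    def wasserstein2_diag_gaussian(mu_v, logvar_v, mu_s, logvar_s):
        """Squared 2-Wasserstein distance between two diagonal Gaussians, per sample."""
        std_v = torch.exp(0.5 * logvar_v)
        std_s = torch.exp(0.5 * logvar_s)
        mean_term = (mu_v - mu_s).pow(2).sum(dim=-1)    # ||mu_v - mu_s||^2
        cov_term = (std_v - std_s).pow(2).sum(dim=-1)   # sum_i (sigma_v,i - sigma_s,i)^2
        return mean_term + cov_term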

IJCAI Conference 2021 Conference Paper

Norm-guided Adaptive Visual Embedding for Zero-Shot Sketch-Based Image Retrieval

  • Wenjie Wang
  • Yufeng Shi
  • Shiming Chen
  • Qinmu Peng
  • Feng Zheng
  • Xinge You

Zero-shot sketch-based image retrieval (ZS-SBIR), which aims to retrieve photos with sketches under the zero-shot scenario, has shown great potential in real-world applications. Most existing methods leverage language models to generate class prototypes and use them to arrange the locations of all categories in the common space for photos and sketches. Although great progress has been made, few methods consider whether such pre-defined prototypes are necessary for ZS-SBIR, where the locations of unseen-class samples in the embedding space are actually determined by visual appearance, and a purely visual embedding performs better. To this end, we propose a novel Norm-guided Adaptive Visual Embedding (NAVE) model for adaptively building the common space based on visual similarity instead of language-based pre-defined prototypes. To further enhance the representation quality of unseen classes for both the photo and sketch modalities, a modality norm discrepancy and a noisy label regularizer are jointly employed to measure and repair the modality bias of the learned common embedding. Experiments on two challenging datasets demonstrate the superiority of NAVE over state-of-the-art competitors.
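
A minimal sketch of retrieval in a shared visual embedding together with a norm-discrepancy penalty between the photo and sketch modalities; the encoders, the exact regularizer, and its weighting are illustrative assumptions rather than the authors' formulation.

    import torch
    import torch.nn.functional as F

    def norm_discrepancy(photo_feats, sketch_feats):
        """Penalize the gap between the mean feature norms of the two modalities."""
        return (photo_feats.norm(dim=-1).mean() - sketch_feats.norm(dim=-1).mean()).abs()

    def retrieve(sketch_query, photo_gallery, topk=10):
        """Rank gallery photos by cosine similarity to sketch query embeddings."""
        sims = F.normalize(sketch_query, dim=-1) @ F.normalize(photo_gallery, dim=-1).t()
        return sims.topk(topk, dim=-1).indices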

ICRA Conference 2016 Conference Paper

Speed evaluation of a freely swimming robotic fish with an artificial lateral line

  • Wei Wang 0078
  • Yuan Li
  • Xingxing Zhang
  • Chen Wang 0005
  • Shiming Chen
  • Guangming Xie

The artificial lateral line has been drawing increasing attention recently for its potential applications in robotics. Experiments are usually conducted with a bioinspired robot in a controlled environment, where the sensing platform is held stationary or slowly driven with a simple linear motion. In this paper, we conduct a more practical and challenging study in which the robot uses an artificial lateral line to evaluate its linear velocity while swimming freely. We use an onboard artificial lateral line to measure the pressure profiles over the surface of a robotic fish and employ an onboard IMU (inertial measurement unit) to record the motion kinematics of the robot while it swims freely at various speeds. We find that 1) pressure changes are greatest on the head of the robot; 2) pressures increase with the swimming speed and with the oscillation amplitude of the robot's angular velocity. Therefore, we propose a nonlinear prediction model that incorporates distributed pressure and angular velocity to estimate the speed of the robot. An online speed evaluation experiment demonstrates the effectiveness and accuracy of the proposed model.
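
A toy sketch of the kind of nonlinear prediction model described above, fitting a quadratic regression from distributed pressure readings and angular-velocity amplitude to swimming speed with NumPy; the feature choice and the quadratic form are assumptions and may differ from the paper's model.

    import numpy as np

    def quadratic_features(X):
        """Augment raw sensor features with pairwise quadratic terms."""
        n, d = X.shape
        quads = [X[:, i:i + 1] * X[:, j:j + 1] for i in range(d) for j in range(i, d)]
        return np.hstack([np.ones((n, 1)), X] + quads)

    def fit_speed_model(pressures, gyro_amplitude, speeds):
        # pressures: (n, p) lateral-line readings; gyro_amplitude, speeds: (n,) each.
        X = quadratic_features(np.column_stack([pressures, gyro_amplitude]))
        weights, *_ = np.linalg.lstsq(X, speeds, rcond=None)
        return weights

    def predict_speed(weights, pressures, gyro_amplitude):
        X = quadratic_features(np.column_stack([pressures, gyro_amplitude]))
        return X @ weights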