
Author name cluster

Chunshui Cao

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers
1 author row

Possible papers (4)

NeurIPS 2025 · Conference Paper

Vocabulary-Guided Gait Recognition

  • Panjian Huang
  • Saihui Hou
  • Chunshui Cao
  • Xu Liu
  • Yongzhen Huang

What is a gait? Appearance-based gait networks consider a gait to be the human shape and motion information extracted from images. Model-based gait networks treat a gait as the inherent human structure derived from body points. However, both characterizations remain too vague for humans to truly comprehend. In this work, we introduce a novel paradigm, Vocabulary-Guided Gait Recognition, dubbed Gait-World, which explores gait concepts through human vocabulary with Vision-Language Models (VLMs). Although VLMs have achieved remarkable progress in various vision tasks, their cognitive capability regarding gait modalities remains limited. The key to Gait-World is a proper vocabulary prompt: the paradigm carefully selects gait-cycle actions as its Vocabulary Base, bridging the gait and vocabulary feature spaces and further promoting human understanding of gait. How should gait features be extracted? Although previous gait networks have made significant progress, learning solely from gait modalities on limited gait databases makes it difficult to obtain gait features robust enough for practical use. Therefore, we propose the first Gait-World model, dubbed $\alpha$-Gait, which guides gait network learning with universal vocabulary knowledge from VLMs. However, because the modalities are heterogeneous, directly integrating vocabulary and gait features is highly challenging, as they reside in different embedding spaces. To address this, $\alpha$-Gait designs a Vocabulary Relation Mapper and a Gait Fine-grained Detector to map and establish vocabulary relations in the gait space and detect the corresponding gait features. Extensive experiments on CASIA-B, CCPG, SUSTech1K, Gait3D and GREW reveal the potential value and research directions of vocabulary information from VLMs in the gait field.
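
The abstract above describes mapping vocabulary relations from a VLM text space into the gait feature space. Below is a minimal, illustrative sketch of that cross-modal mapping idea in PyTorch; the class name, dimensions, and cross-attention design are assumptions for the example, not the paper's $\alpha$-Gait implementation.

```python
import torch
import torch.nn as nn


class VocabularyRelationMapper(nn.Module):
    """Hypothetical module: project frozen VLM vocabulary embeddings into the
    gait feature space and let gait features attend over them."""

    def __init__(self, vocab_dim: int = 512, gait_dim: int = 256, n_heads: int = 4):
        super().__init__()
        self.proj = nn.Linear(vocab_dim, gait_dim)   # text space -> gait space
        self.attn = nn.MultiheadAttention(gait_dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(gait_dim)

    def forward(self, gait_feats: torch.Tensor, vocab_embeds: torch.Tensor) -> torch.Tensor:
        # gait_feats:   (B, T, gait_dim)  per-frame or per-part gait features
        # vocab_embeds: (V, vocab_dim)    embeddings of gait-cycle action words
        vocab = self.proj(vocab_embeds).unsqueeze(0).expand(gait_feats.size(0), -1, -1)
        attended, _ = self.attn(query=gait_feats, key=vocab, value=vocab)
        return self.norm(gait_feats + attended)      # residual fusion of the two spaces


# Toy usage with random tensors standing in for real gait and text features.
mapper = VocabularyRelationMapper()
fused = mapper(torch.randn(2, 30, 256), torch.randn(20, 512))
print(fused.shape)  # torch.Size([2, 30, 256])
```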

AAAI 2024 · Conference Paper

QAGait: Revisit Gait Recognition from a Quality Perspective

  • Zengbin Wang
  • Saihui Hou
  • Man Zhang
  • Xu Liu
  • Chunshui Cao
  • Yongzhen Huang
  • Peipei Li
  • Shibiao Xu

Gait recognition is a promising biometric method that aims to identify pedestrians by their unique walking patterns. The silhouette modality, renowned for its easy acquisition, simple structure, sparse representation, and convenient modeling, has been widely employed in controlled in-the-lab research. However, as gait recognition rapidly advances from in-the-lab to in-the-wild scenarios, various conditions raise significant challenges for the silhouette modality, including 1) unidentifiable low-quality silhouettes (abnormal segmentation, severe occlusion, or even non-human shapes), and 2) identifiable but challenging silhouettes (background noise, non-standard posture, slight occlusion). To address these challenges, we revisit the gait recognition pipeline and approach gait recognition from a quality perspective, namely QAGait. Specifically, we propose a series of cost-effective quality assessment strategies, including Maximal Connect Area and Template Match to eliminate background noise and unidentifiable silhouettes, and an Alignment strategy to handle non-standard postures. We also propose two quality-aware loss functions to integrate silhouette quality into optimization within the embedding space. Extensive experiments demonstrate that QAGait can guarantee both gait reliability and performance enhancement. Furthermore, our quality assessment strategies integrate seamlessly with existing gait datasets, further showcasing their value. Code is available at https://github.com/wzb-bupt/QAGait.
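
As a concrete illustration of the quality-assessment idea, the sketch below keeps only the largest connected foreground region of a binary silhouette and rejects implausibly small ones, in the spirit of the Maximal Connect Area strategy described above. It is an assumed toy implementation (NumPy/SciPy, made-up area threshold), not the QAGait code.

```python
import numpy as np
from scipy import ndimage


def largest_component(silhouette: np.ndarray, min_area: int = 500):
    """Keep the largest connected foreground region of a binary silhouette;
    return None if it is too small to be a plausible human shape."""
    labels, num = ndimage.label(silhouette > 0)      # default 4-connected labeling
    if num == 0:
        return None                                  # empty mask: unusable frame
    areas = np.bincount(labels.ravel())[1:]          # component sizes, skipping background
    best = int(np.argmax(areas)) + 1                 # label id of the largest component
    if areas[best - 1] < min_area:
        return None                                  # treat as unidentifiable low quality
    return (labels == best).astype(np.uint8)


# Toy usage: one person-sized blob plus scattered segmentation noise.
sil = np.zeros((64, 44), dtype=np.uint8)
sil[10:50, 15:30] = 1                                # "person" region (600 px)
sil[2, 2] = sil[60, 40] = 1                          # stray noise pixels
clean = largest_component(sil)
print(clean.sum())                                   # 600: only the person remains
```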

NeurIPS 2023 · Conference Paper

Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification

  • Rui Wang
  • Peipei Li
  • Huaibo Huang
  • Chunshui Cao
  • Ran He
  • Zhaofeng He

We present a novel language-driven ordering alignment method for ordinal classification. The labels in ordinal classification carry additional ordering relations, making them prone to overfitting when relying solely on training data. Recent developments in pre-trained vision-language models inspire us to leverage the rich ordinal priors in human language by converting the original task into a vision-language alignment task. Consequently, we propose L2RCLIP, which fully exploits language priors from two perspectives. First, we introduce a complementary prompt tuning technique called RankFormer, designed to enhance the ordering relation of the original rank prompts; it employs token-level attention with residual-style prompt blending in the word embedding space. Second, to further incorporate language priors, we revisit the approximate bound optimization of the vanilla cross-entropy loss and restructure it within the cross-modal embedding space. On this basis, we propose a cross-modal ordinal pairwise loss to refine the CLIP feature space, where texts and images maintain both semantic alignment and ordering alignment. Extensive experiments on three ordinal classification tasks, including facial age estimation, historical color image (HCI) classification, and aesthetic assessment, demonstrate its promising performance.
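
To make the cross-modal ordinal pairwise loss idea concrete, here is one plausible way to write such a loss over CLIP-style embeddings, with a hinge margin that grows with ordinal distance. The function name and margin scaling are assumptions for the example; this is not the L2RCLIP loss itself.

```python
import torch
import torch.nn.functional as F


def ordinal_pairwise_loss(img_emb, rank_text_emb, labels, base_margin=0.05):
    """img_emb: (B, D) image embeddings; rank_text_emb: (R, D) one text embedding
    per rank; labels: (B,) integer ranks. The hinge margin grows with ordinal
    distance, pushing similarity to decay monotonically away from the true rank."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(rank_text_emb, dim=-1)
    sims = img @ txt.t()                                    # (B, R) cosine similarities
    pos = sims.gather(1, labels.view(-1, 1))                # similarity to the true rank
    ranks = torch.arange(txt.size(0), device=sims.device)
    dist = (ranks.view(1, -1) - labels.view(-1, 1)).abs()   # ordinal distance to every rank
    margin = base_margin * dist.float()
    # hinge: the true-rank similarity must beat each other rank by its margin
    loss = F.relu(margin - (pos - sims))
    return loss.sum(dim=1).mean()


# Toy usage with random embeddings for 4 images and 6 ordinal ranks.
loss = ordinal_pairwise_loss(torch.randn(4, 512), torch.randn(6, 512),
                             torch.tensor([0, 2, 3, 5]))
print(float(loss))
```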

AAAI 2018 · Conference Paper

Lateral Inhibition-Inspired Convolutional Neural Network for Visual Attention and Saliency Detection

  • Chunshui Cao
  • Yongzhen Huang
  • Zilei Wang
  • Liang Wang
  • Ninglong Xu
  • Tieniu Tan

Lateral inhibition in top-down feedback is widespread in visual neurobiology, but this important mechanism has not yet been well explored in computer vision. In our recent research, we find that modeling lateral inhibition in a convolutional neural network (LICNN) is very useful for visual attention and saliency detection. In this paper, we propose to formulate lateral inhibition, inspired by related studies in neurobiology, and embed it into the top-down gradient computation of a general CNN trained for classification, i.e., only category-level information is used. After this operation (conducted only once), the network is able to generate accurate category-specific attention maps. Further, we apply LICNN to weakly-supervised salient object detection. Extensive experimental studies on a set of databases, e.g., ECSSD, HKU-IS, PASCAL-S and DUT-OMRON, demonstrate the great advantage of LICNN, which achieves state-of-the-art performance. It is especially impressive that LICNN, with only category-level supervision, even outperforms some recent methods trained with segmentation-level supervision.
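
For intuition, the sketch below derives a category-specific attention map from top-down gradients of a classifier and applies a simple lateral-inhibition-style suppression, where each location is inhibited by its local neighborhood average. It assumes PyTorch and a recent torchvision, and the inhibition form is a simplified stand-in rather than the paper's LICNN formulation.

```python
import torch
import torch.nn.functional as F
import torchvision


model = torchvision.models.resnet18(weights=None).eval()   # untrained backbone as a stand-in
features = {}
model.layer4.register_forward_hook(lambda m, i, o: features.update(feat=o))

x = torch.randn(1, 3, 224, 224)                  # stand-in input image
logits = model(x)
target = logits.argmax(dim=1).item()             # pick a category (category-level info only)

# Top-down pass: gradient of the chosen category score w.r.t. the last conv feature map.
grads = torch.autograd.grad(logits[0, target], features["feat"])[0]

# Lateral-inhibition-style suppression (assumed form): each location is inhibited by
# the average response of its 3x3 neighborhood, keeping only locally dominant peaks.
response = grads.clamp(min=0).sum(dim=1, keepdim=True)           # (1, 1, 7, 7) saliency energy
neighborhood = F.avg_pool2d(response, kernel_size=3, stride=1, padding=1)
attention = F.relu(response - neighborhood)
attention = attention / (attention.max() + 1e-8)                 # normalized attention map
print(attention.shape)  # torch.Size([1, 1, 7, 7])
```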