Author name cluster

Tianshui Chen

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

10 papers

2 author rows

IJCAI Conference 2025 Conference Paper

ReplayCAD: Generative Diffusion Replay for Continual Anomaly Detection

Lei Hu
Zhiyong Gan
Ling Deng
Jinglin Liang
Lingyu Liang
Shuangping Huang
Tianshui Chen

Continual Anomaly Detection (CAD) enables anomaly detection models to learn new classes while preserving knowledge of historical classes. CAD faces two key challenges: catastrophic forgetting and segmentation of small anomalous regions. Existing CAD methods store image distributions or patch features to mitigate catastrophic forgetting, but they fail to preserve pixel-level detailed features for accurate segmentation. To overcome this limitation, we propose ReplayCAD, a novel diffusion-driven generative replay framework that replays high-quality historical data, thus effectively preserving pixel-level detailed features. Specifically, we compress historical data by searching for a class semantic embedding in the conditional space of the pre-trained diffusion model, which can guide the model to replay data with fine-grained pixel details, thus improving the segmentation performance. However, relying solely on semantic features results in limited spatial diversity. Hence, we further use spatial features to guide data compression, achieving precise control of sample space, thereby generating more diverse data. Our method achieves state-of-the-art performance in both classification and segmentation, with notable improvements in segmentation: 11.5% on VisA and 8.1% on MVTec. Our source code is available at https://github.com/HULEI7/ReplayCAD.


AAAI Conference 2024 Conference Paper

Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation

Hui Fu
Zeqing Wang
Ke Gong
Keze Wang
Tianshui Chen
Haojie Li
Haifeng Zeng
Wenxiong Kang

Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style. However, existing works primarily focus on achieving precise lip synchronization while neglecting to model the subject-specific speaking style, often resulting in unrealistic facial animations. To the best of our knowledge, this work makes the first attempt to explore the coupled information between the speaking style and the semantic content in facial motions. Specifically, we introduce an innovative speaking style disentanglement method, which enables arbitrary-subject speaking style encoding and leads to a more realistic synthesis of speech-driven facial animations. Subsequently, we propose a novel framework called Mimic to learn disentangled representations of the speaking style and content from facial motions by building two latent spaces for style and content, respectively. Moreover, to facilitate disentangled representation learning, we introduce four well-designed constraints: an auxiliary style classifier, an auxiliary inverse classifier, a content contrastive loss, and a pair of latent cycle losses, which can effectively contribute to the construction of the identity-related style space and semantic-related content space. Extensive qualitative and quantitative experiments conducted on three publicly available datasets demonstrate that our approach outperforms state-of-the-art methods and is capable of capturing diverse speaking styles for speech-driven 3D facial animation. The source code and supplementary video are publicly available at: https://zeqing-wang.github.io/Mimic/


AAAI Conference 2022 Conference Paper

Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Tao Pu
Tianshui Chen
Hefeng Wu
Liang Lin

Training multi-label image recognition models with partial labels, in which merely some labels are known while others are unknown for each image, is a considerably challenging and practical task. To address this task, current algorithms mainly depend on pre-training classification or similarity models to generate pseudo labels for the unknown labels. However, these algorithms depend on sufficient multi-label annotations to train the models, leading to poor performance, especially when the known label proportion is low. In this work, we propose to blend category-specific representations across different images to transfer information of known labels to complement unknown labels, which removes the need for pre-training models and thus does not depend on sufficient annotations. To this end, we design a unified semantic-aware representation blending (SARB) framework that exploits instance-level and prototype-level semantic representations to complement unknown labels by two complementary modules: 1) an instance-level representation blending (ILRB) module blends the representations of the known labels in an image to the representations of the unknown labels in another image to complement these unknown labels; 2) a prototype-level representation blending (PLRB) module learns more stable representation prototypes for each category and blends the representation of unknown labels with the prototypes of corresponding labels to complement these labels. Extensive experiments on the MS-COCO, Visual Genome, and Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors on all known label proportion settings, i.e., with mAP improvements of 4.6%, 4.6%, and 2.2% on these three datasets when the known label proportion is 10%. Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL.
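The instance-level blending idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `blend_instance_level`, the fixed mixing weight `alpha`, the convex-combination form, and the choice to mark a blended label as positive are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def blend_instance_level(
    feats_src: List[List[float]],
    labels_src: List[Optional[int]],
    feats_tgt: List[List[float]],
    labels_tgt: List[Optional[int]],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> Tuple[List[List[float]], List[Optional[int]]]:
    """Blend known-positive category features of a source image into a target image.

    Labels use 1 = known positive, 0 = known negative, None = unknown.
    Each feats_*[c] is the category-specific feature vector for category c.
    """
    out_feats = [list(f) for f in feats_tgt]
    out_labels = list(labels_tgt)
    for c, (src_label, tgt_label) in enumerate(zip(labels_src, labels_tgt)):
        # Only transfer when the source knows the label and the target does not.
        if src_label == 1 and tgt_label is None:
            out_feats[c] = [
                alpha * s + (1 - alpha) * t
                for s, t in zip(feats_src[c], feats_tgt[c])
            ]
            # Assumption: the blended representation is treated as a positive.
            out_labels[c] = 1
    return out_feats, out_labels
```

Categories whose labels are already known in the target image are left untouched, so only unknown labels are complemented.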


AAAI Conference 2022 Conference Paper

Structured Semantic Transfer for Multi-Label Recognition with Partial Labels

Tianshui Chen
Tao Pu
Hefeng Wu
Yuan Xie
Liang Lin

Multi-label image recognition is a fundamental yet practical task because real-world images inherently possess multiple semantic labels. However, it is difficult to collect large-scale multi-label annotations due to the complexity of both the input images and output label spaces. To reduce the annotation cost, we propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels, i.e., merely some labels are known while other labels are missing (also called unknown labels) per image. The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations to transfer knowledge of known labels to generate pseudo labels for unknown labels. Specifically, an intra-image semantic transfer module learns an image-specific label co-occurrence matrix and maps the known labels to complement unknown labels based on this matrix. Meanwhile, a cross-image transfer module learns category-specific feature similarities and helps complement unknown labels with high similarities. Finally, both known and generated labels are used to train the multi-label recognition models. Extensive experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms. Codes are available at https://github.com/HCPLab-SYSU/HCP-MLR-PL.
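A rough sketch of how a label co-occurrence matrix might be used to complement unknown labels follows. In the paper the matrix is learned per image; here it is simply passed in, and the function name `complement_labels`, the max-over-known-positives scoring rule, and the `threshold` value are illustrative assumptions rather than the SST method itself.

```python
from typing import List, Optional


def complement_labels(
    known: List[Optional[int]],
    cooccur: List[List[float]],
    threshold: float = 0.5,  # hypothetical cut-off for accepting a pseudo label
) -> List[Optional[int]]:
    """Fill unknown labels from known positives using a co-occurrence matrix.

    known uses 1 = known positive, 0 = known negative, None = unknown.
    cooccur[i][j] is an estimate of how likely label j is present given label i.
    """
    completed = list(known)
    positives = [i for i, y in enumerate(known) if y == 1]
    for j, y in enumerate(known):
        if y is not None:
            continue  # never overwrite a known label
        # Strongest evidence for label j among the known positive labels.
        score = max((cooccur[i][j] for i in positives), default=0.0)
        if score >= threshold:
            completed[j] = 1  # pseudo label generated from co-occurrence
    return completed
```

Labels whose score stays below the threshold remain unknown (`None`) instead of being forced to negative.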


ICRA Conference 2021 Conference Paper

AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

Tao Pu
Tianshui Chen
Yuan Xie
Hefeng Wu
Liang Lin

Recognizing human emotion/expressions automatically is quite an expected ability for intelligent robotics, as it can promote better communication and cooperation with humans. Current deep-learning-based algorithms may achieve impressive performance in some lab-controlled environments, but they always fail to recognize the expressions accurately for the uncontrolled in-the-wild situation. Fortunately, facial action units (AU) describe subtle facial behaviors, and they can help distinguish uncertain and ambiguous expressions. In this work, we explore the correlations among the action units and facial expressions, and devise an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition. Specifically, it leverages AU-expression correlations to guide the learning of the AU classifiers, and thus it can obtain AU representations without incurring any AU annotations. Then, it introduces a knowledge-guided attention mechanism that mines useful AU representations under the constraint of AU-expression correlations. In this way, the framework can capture local discriminative and complementary features to enhance facial representation for facial expression recognition. We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods. Codes and trained models are available at https://github.com/HCPLab-SYSU/AUE-CRL.


AAAI Conference 2020 Conference Paper

Knowledge Graph Transfer Network for Few-Shot Recognition

Riquan Chen
Tianshui Chen
Xiaolu Hui
Hefeng Wu
Guanbin Li
Liang Lin

Few-shot learning aims to learn novel categories from very few samples given some base categories with sufficient training samples. The main challenge of this task is that the novel categories are prone to being dominated by the color, texture, or shape of the object or by the background context (namely specificity), which are distinct for the given few training samples but not common for the corresponding categories (see Figure 1). Fortunately, we find that transferring information of the correlated base categories can help learn the novel concepts and thus avoid the novel concept being dominated by the specificity. Besides, incorporating semantic correlations among different categories can effectively regularize this information transfer. In this work, we represent the semantic correlations in the form of a structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN). Specifically, by initializing each node with the classifier weight of the corresponding category, a propagation mechanism is learned to adaptively propagate node messages through the graph to explore node interaction and transfer classifier information of the base categories to those of the novel ones. Extensive experiments on the ImageNet dataset show significant performance improvement compared with current leading competitors. Furthermore, we construct an ImageNet-6K dataset that covers larger-scale categories, i.e., 6,000 categories, and experiments on this dataset further demonstrate the effectiveness of our proposed model.
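The propagation step described above, where each graph node starts from a category's classifier weight and receives messages from correlated categories, can be sketched as follows. The abstract describes a learned propagation mechanism; this sketch swaps in a fixed weighted-average update, and the function name `propagate`, the `mix` coefficient, and the adjacency convention are assumptions for illustration only.

```python
from typing import List


def propagate(
    weights: List[List[float]],
    adjacency: List[List[float]],
    steps: int = 1,
    mix: float = 0.5,  # hypothetical share of the update taken from neighbours
) -> List[List[float]]:
    """Propagate classifier weights over a category graph.

    weights[i] is the classifier weight vector of category i (the node state).
    adjacency[i][j] >= 0 is the strength of the edge carrying messages from j to i.
    """
    current = [list(w) for w in weights]
    n = len(current)
    for _ in range(steps):
        updated = []
        for i, w in enumerate(current):
            total = sum(adjacency[i])
            if total == 0:
                updated.append(list(w))  # isolated node keeps its state
                continue
            # Weighted average of the incoming neighbour states.
            msg = [
                sum(adjacency[i][j] * current[j][d] for j in range(n)) / total
                for d in range(len(w))
            ]
            updated.append([(1 - mix) * w[d] + mix * msg[d] for d in range(len(w))])
        current = updated
    return current
```

With a base category at node 0 and a novel category at node 1 that receives messages from it, one step moves the novel classifier halfway toward the base classifier while the base node is unchanged.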


IJCAI Conference 2018 Conference Paper

Deep Reasoning with Knowledge Graph for Social Relationship Understanding

Zhouxia Wang
Tianshui Chen
Jimmy Ren
Weihao Yu
Hui Cheng
Liang Lin

Social relationships (e.g., friends, couples, etc.) form the basis of the social network in our daily life. Automatically interpreting such relationships bears a great potential for intelligent systems to understand human behavior in depth and to better interact with people at a social level. Human beings interpret the social relationships within a group not based on the people alone; the interplay between such social relationships and the contextual information around the people also plays a significant role. However, these additional cues are largely overlooked by previous studies. We found that the interplay between these two factors can be effectively modeled by a novel structured knowledge graph with proper message propagation and attention. This structured knowledge can be efficiently integrated into the deep neural network architecture to promote social relationship understanding by an end-to-end trainable Graph Reasoning Model (GRM), in which a propagation mechanism is learned to propagate node messages through the graph to explore the interaction between persons of interest and the contextual objects. Meanwhile, a graph attentional mechanism is introduced to explicitly reason about the discriminative objects to promote recognition. Extensive experiments on the public benchmarks demonstrate the superiority of our method over the existing leading competitors.


IJCAI Conference 2018 Conference Paper

Knowledge-Embedded Representation Learning for Fine-Grained Image Recognition

Tianshui Chen
Liang Lin
Riquan Chen
Yang Wu
Xiaonan Luo

Humans can naturally understand an image in depth with the aid of rich knowledge accumulated from daily lives or professions. For example, achieving fine-grained image recognition (e.g., categorizing hundreds of subordinate categories of birds) usually requires a comprehensive visual concept organization including category labels and part-level attributes. In this work, we investigate how to unify rich professional knowledge with deep neural network architectures and propose a Knowledge-Embedded Representation Learning (KERL) framework for handling the problem of fine-grained image recognition. Specifically, we organize the rich visual concepts in the form of a knowledge graph and employ a Gated Graph Neural Network to propagate node messages through the graph for generating the knowledge representation. By introducing a novel gated mechanism, our KERL framework incorporates this knowledge representation into the discriminative image feature learning, i.e., implicitly associating the specific attributes with the feature maps. Compared with existing methods of fine-grained image classification, our KERL framework has several appealing properties: i) The embedded high-level knowledge enhances the feature representation, thus facilitating distinguishing the subtle differences among subordinate categories. ii) Our framework can learn feature maps with a meaningful configuration in which the highlighted regions finely accord with the nodes (specific attributes) of the knowledge graph. Extensive experiments on the widely used Caltech-UCSD bird dataset demonstrate the superiority of our KERL framework over existing state-of-the-art methods.


AAAI Conference 2018 Conference Paper

Learning a Wavelet-Like Auto-Encoder to Accelerate Deep Neural Networks

Tianshui Chen
Liang Lin
Wangmeng Zuo
Xiaonan Luo
Lei Zhang

Accelerating deep neural networks (DNNs) has been attracting increasing attention as it can benefit a wide range of applications, e.g., enabling mobile systems with limited computing resources to own powerful visual recognition ability. A practical strategy to this goal usually relies on a two-stage process: operating on the trained DNNs (e.g., approximating the convolutional filters with tensor decomposition) and fine-tuning the amended network, leading to difficulty in balancing the trade-off between acceleration and maintaining recognition performance. In this work, aiming at a general and comprehensive way for neural network acceleration, we develop a Wavelet-like Auto-Encoder (WAE) that decomposes the original input image into two low-resolution channels (sub-images) and incorporate the WAE into the classification neural networks for joint training. The two decomposed channels, in particular, are encoded to carry the low-frequency information (e.g., image profiles) and high-frequency information (e.g., image details or noise), respectively, and enable reconstructing the original input image through the decoding process. Then, we feed the low-frequency channel into a standard classification network such as VGG or ResNet and employ a very lightweight network to fuse with the high-frequency channel to obtain the classification result. Compared to existing DNN acceleration solutions, our framework has the following advantages: i) it is tolerant to any existing convolutional neural networks for classification without amending their structures; ii) the WAE provides an interpretable way to preserve the main components of the input image for classification.
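The split into a low-frequency and a high-frequency channel at lower resolution, with exact reconstruction, can be illustrated with a classical Haar-style transform on one image row. This is a fixed, hand-written stand-in and not the learned WAE from the paper; the function names `haar_split` and `haar_merge` are hypothetical.

```python
from typing import List, Tuple


def haar_split(row: List[float]) -> Tuple[List[float], List[float]]:
    """Split an even-length row into half-resolution low/high-frequency channels."""
    if len(row) % 2 != 0:
        raise ValueError("row length must be even")
    # Pairwise averages keep the coarse profile (low frequency).
    low = [(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)]
    # Pairwise half-differences keep the fine detail (high frequency).
    high = [(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)]
    return low, high


def haar_merge(low: List[float], high: List[float]) -> List[float]:
    """Reconstruct the original row from the two channels."""
    row: List[float] = []
    for l, h in zip(low, high):
        row.extend([l + h, l - h])
    return row
```

Each channel has half the samples of the input, which is what makes running the main classifier on the low-frequency channel alone cheaper than running it on the full-resolution input.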


AAAI Conference 2018 Conference Paper

Recurrent Attentional Reinforcement Learning for Multi-Label Image Recognition

Tianshui Chen
Zhouxia Wang
Guanbin Li
Liang Lin

Recognizing multiple labels of images is a fundamental but challenging task in computer vision, and remarkable progress has been attained by localizing semantic-aware image regions and predicting their labels with deep convolutional neural networks. The step of hypothesis region (region proposal) localization in these existing multi-label image recognition pipelines, however, usually incurs redundant computation cost, e.g., generating hundreds of meaningless proposals with non-discriminative information and extracting their features, and the modeling of spatial contextual dependencies among the localized regions is often ignored or over-simplified. To resolve these issues, this paper proposes a recurrent attention reinforcement learning framework to iteratively discover a sequence of attentional and informative regions that are related to different semantic objects and further predict label scores conditioned on these regions. Besides, our method explicitly models long-term dependencies among these attentional regions, which helps to capture semantic label co-occurrence and thus facilitates multi-label recognition. Extensive experiments and comparisons on two large-scale benchmarks (i.e., PASCAL VOC and MS-COCO) show that our model outperforms existing state-of-the-art methods in both performance and efficiency, while also explicitly associating image-level semantic labels with specific object regions.
