Author name cluster

Hefeng Wu

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

5 papers

2 author rows

NeurIPS Conference 2025 Conference Paper

Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention

Haijing Liu
Zhiyuan Song
Hefeng Wu
Tao Pu
Keze Wang
Liang Lin

Egocentric Referring Video Object Segmentation (Ego-RVOS) aims to segment the specific object actively involved in a human action, as described by a language query, within first-person videos. This task is critical for understanding egocentric human behavior. However, achieving such segmentation robustly is challenging due to ambiguities inherent in egocentric videos and biases present in training data. Consequently, existing methods often struggle, learning spurious correlations from skewed object-action pairings in datasets and fundamental visual confounding factors of the egocentric perspective, such as rapid motion and frequent occlusions. To address these limitations, we introduce Causal Ego-REferring Segmentation (CERES), a plug-in causal framework that adapts strong, pre-trained RVOS backbones to the egocentric domain. CERES implements dual-modal causal intervention: applying backdoor adjustment principles to counteract language representation biases learned from dataset statistics, and leveraging front-door adjustment concepts to address visual confounding by intelligently integrating semantic visual features with geometric depth information guided by causal principles, creating representations more robust to egocentric distortions. Extensive experiments demonstrate that CERES achieves state-of-the-art performance on Ego-RVOS benchmarks, highlighting the potential of applying causal reasoning to build more reliable models for broader egocentric video understanding.

PDF Details

AAAI Conference 2022 Conference Paper

Semantic-Aware Representation Blending for Multi-Label Image Recognition with Partial Labels

Tao Pu
Tianshui Chen
Hefeng Wu
Liang Lin

Training the multi-label image recognition models with partial labels, in which merely some labels are known while others are unknown for each image, is a considerably challenging and practical task. To address this task, current algorithms mainly depend on pre-training classification or similarity models to generate pseudo labels for the unknown labels. However, these algorithms depend on sufficient multilabel annotations to train the models, leading to poor performance especially with low known label proportion. In this work, we propose to blend category-specific representation across different images to transfer information of known labels to complement unknown labels, which can get rid of pre-training models and thus does not depend on sufficient annotations. To this end, we design a unified semanticaware representation blending (SARB) framework that exploits instance-level and prototype-level semantic representation to complement unknown labels by two complementary modules: 1) an instance-level representation blending (ILRB) module blends the representations of the known labels in an image to the representations of the unknown labels in another image to complement these unknown labels. 2) a prototypelevel representation blending (PLRB) module learns more stable representation prototypes for each category and blends the representation of unknown labels with the prototypes of corresponding labels to complement these labels. Extensive experiments on the MS-COCO, Visual Genome, Pascal VOC 2007 datasets show that the proposed SARB framework obtains superior performance over current leading competitors on all known label proportion settings, i. e. , with the mAP improvement of 4. 6%, 4. 6%, 2. 2% on these three datasets when the known label proportion is 10%. Codes are available at https: //github. com/HCPLab-SYSU/HCP-MLR-PL.

PDF Details

AAAI Conference 2022 Conference Paper

Structured Semantic Transfer for Multi-Label Recognition with Partial Labels

Tianshui Chen
Tao Pu
Hefeng Wu
Yuan Xie
Liang Lin

Multi-label image recognition is a fundamental yet practical task because real-world images inherently possess multiple semantic labels. However, it is difficult to collect large-scale multi-label annotations due to the complexity of both the input images and output label spaces. To reduce the annotation cost, we propose a structured semantic transfer (SST) framework that enables training multi-label recognition models with partial labels, i. e. , merely some labels are known while other labels are missing (also called unknown labels) per image. The framework consists of two complementary transfer modules that explore within-image and cross-image semantic correlations to transfer knowledge of known labels to generate pseudo labels for unknown labels. Specifically, an intraimage semantic transfer module learns image-specific label co-occurrence matrix and maps the known labels to complement unknown labels based on this matrix. Meanwhile, a cross-image transfer module learns category-specific feature similarities and helps complement unknown labels with high similarities. Finally, both known and generated labels are used to train the multi-label recognition models. Extensive experiments on the Microsoft COCO, Visual Genome and Pascal VOC datasets show that the proposed SST framework obtains superior performance over current state-of-the-art algorithms. Codes are available at https: //github. com/HCPLab- SYSU/HCP-MLR-PL.

PDF Details

ICRA Conference 2021 Conference Paper

AU-Expression Knowledge Constrained Representation Learning for Facial Expression Recognition

Tao Pu 0002
Tianshui Chen
Yuan Xie 0004
Hefeng Wu
Liang Lin

Recognizing human emotion/expressions automatically is quite an expected ability for intelligent robotics, as it can promote better communication and cooperation with humans. Current deep-learning-based algorithms may achieve impressive performance in some lab-controlled environments, but they always fail to recognize the expressions accurately for the uncontrolled in-the-wild situation. Fortunately, facial action units (AU) describe subtle facial behaviors, and they can help distinguish uncertain and ambiguous expressions. In this work, we explore the correlations among the action units and facial expressions, and devise an AU-Expression Knowledge Constrained Representation Learning (AUE-CRL) framework to learn the AU representations without AU annotations and adaptively use representations to facilitate facial expression recognition. Specifically, it leverages AU-expression correlations to guide the learning of the AU classifiers, and thus it can obtain AU representations without incurring any AU annotations. Then, it introduces a knowledge-guided attention mechanism that mines useful AU representations under the constraint of AU-expression correlations. In this way, the framework can capture local discriminative and complementary features to enhance facial representation for facial expression recognition. We conduct experiments on the challenging uncontrolled datasets to demonstrate the superiority of the proposed framework over current state-of-the-art methods. Codes and trained models are available at https://github.com/HCPLab-SYSU/AUE-CRL.

Details

AAAI Conference 2020 Conference Paper

Knowledge Graph Transfer Network for Few-Shot Recognition

Riquan Chen
Tianshui Chen
Xiaolu Hui
Hefeng Wu
Guanbin Li
Liang Lin

Few-shot learning aims to learn novel categories from very few samples given some base categories with sufﬁcient training samples. The main challenge of this task is the novel categories are prone to dominated by color, texture, shape of the object or background context (namely speciﬁcity), which are distinct for the given few training samples but not common for the corresponding categories (see Figure 1). Fortunately, we ﬁnd that transferring information of the correlated based categories can help learn the novel concepts and thus avoid the novel concept being dominated by the speciﬁcity. Besides, incorporating semantic correlations among different categories can effectively regularize this information transfer. In this work, we represent the semantic correlations in the form of structured knowledge graph and integrate this graph into deep neural networks to promote few-shot learning by a novel Knowledge Graph Transfer Network (KGTN). Specifically, by initializing each node with the classiﬁer weight of the corresponding category, a propagation mechanism is learned to adaptively propagate node message through the graph to explore node interaction and transfer classiﬁer information of the base categories to those of the novel ones. Extensive experiments on the ImageNet dataset show significant performance improvement compared with current leading competitors. Furthermore, we construct an ImageNet-6K dataset that covers larger scale categories, i. e, 6, 000 categories, and experiments on this dataset further demonstrate the effectiveness of our proposed model.

PDF Details