Arrow Research search

Author name cluster

Linhang Cai

Papers possibly associated with this exact author name in Arrow. This page groups case-insensitive exact name matches; it is not a full identity-disambiguation profile.

3 papers
1 author row

Possible papers (3)

AAAI 2022 Conference Paper

Mutual Contrastive Learning for Visual Representation Learning

  • Chuanguang Yang
  • Zhulin An
  • Linhang Cai
  • Yongjun Xu

We present a collaborative learning method called Mutual Contrastive Learning (MCL) for general visual representation learning. The core idea of MCL is to perform mutual interaction and transfer of contrastive distributions among a cohort of networks. A crucial component of MCL is Interactive Contrastive Learning (ICL). Compared with vanilla contrastive learning, ICL can aggregate cross-network embedding information and maximize a lower bound on the mutual information between two networks. This enables each network to learn extra contrastive knowledge from the others, leading to better feature representations for visual recognition tasks. The resulting MCL is conceptually simple yet empirically powerful: it is a generic framework that applies to both supervised and self-supervised representation learning. Experimental results on image classification and transfer learning to object detection show that MCL yields consistent performance gains, demonstrating that it guides the network toward better feature representations. Code is available at https://github.com/winycg/MCL.
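
To make the cross-network interaction concrete, below is a minimal PyTorch sketch of how ICL could combine in-network and cross-network InfoNCE terms for a two-network cohort. The names (info_nce, mutual_contrastive_loss, the queue-based negatives, the temperature) are illustrative assumptions based only on the abstract, not the authors' released code.

```python
# Minimal sketch of interactive contrastive learning (ICL) between two
# networks, loosely following the abstract above. Not the authors' code.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE: pull anchor toward its positive, push from negatives."""
    anchor = F.normalize(anchor, dim=-1)       # (B, D)
    positive = F.normalize(positive, dim=-1)   # (B, D)
    negatives = F.normalize(negatives, dim=-1) # (K, D)
    pos_logit = (anchor * positive).sum(-1, keepdim=True) / temperature  # (B, 1)
    neg_logits = anchor @ negatives.t() / temperature                    # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
    return F.cross_entropy(logits, labels)

def mutual_contrastive_loss(emb_a1, emb_a2, emb_b1, emb_b2, queue_a, queue_b):
    """emb_x1/emb_x2: embeddings of two augmented views from networks a and b;
    queue_x: stored negative embeddings for each network."""
    # Vanilla contrastive terms, computed within each network.
    vanilla = info_nce(emb_a1, emb_a2, queue_a) + info_nce(emb_b1, emb_b2, queue_b)
    # Interactive terms: the anchor comes from one network while the positive
    # and negatives come from the other, transferring contrastive
    # distributions across the cohort (maximizing a lower bound on their
    # mutual information, per the abstract).
    interactive = info_nce(emb_a1, emb_b2, queue_b) + info_nce(emb_b1, emb_a2, queue_a)
    return vanilla + interactive
```

The interactive terms are what distinguish this sketch from plain contrastive learning: each anchor's positive and negatives are drawn from the peer network's embedding space rather than its own.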

AAAI 2022 Conference Paper

Prior Gradient Mask Guided Pruning-Aware Fine-Tuning

  • Linhang Cai
  • Zhulin An
  • Chuanguang Yang
  • Yangchun Yan
  • Yongjun Xu

We propose a Prior Gradient Mask Guided Pruning-Aware Fine-Tuning (PGMPF) framework to accelerate deep convolutional neural networks (CNNs). In detail, PGMPF selectively suppresses the gradients of "unimportant" parameters via a prior gradient mask generated by the pruning criterion during fine-tuning. PGMPF has three appealing characteristics over previous works: (1) Pruning-aware network fine-tuning. A typical pruning pipeline consists of training, pruning, and fine-tuning, which are relatively independent; PGMPF instead uses a variant of the pruning mask as a prior gradient mask to guide fine-tuning, without complicated pruning criteria. (2) An excellent trade-off between large model capacity during fine-tuning and stable convergence to the final compact model. Previous works preserve more training information of pruned parameters during fine-tuning to pursue better performance, which can cause catastrophic non-convergence of the pruned model at relatively large pruning rates; PGMPF greatly stabilizes the fine-tuning phase by gradually constraining the learning rate of those "unimportant" parameters. (3) Channel-wise random dropout of the prior gradient mask, which injects gradient noise into fine-tuning and further improves the robustness of the final compact model. Experimental results on three image classification benchmarks, CIFAR-10/100 and ILSVRC-2012, demonstrate the effectiveness of our method across CNN architectures, datasets, and pruning rates. Notably, on ILSVRC-2012, PGMPF reduces FLOPs on ResNet-50 by 53.5% with only a 0.90% top-1 and 0.52% top-5 accuracy drop, advancing the state of the art with negligible extra computational cost.
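
As a rough illustration of the mechanism, the sketch below applies an L1-norm prior mask to convolution gradients during fine-tuning, with channel-wise random dropout of the mask. The pruning criterion, suppression factor, and dropout probability here are assumptions for illustration; the paper's exact mask construction and schedule may differ.

```python
# Minimal sketch of prior-gradient-mask-guided fine-tuning, modeled on the
# abstract above. Hyperparameters are illustrative, not the paper's values.
import torch

def make_prior_mask(conv_weight, prune_ratio=0.5):
    """Binary channel mask from an L1-norm pruning criterion (one common choice)."""
    scores = conv_weight.abs().sum(dim=(1, 2, 3))   # per-output-channel L1 norm
    k = max(1, int(prune_ratio * scores.numel()))
    threshold = scores.kthvalue(k).values
    return (scores > threshold).float()             # 1 = keep, 0 = "unimportant"

def mask_gradients(conv_weight, mask, suppress=0.0, drop_p=0.1):
    """Scale gradients of pruned channels; the suppress factor can be decayed
    over epochs to gradually constrain their effective learning rate."""
    keep = mask.clone()
    # Channel-wise random dropout of the prior mask: occasionally let a
    # pruned channel's gradient through, injecting gradient noise as the
    # abstract describes.
    flip = (torch.rand_like(keep) < drop_p) & (keep == 0)
    keep[flip] = 1.0
    scale = keep + suppress * (1.0 - keep)          # kept -> 1, pruned -> suppress
    conv_weight.grad.mul_(scale.view(-1, 1, 1, 1))

# In the fine-tuning loop, after loss.backward() and before optimizer.step():
#   for layer in conv_layers:
#       mask_gradients(layer.weight, prior_masks[layer], suppress=0.0, drop_p=0.1)
```

Because the mask acts only on gradients, the pruned parameters remain in the model during fine-tuning (preserving capacity) while their updates are suppressed, which is the trade-off point (2) of the abstract describes.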

IJCAI 2021 Conference Paper

Hierarchical Self-supervised Augmented Knowledge Distillation

  • Chuanguang Yang
  • Zhulin An
  • Linhang Cai
  • Yongjun Xu

Knowledge distillation centers on how to define and transfer knowledge from teacher to student effectively. Although recent self-supervised contrastive knowledge achieves the best performance, forcing the network to learn such knowledge may damage the representation learning of the original class recognition task. We therefore adopt an alternative self-supervised augmented task to guide the network to learn the joint distribution of the original recognition task and the self-supervised auxiliary task. This proves to be richer knowledge that improves representation power without losing normal classification capability. Moreover, previous methods transfer probabilistic knowledge only between the final layers, which is incomplete. We propose to append several auxiliary classifiers to hierarchical intermediate feature maps to generate diverse self-supervised knowledge and perform one-to-one transfer to teach the student network thoroughly. Our method significantly surpasses the previous state-of-the-art SSKD, with an average improvement of 2.56% on CIFAR-100 and an improvement of 0.77% on ImageNet across widely used network pairs. Code is available at https://github.com/winycg/HSAKD.
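
Below is a minimal sketch of the hierarchical one-to-one transfer, assuming a rotation-based self-supervised augmented task (each auxiliary head predicts the joint class-and-rotation label over num_classes * 4 outputs) and a standard temperature-scaled KL distillation term. The helper names and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of hierarchical distillation through auxiliary classifiers,
# following the abstract above. Details are assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def kd_kl(student_logits, teacher_logits, T=4.0):
    """Temperature-scaled KL divergence for probabilistic knowledge transfer."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * T * T

def hsakd_loss(student_aux_logits, teacher_aux_logits, joint_labels):
    """One-to-one transfer between matched auxiliary classifiers.
    Each element of *_aux_logits is (B, num_classes * 4): the joint
    class-and-rotation distribution at one intermediate stage."""
    loss = 0.0
    for s_logit, t_logit in zip(student_aux_logits, teacher_aux_logits):
        loss += F.cross_entropy(s_logit, joint_labels)  # learn the joint task
        loss += kd_kl(s_logit, t_logit)                 # distill teacher's distribution
    return loss
```

Attaching one head per intermediate stage and pairing teacher and student heads stage-by-stage is what makes the transfer hierarchical and one-to-one, rather than distilling only between the final layers.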