Chen Loy Papers

AAAI Conference 2018 Conference Paper

Merge or Not? Learning to Group Faces via Imitation Learning

Yue He
Kaidi Cao
Cheng Li
Chen Loy

Face grouping remains a challenging problem despite the remarkable capability of deep learning approaches in learning face representations. In particular, grouping results can still be egregious given profile faces and a large number of uninteresting faces and noisy detections. Often, a user needs to correct the erroneous grouping manually. In this study, we formulate a novel face grouping framework that learns a clustering strategy from ground-truth simulated behavior. This is achieved through imitation learning (a.k.a. apprenticeship learning or learning by watching) via inverse reinforcement learning (IRL). In contrast to existing clustering approaches that group instances by similarity, our framework makes sequential decisions to dynamically decide when to merge two face instances/groups, driven by short- and long-term rewards. Extensive experiments on three benchmark datasets show that our framework outperforms unsupervised and supervised baselines.

AAAI Conference 2018 Conference Paper

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Xiaohang Zhan
Ziwei Liu
Ping Luo
Xiaoou Tang
Chen Loy

Deep convolutional networks for semantic image segmentation typically require large-scale labeled data, e.g., ImageNet and MS COCO, for network pre-training. To reduce annotation efforts, self-supervised semantic segmentation has recently been proposed to pre-train a network without any human-provided labels. The key to this new form of learning is to design a proxy task (e.g., image colorization), from which a discriminative loss can be formulated on unlabeled data. Many proxy tasks, however, lack the critical supervision signals that could induce discriminative representation for the target image segmentation task. Thus self-supervision’s performance is still far from that of supervised pre-training. In this study, we overcome this limitation by incorporating a ‘mix-and-match’ (M&M) tuning stage in the self-supervision pipeline. The proposed approach is readily pluggable into many self-supervision methods and does not use more annotated samples than the original process. Yet, it is capable of boosting the performance of the target image segmentation task to surpass its fully-supervised pre-trained counterpart. The improvement is made possible by better harnessing the limited pixel-wise annotations in the target dataset. Specifically, we first introduce the ‘mix’ stage, which sparsely samples and mixes patches from the target set to reflect rich and diverse local patch statistics of target images. A ‘match’ stage then forms a class-wise connected graph, which can be used to derive a strong triplet-based discriminative loss for fine-tuning the network. Our paradigm follows the standard practice in existing self-supervised studies and no extra data or label is required. With the proposed M&M approach, for the first time, a self-supervision method can achieve comparable or even better performance compared to its ImageNet-pretrained counterpart on both the PASCAL VOC2012 and CityScapes datasets.

AAAI Conference 2016 Conference Paper

Reading Scene Text in Deep Convolutional Sequences

Pan He
Weilin Huang
Yu Qiao
Chen Loy
Xiaoou Tang
