Collective Deep Quantization for Efficient Cross-Modal Retrieval

Yue Cao; Mingsheng Long; Jianmin Wang; Shichen Liu

AAAI 2017

Conference Paper AAAI Technical Track: Vision Artificial Intelligence

Abstract

Cross-modal similarity retrieval is the problem of designing a retrieval system that supports querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, with a focus on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, which is the first attempt to introduce quantization into an end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and the quantizers for both modalities using carefully crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns a common quantizer codebook for both modalities, through which the cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using inner product distance computed from the common codebook with fast distance table lookup. Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.
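
The table-lookup idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the CDQ implementation: it assumes a generic additive quantizer in which each database item is approximated by the sum of one codeword from each of M shared codebooks, and the names `build_lookup_table` and `lookup_scores`, the array shapes, and the choice of NumPy are all hypothetical. Under that assumption, the inner product between a query and a quantized item splits into M terms, each of which can be precomputed once per query and then read from a table.

```python
import numpy as np


def build_lookup_table(query, codebooks):
    """Precompute inner products between one query and every codeword.

    query:     (D,) query embedding (e.g. from an image network).
    codebooks: (M, K, D) array of M codebooks with K codewords each,
               assumed here to be shared by both modalities.
    Returns a (M, K) table with table[m, k] = <query, codebooks[m, k]>.
    """
    return codebooks @ query


def lookup_scores(table, codes):
    """Score quantized database items using only table lookups.

    table: (M, K) table from build_lookup_table.
    codes: (N, M) integer array; codes[n, m] is the index of the codeword
           chosen from codebook m for database item n.
    Returns (N,) approximate inner products, one per database item.
    """
    num_codebooks = table.shape[0]
    # Pick table[m, codes[n, m]] for every item n and codebook m, then sum over m.
    return table[np.arange(num_codebooks), codes].sum(axis=1)
```

With this layout, scoring N database items costs M table reads and M - 1 additions per item after a single (M, K, D) by (D,) product per query, instead of one D-dimensional inner product per item. The scores equal the exact inner products with the reconstructed items, so the only approximation error comes from the quantization itself.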
