Arrow Research

Author name cluster

Zhao Zhou

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

4 papers

Possible papers

AAAI Conference 2025 · Conference Paper

Achieving Ensemble-Like Performance in a Single Model: A Feature Diversification Framework for Image-Text Matching

  • Zhao Zhou
  • Yiqun Wang
  • Weizhong Zhang
  • Yingbin Zheng
  • Xiangcheng Du
  • Cheng Jin

Model ensembling is a widely used technique that enhances performance on image-text matching tasks by combining multiple models, each trained from a different initialization. However, the inefficiency of training several models and generating outputs from all of them constrains the practical applicability of ensembles. In this paper, we argue that while the parameters of two randomly initialized models can differ significantly, their feature distributions can be similar at certain stages. Using a proposed technique called cross-modal realignment, we demonstrate that features derived from differently initialized models remain similar at the feature extraction stage and can be effectively transformed by fine-tuning a small number of parameters. These findings provide an efficient way to achieve ensemble-like performance within a single model. Specifically, we propose a Feature Diversification Framework (FDF) that emulates the outputs of multiple model initializations by generating diverse features from a common shared feature. First, we introduce feature conversion methods that transform the shared feature into a set of distinct features. Next, a realignment training strategy optimizes negative pairs to realign these transformed features, diversifying them to resemble the outputs of different models. Additionally, we propose a reweighting module that assigns weights to these features, enabling weighted fusion for a robust feature representation. Extensive experiments on the Flickr30K and MS-COCO datasets demonstrate the effectiveness and generalizability of our framework.
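
A minimal sketch of the diversification idea, assuming a PyTorch setting: a few lightweight conversion heads map one shared backbone feature into diverse "ensemble-like" features, and a reweighting module fuses them. The class name `FeatureDiversifier`, the dimensions, and the head design are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch: conversion heads emulate multiple model
# initializations from one shared feature; a scorer reweights and fuses.
import torch
import torch.nn as nn

class FeatureDiversifier(nn.Module):  # name is an assumption
    def __init__(self, dim: int = 512, num_branches: int = 4):
        super().__init__()
        # One small conversion head per emulated initialization.
        self.branches = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(num_branches)]
        )
        # Reweighting module: scores each branch feature for weighted fusion.
        self.scorer = nn.Linear(dim, 1)

    def forward(self, shared_feat: torch.Tensor) -> torch.Tensor:
        # shared_feat: (batch, dim) feature from a single backbone.
        feats = torch.stack([b(shared_feat) for b in self.branches], dim=1)  # (B, K, D)
        weights = torch.softmax(self.scorer(feats).squeeze(-1), dim=1)       # (B, K)
        return (weights.unsqueeze(-1) * feats).sum(dim=1)                    # (B, D)

fused = FeatureDiversifier()(torch.randn(8, 512))
print(fused.shape)  # torch.Size([8, 512])
```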

AAAI Conference 2025 · Conference Paper

An Exemplar-based Framework for Chinese Text Recognition

  • Zhao Zhou
  • Xiangcheng Du
  • Yingbin Zheng
  • Xingjiao Wu
  • Cheng Jin

This paper introduces a novel exemplar-based framework for reading Chinese text in natural scene or document images. We present the Deep Exemplar-based Chinese Text Recognizer, which first identifies candidate characters as exemplars from each text line and then recognizes them by retrieving analogous exemplars from a database. Using only text-line-level annotations, we design an exemplar discovery network that simultaneously recognizes text and localizes individual characters in a weakly supervised manner. An exemplar retrieval module then identifies the most similar exemplar and propagates the corresponding character label. This allows us to rectify misrecognized characters and boost scene text recognition performance. Experiments on four scenarios of Chinese text demonstrate the effectiveness of the proposed framework.
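
For intuition, here is a hedged sketch of the retrieval step in such a framework, assuming character crops are compared by cosine similarity over learned embeddings; the function `retrieve_label` and the database layout are hypothetical, not the paper's implementation.

```python
# Illustrative sketch: given an embedding for a candidate character, find
# the most similar exemplar in a database and propagate its label.
import torch
import torch.nn.functional as F

def retrieve_label(query: torch.Tensor,
                   exemplar_embeds: torch.Tensor,
                   exemplar_labels: list[str]) -> str:
    # query: (D,) embedding of one candidate character crop.
    # exemplar_embeds: (N, D) embeddings of the database exemplars.
    sims = F.cosine_similarity(query.unsqueeze(0), exemplar_embeds, dim=1)
    return exemplar_labels[int(sims.argmax())]

db = F.normalize(torch.randn(1000, 256), dim=1)       # toy exemplar database
labels = [f"char_{i}" for i in range(1000)]           # toy character labels
print(retrieve_label(torch.randn(256), db, labels))
```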

AAAI Conference 2025 · Conference Paper

Expanding the Scope of Negatives: Boosting Image-Text Matching with Negatives Distribution Guided Learning

  • Zhao Zhou
  • Weizhong Zhang
  • Xiangcheng Du
  • Yingbin Zheng
  • Cheng Jin

Image-text matching is a crucial task that bridges the visual and linguistic modalities. Recent research typically formulates it as maximizing the margin against the hardest negatives, which improves learning efficiency and avoids poor local optima. We argue that this formulation has a serious limitation: training confines its horizon to the hardest negative examples, while the remaining negatives offer a range of semantic differences that the hardest ones do not. In this paper, we propose an efficient negative-distribution-guided training framework for image-text matching that unlocks the substantial room for improvement left by this limitation. Rather than simply incorporating additional negative examples into the training objective, which would diminish both the leading role of the hardest negatives and the effect of large-margin learning in producing a robust matching model, our central idea is to supply the objective with distributional information on the entire set of negative examples. Specifically, we first construct a sample similarity matrix from several pretrained models to extract distributional information over the full set of negatives. We then encode it into a margin regularization module that smooths the similarity differences among all negatives. This enhancement facilitates the capture of fine-grained semantic differences, while the main learning process remains guided by maximizing the margin against hard negative examples. Furthermore, we propose a hardest-negative rectification module to address the instability of hardest-negative selection based on predicted similarity and to correct erroneous hardest negatives. We evaluate our method in combination with several state-of-the-art image-text matching methods, and quantitative and qualitative experiments demonstrate its generalizability and effectiveness.
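
A hedged sketch of one way such an objective could look, assuming a PyTorch setting: a standard hardest-negative hinge term plus a KL term that aligns the model's negative-similarity distribution with a reference distribution derived from pretrained models. The function `guided_matching_loss`, the KL form of the regularizer, and the weights are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: hinge loss on the hardest negative, regularized by distributional
# information over all negatives (soft targets from a reference model).
import torch
import torch.nn.functional as F

def guided_matching_loss(sims: torch.Tensor,
                         ref_sims: torch.Tensor,
                         margin: float = 0.2,
                         reg_weight: float = 0.1) -> torch.Tensor:
    # sims: (B, B) image-text similarity matrix; diagonal holds positives.
    pos = sims.diag().unsqueeze(1)                             # (B, 1)
    neg_mask = ~torch.eye(sims.size(0), dtype=torch.bool)
    neg = sims.masked_fill(~neg_mask, float('-inf'))
    # Standard hardest-negative hinge term.
    hard_loss = F.relu(margin + neg.max(dim=1).values.unsqueeze(1) - pos).mean()
    # Distribution-guided term: pull the per-row negative distribution
    # toward the reference distribution from pretrained models.
    log_p = F.log_softmax(sims[neg_mask].view(sims.size(0), -1), dim=1)
    q = F.softmax(ref_sims[neg_mask].view(sims.size(0), -1), dim=1)
    reg = F.kl_div(log_p, q, reduction='batchmean')
    return hard_loss + reg_weight * reg

loss = guided_matching_loss(torch.randn(16, 16), torch.randn(16, 16))
print(loss.item())
```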

IJCAI Conference 2025 · Conference Paper

Unleashing the Semantic Adaptability of Controlled Diffusion Model for Image Colorization

  • Xiangcheng Du
  • Zhao Zhou
  • Yanlong Wang
  • Yingbin Zheng
  • Xingjiao Wu
  • Peizhu Gong
  • Cheng Jin

Recent data-driven image colorization methods have leveraged pre-trained Text-to-Image (T2I) diffusion models as a generative prior, yet they still suffer from unsatisfactory and inaccurate semantic-level color control. To address these issues, we propose a Semantic Adaptation method (SeAda) that enhances the prior while accounting for the semantic discrepancy between color and grayscale image pairs. SeAda employs a semantic adapter to produce refined semantic embeddings and a controlled T2I diffusion model to create plausibly colored images. Specifically, the semantic adapter transfers the embedding from the grayscale to the color domain, while the diffusion model uses the refined embedding and its prior knowledge to achieve realistic and diverse results. We also design a three-stage training strategy that improves semantic comprehension and prior integration for a further performance gain. Extensive experiments on public datasets demonstrate that our method outperforms existing state-of-the-art techniques, yielding superior performance in image colorization.
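
A minimal sketch of the adapter idea, assuming CLIP-style token embeddings and a residual MLP; the class `SemanticAdapter`, the embedding dimension, and the residual design are illustrative assumptions rather than the paper's actual architecture.

```python
# Sketch: a small network maps embeddings extracted from the grayscale
# input toward the color-image embedding space before they condition the
# controlled diffusion model.
import torch
import torch.nn as nn

class SemanticAdapter(nn.Module):  # name is an assumption
    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim)
        )

    def forward(self, gray_embed: torch.Tensor) -> torch.Tensor:
        # Residual refinement: keep the grayscale semantics while shifting
        # the embedding toward the color domain.
        return gray_embed + self.net(gray_embed)

refined = SemanticAdapter()(torch.randn(1, 77, 768))  # e.g. token embeddings
print(refined.shape)  # torch.Size([1, 77, 768])
```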