Author name cluster

Zhaoyi Wan

Possible papers associated with this exact author name in Arrow. This page groups case-insensitive exact name matches and is not a full identity disambiguation profile.

3 papers

1 author row

AAAI Conference 2020 Conference Paper

Real-Time Scene Text Detection with Differentiable Binarization

Minghui Liao
Zhaoyi Wan
Cong Yao
Kai Chen
Xiang Bai

Recently, segmentation-based methods are quite popular in scene text detection, as the segmentation results can more accurately describe scene text of various shapes such as curve text. However, the post-processing of binarization is essential for segmentation-based detection, which converts probability maps produced by a segmentation method into bounding boxes/regions of text. In this paper, we propose a module named Differentiable Binarization (DB), which can perform the binarization process in a segmentation network. Optimized along with a DB module, a segmentation network can adaptively set the thresholds for binarization, which not only simplifies the post-processing but also enhances the performance of text detection. Based on a simple segmentation network, we validate the performance improvements of DB on five benchmark datasets, which consistently achieves state-of-the-art results, in terms of both detection accuracy and speed. In particular, with a light-weight backbone, the performance improvements by DB are significant so that we can look for an ideal tradeoff between detection accuracy and efficiency. Specifically, with a backbone of ResNet-18, our detector achieves an F-measure of 82.8, running at 62 FPS, on the MSRA-TD500 dataset. Code is available at: https://github.com/MhLiao/DB.
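
The abstract above does not state the binarization formula, so the following is only a minimal sketch of the general idea, assuming the hard step function is approximated by a steep sigmoid over the difference between a probability map and a per-pixel threshold map. The function names, the fixed threshold of 0.3, and the steepness factor `k = 50` are illustrative assumptions, not details confirmed by this page.

```python
import math

def hard_binarize(prob_map, threshold=0.3):
    """Standard binarization with one fixed threshold (not differentiable)."""
    return [[1.0 if p >= threshold else 0.0 for p in row] for row in prob_map]

def soft_binarize(prob_map, thresh_map, k=50.0):
    """Approximate binarization with a steep sigmoid (assumed form).

    prob_map and thresh_map are same-shaped 2D lists with values in [0, 1].
    Because the sigmoid is smooth, gradients could flow to both maps,
    which is what would let a network learn per-pixel thresholds.
    k is an assumed steepness factor.
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]

# Toy 2x2 example: the second row uses different thresholds per pixel.
prob = [[0.9, 0.1], [0.5, 0.5]]
thresh = [[0.3, 0.3], [0.4, 0.6]]
soft = soft_binarize(prob, thresh)
```

In this toy input, a pixel's soft value exceeds 0.5 exactly when its probability exceeds its own threshold, so the same probability of 0.5 is kept in one position and rejected in the other.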


AAAI Conference 2020 Conference Paper

TextScanner: Reading Characters in Order for Robust Scene Text Recognition

Zhaoyi Wan
Minghang He
Haoran Chen
Xiang Bai
Cong Yao

Driven by deep learning and a large volume of data, scene text recognition has evolved rapidly in recent years. Formerly, RNN-attention-based methods have dominated this field, but suffer from the problem of attention drift in certain situations. Lately, semantic segmentation based algorithms have proven effective at recognizing text of different forms (horizontal, oriented and curved). However, these methods may produce spurious characters or miss genuine characters, as they rely heavily on a thresholding procedure operated on segmentation maps. To tackle these challenges, we propose in this paper an alternative approach, called TextScanner, for scene text recognition. TextScanner bears three characteristics: (1) Basically, it belongs to the semantic segmentation family, as it generates pixel-wise, multi-channel segmentation maps for character class, position and order; (2) Meanwhile, akin to RNN-attention-based methods, it also adopts RNN for context modeling; (3) Moreover, it performs parallel prediction for character position and class, and ensures that characters are transcribed in the correct order. The experiments on standard benchmark datasets demonstrate that TextScanner outperforms the state-of-the-art methods. Moreover, TextScanner shows its superiority in recognizing more difficult text such as Chinese transcripts and aligning with target characters.


AAAI Conference 2019 Conference Paper

Scene Text Recognition from Two-Dimensional Perspective

Minghui Liao
Jian Zhang
Zhaoyi Wan
Fengming Xie
Jiajun Liang
Pengyuan Lyu
Cong Yao
Xiang Bai

Inspired by speech recognition, recent state-of-the-art algorithms mostly consider scene text recognition as a sequence prediction problem. Though achieving excellent performance, these methods usually neglect an important fact: text in images is actually distributed in two-dimensional space. This is quite different in nature from speech, which is essentially a one-dimensional signal. In principle, directly compressing features of text into a one-dimensional form may lose useful information and introduce extra noise. In this paper, we approach scene text recognition from a two-dimensional perspective. A simple yet effective model, called Character Attention Fully Convolutional Network (CA-FCN), is devised for recognizing text of arbitrary shapes. Scene text recognition is realized with a semantic segmentation network, where an attention mechanism for characters is adopted. Combined with a word formation module, CA-FCN can simultaneously recognize the script and predict the position of each character. Experiments demonstrate that the proposed algorithm outperforms previous methods on both regular and irregular text datasets. Moreover, it is proven to be more robust to imprecise localizations in the text detection phase, which are very common in practice.
