AAAI 2018
SqueezedText: A Real-Time Scene Text Recognition by Binary Convolutional Encoder-Decoder Network
Abstract
A new approach for real-time scene text recognition is proposed in this paper: a novel binary convolutional encoder-decoder network (B-CEDNet) together with a bidirectional recurrent neural network (Bi-RNN). The B-CEDNet serves as a visual front-end that provides detailed character detection, and a back-end Bi-RNN performs character-level sequential correction and classification based on learned contextual knowledge. The front-end B-CEDNet can process multiple regions containing characters in a single forward pass, and is trained under binary constraints with significant compression. This yields both a substantial inference run-time speedup and a reduction in memory usage. Because character detection is handled by the front-end, the back-end Bi-RNN only processes a low-dimensional feature sequence carrying the category and spatial information of the extracted characters for sequence correction and classification. Trained on over 1,000,000 synthetic scene text images, the B-CEDNet achieves a recall of 0.86, a precision of 0.88 and an F-score of 0.87 on ICDAR-03 and ICDAR-13. With the correction and classification by the Bi-RNN, the proposed real-time scene text recognition achieves state-of-the-art accuracy while consuming less than 1 ms of inference run-time. The processing flow is realized on GPU with a small network size of 1.01 MB for B-CEDNet and 3.23 MB for Bi-RNN, which is much faster and smaller than existing solutions.
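As a quick consistency check on the reported detection metrics, the sketch below recomputes the F-score from the stated precision and recall. It assumes the abstract's F-score is the standard balanced F-score (the harmonic mean of precision and recall); the abstract does not state which variant is used, and the function name `f_score` is purely illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the paper reports the balanced (beta = 1) variant.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values taken from the abstract: precision 0.88, recall 0.86.
reported = f_score(precision=0.88, recall=0.86)
print(round(reported, 2))  # 0.87
```

Under that assumption, the recomputed value rounds to the 0.87 quoted in the abstract.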
Authors
Keywords
No keywords are indexed for this paper.
Context
- Venue: AAAI Conference on Artificial Intelligence
- Archive span: 1980-2026
- Indexed papers: 28718
- Paper id: 675768186371444208