Accurate and robust detection and recognition of texts in scene

Scene text detection and recognition aim to localize the texts in natural scene images and output corresponding character sequences of texts. Automated scene text detection and recognition have attracted increasing interest in computer vision and deep learning communities due to its wide range of ap...

Full description

Saved in:

Bibliographic Details
Main Author:	Xue, Chuhui
Other Authors:	Lu Shijian
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/157998
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Nanyang Technological University
Language:	English

id	sg-ntu-dr.10356-157998
record_format	dspace
spelling	sg-ntu-dr.10356-1579982022-06-03T14:25:12Z Accurate and robust detection and recognition of texts in scene Xue, Chuhui Lu Shijian School of Computer Science and Engineering Shijian.Lu@ntu.edu.sg Engineering::Computer science and engineering Scene text detection and recognition aim to localize the texts in natural scene images and output corresponding character sequences of texts. Automated scene text detection and recognition have attracted increasing interest in computer vision and deep learning communities due to its wide range of applications in neural machine translation, autonomous driving, etc. As compared with preliminary research that focuses on the design of hand-crafted features, modern deep-learning-based techniques have achieved significant improvements on scene text detection and recognition tasks. Such frameworks usually deploy convolutional neural networks (CNN), recurrent neural networks (RNN), or Transformers to extract image features for accurate text detection and recognition. However, automated detecting and recognizing texts in scenes remain challenging due to the complexity of scene text images. First, texts in scenes exhibit high variability and diversity in appearance due to the complex patterns of texts (e.g., colors, fonts, etc.) and various environments (e.g., lighting, occlusion, etc.). Second, scene texts usually have different lengths, orientations, and shapes that may suffer from both perspective and curvature distortions. Third, scene images usually have complex backgrounds that may contain similar patterns with texts (e.g., trees, traffic signs, etc.). Either of them will lead to incorrect prediction in scene text detection and recognition task. In this thesis, we propose several novel techniques for scene text detection and recognition that aim to produce more accurate detection and recognition of scene texts in different orientations, lengths, sizes, and shapes. First, we design a novel scene text detection approach that detects texts through border semantics awareness and bootstrapping. We introduce a bootstrapping technique that samples multiple `subsections' of a word or text line and accordingly relieves the constraint of limited training data effectively. In addition, a semantics-aware text border detection technique is designed which produces four types of text border segments for text detection. Second, we develop a novel multi-scale shape regression network (MSR) for accurate scene text detection. It detects scene texts by predicting dense text boundary points instead of sparse quadrilateral vertices which often suffers from regression errors while dealing with long text lines. Additionally, the multi-scale network extracts and fuses features at different scales concurrently and seamlessly which demonstrates superb tolerance to the text scale variation. Third, we design a mask-guided multi-task network that reliably detects and rectifies scene texts of arbitrary shapes. The proposed network detects text keypoints and landmark points for accurate text detection and rectification. Forth, we propose a novel scene text recognition method I2C2W that is tolerant to geometric and photometric degradation by decomposing scene text recognition into two inter-connected tasks and leveraging the advances of Transformer architecture. Extensive experiments show that the proposed techniques can accurately detect and recognize texts with various lengths, orientations, and shapes from natural scene images. Doctor of Philosophy 2022-05-16T10:16:15Z 2022-05-16T10:16:15Z 2022 Thesis-Doctor of Philosophy Xue, C. (2022). Accurate and robust detection and recognition of texts in scene. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/157998 https://hdl.handle.net/10356/157998 10.32657/10356/157998 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering
spellingShingle	Engineering::Computer science and engineering Xue, Chuhui Accurate and robust detection and recognition of texts in scene
description	Scene text detection and recognition aim to localize the texts in natural scene images and output corresponding character sequences of texts. Automated scene text detection and recognition have attracted increasing interest in computer vision and deep learning communities due to its wide range of applications in neural machine translation, autonomous driving, etc. As compared with preliminary research that focuses on the design of hand-crafted features, modern deep-learning-based techniques have achieved significant improvements on scene text detection and recognition tasks. Such frameworks usually deploy convolutional neural networks (CNN), recurrent neural networks (RNN), or Transformers to extract image features for accurate text detection and recognition. However, automated detecting and recognizing texts in scenes remain challenging due to the complexity of scene text images. First, texts in scenes exhibit high variability and diversity in appearance due to the complex patterns of texts (e.g., colors, fonts, etc.) and various environments (e.g., lighting, occlusion, etc.). Second, scene texts usually have different lengths, orientations, and shapes that may suffer from both perspective and curvature distortions. Third, scene images usually have complex backgrounds that may contain similar patterns with texts (e.g., trees, traffic signs, etc.). Either of them will lead to incorrect prediction in scene text detection and recognition task. In this thesis, we propose several novel techniques for scene text detection and recognition that aim to produce more accurate detection and recognition of scene texts in different orientations, lengths, sizes, and shapes. First, we design a novel scene text detection approach that detects texts through border semantics awareness and bootstrapping. We introduce a bootstrapping technique that samples multiple `subsections' of a word or text line and accordingly relieves the constraint of limited training data effectively. In addition, a semantics-aware text border detection technique is designed which produces four types of text border segments for text detection. Second, we develop a novel multi-scale shape regression network (MSR) for accurate scene text detection. It detects scene texts by predicting dense text boundary points instead of sparse quadrilateral vertices which often suffers from regression errors while dealing with long text lines. Additionally, the multi-scale network extracts and fuses features at different scales concurrently and seamlessly which demonstrates superb tolerance to the text scale variation. Third, we design a mask-guided multi-task network that reliably detects and rectifies scene texts of arbitrary shapes. The proposed network detects text keypoints and landmark points for accurate text detection and rectification. Forth, we propose a novel scene text recognition method I2C2W that is tolerant to geometric and photometric degradation by decomposing scene text recognition into two inter-connected tasks and leveraging the advances of Transformer architecture. Extensive experiments show that the proposed techniques can accurately detect and recognize texts with various lengths, orientations, and shapes from natural scene images.
author2	Lu Shijian
author_facet	Lu Shijian Xue, Chuhui
format	Thesis-Doctor of Philosophy
author	Xue, Chuhui
author_sort	Xue, Chuhui
title	Accurate and robust detection and recognition of texts in scene
title_short	Accurate and robust detection and recognition of texts in scene
title_full	Accurate and robust detection and recognition of texts in scene
title_fullStr	Accurate and robust detection and recognition of texts in scene
title_full_unstemmed	Accurate and robust detection and recognition of texts in scene
title_sort	accurate and robust detection and recognition of texts in scene
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/157998
_version_	1735491159158947840

Accurate and robust detection and recognition of texts in scene

Similar Items