An end-to-end model for multi-view scene text recognition

Due to the increasing applications of surveillance and monitoring, such as person re-identification, vehicle re-identification and sports event tracking, the need for text detection and end-to-end recognition is also growing. Although past deep learning-based models have addressed several cha...


Bibliographic Details
Main Authors: Banerjee, Ayan, Shivakumara, Palaiahnakote, Bhattacharya, Saumik, Pal, Umapada, Liu, Cheng-Lin
Format: Article
Published: Elsevier 2024
Subjects:
Online Access:http://eprints.um.edu.my/45920/
https://doi.org/10.1016/j.patcog.2023.110206
Institution: Universiti Malaya
id my.um.eprints.45920
record_format eprints
spelling my.um.eprints.45920 2024-11-14T04:34:33Z http://eprints.um.edu.my/45920/ An end-to-end model for multi-view scene text recognition Banerjee, Ayan Shivakumara, Palaiahnakote Bhattacharya, Saumik Pal, Umapada Liu, Cheng-Lin QA75 Electronic computers. Computer science Due to the increasing applications of surveillance and monitoring, such as person re-identification, vehicle re-identification and sports event tracking, the need for text detection and end-to-end recognition is also growing. Although past deep learning-based models have addressed several challenges, such as arbitrary-shaped text, multiple scripts, and variations in the geometric structure of characters, their scope is limited to a single view. This paper presents an end-to-end model for text recognition that refines multiple views of the same scene, called E2EMVSTR (End-to-End Model for Multi-View Scene Text Recognition). Considering the common characteristics shared across multi-view texts, we propose a cycle-consistency, pairwise-similarity-based deep learning model to find texts more efficiently in three input views. Further, the extracted texts are supplied to a Siamese network and a semi-supervised attention-embedding combinational network to obtain recognition results. The proposed model combines natural language processing and genetic algorithm models to restore missing character information and correct wrong recognition results. In experiments on our multi-view dataset and several benchmark datasets, the proposed method proves effective compared with state-of-the-art methods. The dataset and code will be made publicly available upon acceptance. Elsevier 2024-05 Article PeerReviewed Banerjee, Ayan and Shivakumara, Palaiahnakote and Bhattacharya, Saumik and Pal, Umapada and Liu, Cheng-Lin (2024) An end-to-end model for multi-view scene text recognition. Pattern Recognition, 149. p. 110206. ISSN 0031-3203, https://doi.org/10.1016/j.patcog.2023.110206
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Banerjee, Ayan
Shivakumara, Palaiahnakote
Bhattacharya, Saumik
Pal, Umapada
Liu, Cheng-Lin
An end-to-end model for multi-view scene text recognition
description Due to the increasing applications of surveillance and monitoring, such as person re-identification, vehicle re-identification and sports event tracking, the need for text detection and end-to-end recognition is also growing. Although past deep learning-based models have addressed several challenges, such as arbitrary-shaped text, multiple scripts, and variations in the geometric structure of characters, their scope is limited to a single view. This paper presents an end-to-end model for text recognition that refines multiple views of the same scene, called E2EMVSTR (End-to-End Model for Multi-View Scene Text Recognition). Considering the common characteristics shared across multi-view texts, we propose a cycle-consistency, pairwise-similarity-based deep learning model to find texts more efficiently in three input views. Further, the extracted texts are supplied to a Siamese network and a semi-supervised attention-embedding combinational network to obtain recognition results. The proposed model combines natural language processing and genetic algorithm models to restore missing character information and correct wrong recognition results. In experiments on our multi-view dataset and several benchmark datasets, the proposed method proves effective compared with state-of-the-art methods. The dataset and code will be made publicly available upon acceptance.
format Article
author Banerjee, Ayan
Shivakumara, Palaiahnakote
Bhattacharya, Saumik
Pal, Umapada
Liu, Cheng-Lin
author_facet Banerjee, Ayan
Shivakumara, Palaiahnakote
Bhattacharya, Saumik
Pal, Umapada
Liu, Cheng-Lin
author_sort Banerjee, Ayan
title An end-to-end model for multi-view scene text recognition
title_short An end-to-end model for multi-view scene text recognition
title_full An end-to-end model for multi-view scene text recognition
title_fullStr An end-to-end model for multi-view scene text recognition
title_full_unstemmed An end-to-end model for multi-view scene text recognition
title_sort end-to-end model for multi-view scene text recognition
publisher Elsevier
publishDate 2024
url http://eprints.um.edu.my/45920/
https://doi.org/10.1016/j.patcog.2023.110206
_version_ 1816130477603422208