A review of Arabic text recognition dataset

Building a robust Optical Character Recognition (OCR) system for languages with cursive scripts, such as Arabic, has always been challenging. These challenges increase if the text contains diacritics of different sizes for characters and words. Apart from the complexity of the font used, these cha...

Bibliographic Details
Main Authors: Idris Saleh Al-Sheikh, Masnizah Mohd, Lia Warlina
Format: Article
Language: English
Published: Penerbit Universiti Kebangsaan Malaysia 2020
Online Access: http://journalarticle.ukm.my/15419/1/06.pdf
http://journalarticle.ukm.my/15419/
http://www.ukm.my/apjitm/articles-year.php
Institution: Universiti Kebangsaan Malaysia
id my-ukm.journal.15419
record_format eprints
spelling my-ukm.journal.15419 2020-10-23T02:59:26Z http://journalarticle.ukm.my/15419/ Idris Saleh Al-Sheikh, Masnizah Mohd, and Lia Warlina (2020) A review of Arabic text recognition dataset. Asia-Pacific Journal of Information Technology and Multimedia, 9 (1). pp. 69-81. ISSN 2289-2192. Penerbit Universiti Kebangsaan Malaysia, 2020-06. Article, PeerReviewed, application/pdf, en. http://journalarticle.ukm.my/15419/1/06.pdf http://www.ukm.my/apjitm/articles-year.php
institution Universiti Kebangsaan Malaysia
building Tun Sri Lanang Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Kebangsaan Malaysia
content_source UKM Journal Article Repository
url_provider http://journalarticle.ukm.my/
language English
description Building a robust Optical Character Recognition (OCR) system for languages with cursive scripts, such as Arabic, has always been challenging. These challenges increase if the text contains diacritics of different sizes for characters and words. Apart from the complexity of the font used, these challenges must be addressed in recognizing the text of the Holy Quran. To solve them, an OCR system has to go through several phases, and each problem has to be addressed with a different approach; researchers are therefore studying these challenges and proposing various solutions. This motivated the present study to review Arabic OCR datasets, because the dataset plays a major role in determining the nature of an OCR system. The review finds that state-of-the-art approaches to segmentation and recognition implement Recurrent Neural Networks (Long Short-Term Memory, LSTM, and Gated Recurrent Unit, GRU) together with Connectionist Temporal Classification (CTC). It also covers deep learning models and the implementation of GRU in the Arabic domain. The paper contributes a profile of Arabic text recognition datasets, which determines the nature of the OCR systems developed, and identifies research directions for building Arabic text recognition datasets.
format Article
author Idris Saleh Al-Sheikh,
Masnizah Mohd,
Lia Warlina,
title A review of Arabic text recognition dataset