A review of Arabic text recognition dataset
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: | Penerbit Universiti Kebangsaan Malaysia, 2020 |
Online Access: | http://journalarticle.ukm.my/15419/1/06.pdf http://journalarticle.ukm.my/15419/ http://www.ukm.my/apjitm/articles-year.php |
Institution: | Universiti Kebangsaan Malaysia |
Summary: | Building a robust Optical Character Recognition (OCR) system for languages with cursive scripts, such as Arabic,
has always been challenging. The challenges increase when the text contains diacritics of different sizes on
characters and words. These challenges, along with the complexity of the font used, must be addressed in
recognizing the text of the Holy Quran. To solve them, an OCR system has to pass through several phases,
and each problem has to be addressed with a different approach; researchers are therefore
studying these challenges and proposing various solutions. This motivated the present study to review Arabic OCR
datasets, because the dataset plays a major role in determining the nature of an OCR system. State-of-the-art
approaches to segmentation and recognition are identified that use Recurrent Neural
Networks (Long Short-Term Memory, LSTM, and Gated Recurrent Unit, GRU) together with Connectionist
Temporal Classification (CTC). These include deep learning models and implementations of GRU in the Arabic
domain. This paper contributes a profile of Arabic text recognition datasets, thereby determining the nature of
the OCR systems developed, and identifies research directions for building Arabic text recognition datasets. |
---|