New Distance Measures for Arabic Handwritten Text Recognition

In recent years, optical character recognition has attracted scientists and researchers. Latin, Chinese, Korean and Thai characters have been researched more thoroughly than Arabic characters. Research concentrated first on printed and typeset characters until acceptable recognition accuracy...

Bibliographic Details
Main Author:	El-Bashir, Mohammad Said Mansur
Format:	Thesis
Language:	English
Published:	2008
Online Access:	http://psasir.upm.edu.my/id/eprint/5233/1/FSKTM_2008_8a.pdf http://psasir.upm.edu.my/id/eprint/5233/
Institution:	Universiti Putra Malaysia

id	my.upm.eprints.5233
record_format	eprints
spelling	my.upm.eprints.5233 2013-05-27T07:21:21Z http://psasir.upm.edu.my/id/eprint/5233/ New Distance Measures for Arabic Handwritten Text Recognition El-Bashir, Mohammad Said Mansur 2008 Thesis NonPeerReviewed application/pdf en http://psasir.upm.edu.my/id/eprint/5233/1/FSKTM_2008_8a.pdf El-Bashir, Mohammad Said Mansur (2008) New Distance Measures for Arabic Handwritten Text Recognition. PhD thesis, Universiti Putra Malaysia. English
institution	Universiti Putra Malaysia
building	UPM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Putra Malaysia
content_source	UPM Institutional Repository
url_provider	http://psasir.upm.edu.my/
language	English
description	In recent years, optical character recognition has attracted scientists and researchers. Latin, Chinese, Korean and Thai characters have been researched more thoroughly than Arabic characters. Research concentrated first on printed and typeset characters until acceptable recognition accuracy was achieved. Nowadays, most research has moved towards handwritten character recognition. Arabic text is cursive, as characters in a sub-word are connected to each other. This makes the recognition process more complex, and a segmentation procedure is required to separate the connected characters from each other before they can be recognized. The extracted features have to be chosen carefully, since they play a very important role in the segmentation and recognition process. The recognition accuracy depends mostly on the classifier applied and the segmentation procedure. In this research work, a framework for recognizing Arabic handwriting is presented. Two approaches are proposed. The first approach is designed to recognize the word as a whole, to fit applications such as sorting postal mail and bank checks, where the number of words or digits that need to be recognized is limited. The words may include country and city names written on postal mail, or reserved words or amounts used on bank checks. The second approach represents the general case, in which any type of document or handwritten text can be recognized. Both approaches include a preprocessing stage comprising image enhancement and normalization. The most significant features are extracted using Principal Component Analysis. A new segmentation-based approach is designed and implemented for the second approach to segment the text into characters, while no segmentation, or only a simple segmentation procedure, is performed in the first approach. The recognition step is performed by applying the nearest neighbor algorithm.
Four different distance measures are used with the nearest neighbor algorithm: the first norm, the second norm (Euclidean), and two newly proposed norms called ENorm and EEuclidean, which are derived from the first and second norms respectively. The recognition accuracy is enhanced by using the two new norms. Both approaches have been tested, and a number of experiments are discussed in detail. The first approach is evaluated on four datasets: sub-words containing two characters, sub-words containing three characters, Latin letters, and the Hindi digits that are used with the Arabic language nowadays. Recognition accuracy is the attribute used for measurement, and 8-fold cross-validation is used to test it. The average recognition accuracy is 94.8% for the digits, 78% for the three-character sub-words, 77% for the two-character sub-words and 67% for Latin letters. The second approach achieves a recognition accuracy of 73% without detecting dots and 77% with dot detection.
format	Thesis
author	El-Bashir, Mohammad Said Mansur
spellingShingle	El-Bashir, Mohammad Said Mansur New Distance Measures for Arabic Handwritten Text Recognition
author_facet	El-Bashir, Mohammad Said Mansur
author_sort	El-Bashir, Mohammad Said Mansur
title	New Distance Measures for Arabic Handwritten Text Recognition
title_short	New Distance Measures for Arabic Handwritten Text Recognition
title_full	New Distance Measures for Arabic Handwritten Text Recognition
title_fullStr	New Distance Measures for Arabic Handwritten Text Recognition
title_full_unstemmed	New Distance Measures for Arabic Handwritten Text Recognition
title_sort	new distance measures for arabic handwritten text recognition
publishDate	2008
url	http://psasir.upm.edu.my/id/eprint/5233/1/FSKTM_2008_8a.pdf http://psasir.upm.edu.my/id/eprint/5233/
_version_	1643823128656216064
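
Illustrative note: the description in this record mentions nearest neighbor classification with the first norm and the second norm (Euclidean) as distance measures. The following is a minimal sketch of that general technique only, not the thesis's implementation. The feature vectors and class labels are made up for illustration, and the two new norms (ENorm, EEuclidean) are not defined in this record, so they are not reproduced here.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def first_norm(a: Vector, b: Vector) -> float:
    """First-norm (city-block, L1) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def second_norm(a: Vector, b: Vector) -> float:
    """Second-norm (Euclidean, L2) distance: root of summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(
    query: Vector,
    train: Sequence[Vector],
    labels: Sequence[str],
    distance: Callable[[Vector, Vector], float],
) -> str:
    """Return the label of the training vector closest to `query`."""
    best = min(range(len(train)), key=lambda i: distance(query, train[i]))
    return labels[best]


# Hypothetical 2-D feature vectors standing in for PCA projections,
# with hypothetical class labels (assumptions, not data from the thesis).
train = [(3.0, 3.0), (0.0, 5.0)]
labels = ["ba", "ta"]
query = (0.0, 0.0)

# The two metrics can disagree on which neighbor is nearest.
l1_label = nearest_neighbor(query, train, labels, first_norm)
l2_label = nearest_neighbor(query, train, labels, second_norm)
print(l1_label, l2_label)
```

In this toy case the first norm selects "ta" (distance 5 versus 6) while the second norm selects "ba" (about 4.24 versus 5), which shows why the choice of distance measure can change the recognition result.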

New Distance Measures for Arabic Handwritten Text Recognition

Similar Items