Pashto language stemming algorithm

This paper presents a stemming algorithm for the morphological analysis of a less widely used or minor language such as Pashto. There is a lack of resources and tools that can be applied in applications such as document indexing, clustering, language processing, text analysis, database sea...

Bibliographic Details
Main Authors:	Aslamzai, Sebghatullah, Saidah Saad
Format:	Article
Language:	English
Published:	Penerbit Universiti Kebangsaan Malaysia 2015
Online Access:	http://journalarticle.ukm.my/8852/1/7048-23719-1-PB.pdf http://journalarticle.ukm.my/8852/ http://ejournal.ukm.my/apjitm/issue/view/609
Institution:	Universiti Kebangsaan Malaysia

id	my-ukm.journal.8852
record_format	eprints
spelling	my-ukm.journal.8852 2016-12-14T06:48:14Z http://journalarticle.ukm.my/8852/ Pashto language stemming algorithm Aslamzai, Sebghatullah Saidah Saad
Penerbit Universiti Kebangsaan Malaysia 2015-06 Article PeerReviewed application/pdf en http://journalarticle.ukm.my/8852/1/7048-23719-1-PB.pdf Aslamzai, Sebghatullah and Saidah Saad, (2015) Pashto language stemming algorithm. Asia-Pacific Journal of Information Technology and Multimedia, 4 (1). pp. 25-37. ISSN 2289-2192 http://ejournal.ukm.my/apjitm/issue/view/609
institution	Universiti Kebangsaan Malaysia
building	Perpustakaan Tun Sri Lanang Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Kebangsaan Malaysia
content_source	UKM Journal Article Repository
url_provider	http://journalarticle.ukm.my/
language	English
description	This paper presents a stemming algorithm for the morphological analysis of a less widely used or minor language such as Pashto. There is a lack of resources and tools that can be applied in applications such as document indexing, clustering, language processing, text analysis, database search systems, information retrieval, and linguistic applications. A review of the literature shows that only a few morphological studies have been conducted on Pashto, and that automatic stemming has not yet been fully analysed. In addition, no stemming algorithm has been proposed for extracting Pashto root words from a Pashto corpus for use in the applications mentioned above. The objective of this work is therefore to develop a rule-based stemming algorithm for Pashto. The algorithm takes the Pashto corpus directly as input and uses both inflectional and derivational morphemes; its output is a meaningful root word without affixes. The accuracy and strength of the proposed algorithm are evaluated using the word count method. To validate the algorithm, two native speakers of Pashto were recruited to assess its accuracy and strength. The results show that the proposed algorithm achieves an accuracy of 87%. The study contributes to Pashto language processing by extracting root words useful for purposes such as data indexing, information retrieval, and linguistic applications, and it lays the groundwork for further studies on Pashto language analysis.
format	Article
author	Aslamzai, Sebghatullah; Saidah Saad
spellingShingle	Aslamzai, Sebghatullah; Saidah Saad; Pashto language stemming algorithm
author_facet	Aslamzai, Sebghatullah; Saidah Saad
author_sort	Aslamzai, Sebghatullah
title	Pashto language stemming algorithm
title_sort	pashto language stemming algorithm
publisher	Penerbit Universiti Kebangsaan Malaysia
publishDate	2015
url	http://journalarticle.ukm.my/8852/1/7048-23719-1-PB.pdf http://journalarticle.ukm.my/8852/ http://ejournal.ukm.my/apjitm/issue/view/609
_version_	1643737592250761216

Similar Items