Arabic nested noun compound extraction based on linguistic features and statistical measures

The extraction of Arabic nested noun compounds is significant for several research areas, such as sentiment analysis, text summarization, word categorization, grammar checking, and machine translation. Much research has studied the extraction of Arabic noun compounds using linguistic approaches, statist...

Bibliographic Details
Main Authors:	Omar, N., Al-Tashi, Q.
Format:	Article
Published:	Universiti Kebangsaan Malaysia Press 2018
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85047951005&doi=10.17576%2fgema-2018-1802-07&partnerID=40&md5=2f83b585a48dcfab4e4849ea35017dbc http://eprints.utp.edu.my/21597/
Institution:	Universiti Teknologi Petronas

id	my.utp.eprints.21597
record_format	eprints
spelling	my.utp.eprints.21597 2018-08-01T03:07:33Z Omar, N. and Al-Tashi, Q. (2018) Arabic nested noun compound extraction based on linguistic features and statistical measures. GEMA Online Journal of Language Studies, 18 (2). pp. 93-107. Universiti Kebangsaan Malaysia Press 2018 Article NonPeerReviewed https://www.scopus.com/inward/record.uri?eid=2-s2.0-85047951005&doi=10.17576%2fgema-2018-1802-07&partnerID=40&md5=2f83b585a48dcfab4e4849ea35017dbc http://eprints.utp.edu.my/21597/
institution	Universiti Teknologi Petronas
building	UTP Resource Centre
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Petronas
content_source	UTP Institutional Repository
url_provider	http://eprints.utp.edu.my/
description	The extraction of Arabic nested noun compounds is significant for several research areas, such as sentiment analysis, text summarization, word categorization, grammar checking, and machine translation. Much research has studied the extraction of Arabic noun compounds using linguistic approaches, statistical methods, or a hybrid of both. Many existing approaches concentrate on extracting bi-gram or tri-gram noun compounds. Extracting 4-gram or 5-gram nested noun compounds, however, is challenging because of morphological, orthographic, syntactic, and semantic variations. Several features, such as unit-hood, contextual information, and term-hood, have an important effect on the efficiency of noun compound extraction. Hence, there is a need to improve the effectiveness of Arabic nested noun compound extraction. This paper therefore proposes a hybrid of a linguistic approach and a statistical method to enhance the extraction of Arabic nested noun compounds. The pre-processing phases presented include transformation, tokenization, and normalisation. The linguistic approaches used in this study are part-of-speech tagging and the named entities pattern, whereas the statistical methods are the NC-value, the NTC-value, the NLC-value, and the combination of these association measures. The results show that the combined association measures outperform the NLC-value, NTC-value, and NC-value in nested noun compound extraction, achieving 90, 88, 87, and 81 for bigram, trigram, 4-gram, and 5-gram compounds, respectively. © 2018, Universiti Kebangsaan Malaysia Press. All rights reserved.
format	Article
author	Omar, N. Al-Tashi, Q.
title	Arabic nested noun compound extraction based on linguistic features and statistical measures
publisher	Universiti Kebangsaan Malaysia Press
publishDate	2018
url	https://www.scopus.com/inward/record.uri?eid=2-s2.0-85047951005&doi=10.17576%2fgema-2018-1802-07&partnerID=40&md5=2f83b585a48dcfab4e4849ea35017dbc http://eprints.utp.edu.my/21597/
_version_	1738656311768252416

Similar Items