Enhancement of compound word extraction in Malay sentences using modified linguistics approaches / Zamri Abu Bakar

Bibliographic Details
Main Author: Abu Bakar, Zamri
Format: Thesis
Language: English
Published: 2023
Online Access: https://ir.uitm.edu.my/id/eprint/88705/1/88705.pdf
https://ir.uitm.edu.my/id/eprint/88705/
Institution: Universiti Teknologi Mara
Description
Summary: A Malay compound word is a form that arises when two or more words combine into a single syntactic unit that carries a specific meaning. Extracting compound words is therefore important for downstream research such as text summarization, grammar checking, sentiment analysis and machine translation. Many techniques have been proposed for extracting compound words using linguistic approaches, but their results still leave room for improvement, and more effective extraction is needed because the results can otherwise be compromised. This study aims to propose a new extraction technique using linguistic approaches that combines many features and rules. It has three objectives: to identify new rules for detecting Malay compound words; to construct an improved compound word extraction technique (algorithm) that combines many rules for Malay sentences using linguistic approaches; and to evaluate the accuracy of the proposed technique using the standard measures of Recall, Precision and F-Measure. To achieve these objectives, the research explores a linguistic method for extracting compound words from a standard Malay corpus, using a Malay news dataset. The study proposes a modified linguistic approach to enhance compound word extraction. Preprocessing involves several steps, including normalization, tokenization, stemming and tagging. Finally, the study describes several rule-based methods and modifies the rules to capture the most relevant relation between the first word and the second word, in order to address these problems.
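The evaluation measures named in the summary can be illustrated with a minimal sketch. It assumes the extractor's output and a reference list are both available as sets of compound strings. The function name `evaluate` and the sample compounds are illustrative assumptions, not taken from the thesis, and the thesis may compute these measures differently.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, f_measure) for two collections of compounds.

    `predicted` is what an extractor returned; `gold` is the reference list.
    Both are hypothetical inputs used only for illustration.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)  # compounds found in both sets

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        # Balanced F-measure: harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only, not results from the thesis.
gold = {"kereta api", "kapal terbang", "rumah sakit", "tengah hari"}
predicted = {"kereta api", "kapal terbang", "baju baru"}

precision, recall, f_measure = evaluate(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f_measure={f_measure:.2f}")
# prints: precision=0.67 recall=0.50 f_measure=0.57
```

Here two of the three predicted pairs appear in the reference list, so precision is 2/3. Two of the four reference compounds are recovered, so recall is 1/2.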