A Syntactic-based Sentence Validation Technique for Malay Text Summarizer
In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and to remove the unnecessary ones without sacrificing the sentence's...
Main Authors: | Alias, Suraya; Sainin, Mohd Shamrie; Mohammad, Siti Khaotijah |
---|---|
Format: | Article |
Language: | English |
Published: | Universiti Utara Malaysia Press 2021 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | https://repo.uum.edu.my/id/eprint/28778/1/JICT%2020%2003%202021%20329-352.pdf https://doi.org/10.32890/jict2021.20.3.3 https://repo.uum.edu.my/id/eprint/28778/ https://e-journal.uum.edu.my/index.php/jict/article/view/14382 |
Institution: | Universiti Utara Malaysia |
id |
my.uum.repo.28778 |
---|---|
record_format |
eprints |
spelling |
my.uum.repo.28778 2023-05-17T15:08:58Z https://repo.uum.edu.my/id/eprint/28778/ Alias, Suraya and Sainin, Mohd Shamrie and Mohammad, Siti Khaotijah (2021) A Syntactic-based Sentence Validation Technique for Malay Text Summarizer. Journal of Information and Communication Technology, 20 (03). pp. 329-352. ISSN 2180-3862. Universiti Utara Malaysia Press 2021 Article PeerReviewed application/pdf en cc4_by https://doi.org/10.32890/jict2021.20.3.3 |
institution |
Universiti Utara Malaysia |
building |
UUM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Utara Malaysia |
content_source |
UUM Institutional Repository |
url_provider |
http://repo.uum.edu.my/ |
language |
English |
topic |
QA75 Electronic computers. Computer science |
description |
In Automatic Text Summarization, a Sentence Compression (SC) technique is applied to a summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and remove the rest without sacrificing the sentence's grammar. Malay Natural Language Processing (NLP) tools are still under development, with limited open access. A further issue is the lack of a benchmark dataset in the Malay language for evaluating the quality of summaries and validating the compressed sentences produced by a summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences that refers to the Malay Grammar Pattern. In this work, we propose a newly derived set of Syntactic Rules based on the Malay main Word Class to validate a Malay sentence that has undergone the SC procedure. We experimented with a Malay dataset of 100 new articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average Recall of 0.5925 at an optimum compression rate of 0.5 Confidence (Conf) value. Furthermore, in a manual evaluation by a group of Malay experts, the compressed summary sentences scored 4.11 for grammaticality and 4.12 for readability out of 5. These results indicate that the proposed technique can reliably validate Malay sentences while yielding promising summary content and readability. |
format |
Article |
author |
Alias, Suraya Sainin, Mohd Shamrie Mohammad, Siti Khaotijah |
author_sort |
Alias, Suraya |
title |
A Syntactic-based Sentence Validation Technique for Malay Text Summarizer |
publisher |
Universiti Utara Malaysia Press |
publishDate |
2021 |
url |
https://repo.uum.edu.my/id/eprint/28778/1/JICT%2020%2003%202021%20329-352.pdf https://doi.org/10.32890/jict2021.20.3.3 https://repo.uum.edu.my/id/eprint/28778/ https://e-journal.uum.edu.my/index.php/jict/article/view/14382 |
_version_ |
1768010680786485248 |
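
The description above reports ROUGE Recall and F-measure scores. As a rough illustration of what those figures measure, the sketch below computes ROUGE-N from n-gram overlap between a candidate summary and one reference summary. It is not the authors' evaluation setup, which this record does not describe: the function names (`ngrams`, `rouge_n`), the whitespace tokenization, and the two Malay example sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N recall, precision and F-measure for a single reference.

    Assumption: lowercased whitespace tokenization, no stemming or
    stopword removal. Real ROUGE toolkits offer more options.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return {"recall": recall, "precision": precision, "f_measure": f_measure}


# Hypothetical example sentences, not taken from the paper's dataset.
reference = "banjir melanda beberapa kampung di kelantan semalam"
candidate = "banjir melanda kampung di kelantan"
scores = rouge_n(candidate, reference, n=1)
print(scores)
```

Here the compressed candidate keeps five of the reference's seven unigrams, so recall is 5/7 while precision is 1.0. This shows how a more aggressive compression rate tends to lower recall even when every retained word is correct.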