A Syntactic-based Sentence Validation Technique for Malay Text Summarizer

In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and remove what is unnecessary without sacrificing the sentence's grammar.

Bibliographic Details
Main Authors: Alias, Suraya; Sainin, Mohd Shamrie; Mohammad, Siti Khaotijah
Format: Article
Language: English
Published: Universiti Utara Malaysia Press 2021
Subjects: QA75 Electronic computers. Computer science
Online Access: https://repo.uum.edu.my/id/eprint/28778/1/JICT%2020%2003%202021%20329-352.pdf
https://doi.org/10.32890/jict2021.20.3.3
https://repo.uum.edu.my/id/eprint/28778/
https://e-journal.uum.edu.my/index.php/jict/article/view/14382
Institution: Universiti Utara Malaysia
id my.uum.repo.28778
record_format eprints
spelling my.uum.repo.28778 2023-05-17T15:08:58Z https://repo.uum.edu.my/id/eprint/28778/ A Syntactic-based Sentence Validation Technique for Malay Text Summarizer Alias, Suraya Sainin, Mohd Shamrie Mohammad, Siti Khaotijah QA75 Electronic computers. Computer science
Universiti Utara Malaysia Press 2021 Article PeerReviewed application/pdf en cc4_by https://repo.uum.edu.my/id/eprint/28778/1/JICT%2020%2003%202021%20329-352.pdf Alias, Suraya and Sainin, Mohd Shamrie and Mohammad, Siti Khaotijah (2021) A Syntactic-based Sentence Validation Technique for Malay Text Summarizer. Journal of Information and Communication Technology, 20 (03). pp. 329-352. ISSN 2180-3862 https://e-journal.uum.edu.my/index.php/jict/article/view/14382 https://doi.org/10.32890/jict2021.20.3.3
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
description In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to the summary sentence to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in the sentence and remove what is unnecessary without sacrificing the sentence's grammar. Malay Natural Language Processing (NLP) tools are still under development, and few are openly accessible. In particular, there is no benchmark dataset in the Malay language for evaluating the quality of summaries or for validating the compressed sentences produced by a summarizer model. Hence, our paper outlines a Syntactic-based Sentence Validation technique for Malay sentences that refers to the Malay Grammar Pattern. In this work, we propose a newly derived set of Syntactic Rules, based on the main Malay Word Classes, to validate a Malay sentence that has undergone the SC procedure. We experimented on a Malay dataset of 100 news articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average Recall of 0.5925 at the optimum compression rate, which corresponds to a Confidence (Conf) value of 0.5. Furthermore, a manual evaluation of the compressed summary sentences by a group of Malay experts produced a grammaticality score of 4.11 and a readability score of 4.12 out of 5. These results indicate that the proposed technique can reliably validate Malay sentences while producing promising summary content and readability.
format Article
author Alias, Suraya
Sainin, Mohd Shamrie
Mohammad, Siti Khaotijah
title A Syntactic-based Sentence Validation Technique for Malay Text Summarizer
publisher Universiti Utara Malaysia Press
publishDate 2021
url https://repo.uum.edu.my/id/eprint/28778/1/JICT%2020%2003%202021%20329-352.pdf
https://doi.org/10.32890/jict2021.20.3.3
https://repo.uum.edu.my/id/eprint/28778/
https://e-journal.uum.edu.my/index.php/jict/article/view/14382
_version_ 1768010680786485248
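
Note on the reported metrics: the description above gives an average ROUGE F-measure of 0.5826 and an average Recall of 0.5925. As a rough illustration of how scores of this kind are computed, the following is a minimal ROUGE-1 style sketch. It assumes whitespace tokenization and lowercasing only, the function name `rouge_n` is hypothetical, and the two Malay sentences are made-up examples rather than items from the paper's dataset. It is not the authors' evaluation setup, which the record does not describe in detail.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1):
    """Return (recall, precision, f_measure) from clipped n-gram overlap.

    Simplified illustration: lowercases and splits on whitespace,
    with no stemming or stop-word handling.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count per n-gram (clipped overlap).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = recall + precision
    f_measure = 2 * recall * precision / total if total else 0.0
    return recall, precision, f_measure


# Made-up example: a reference sentence and a compressed candidate.
reference = "banjir melanda beberapa kampung di kelantan semalam"
candidate = "banjir melanda kampung di kelantan"

recall, precision, f_measure = rouge_n(candidate, reference)
print(f"recall={recall:.4f} precision={precision:.4f} f={f_measure:.4f}")
```

In this toy case all five candidate words occur in the seven-word reference, so precision is 1.0 and recall is 5/7. Recall falls as compression removes more reference words, which is the trade-off the compression-rate experiment in the description examines.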