RULE-BASED TEXT SIMPLIFICATION FOR EXTRACTIVE AUTOMATIC SUMMARIZATION OF NEWS ARTICLES IN INDONESIAN

Research on Indonesian text simplification is scarce and does not yet cover many cases. A text simplification method that does cover many cases is the rule-based method of Siddharthan (2004), but it was not designed for Indonesian. In this final project, the construction of a text simplificati...

Bibliographic Details
Main Author: Abdi Haryadi. H, M.
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76871
Institution: Institut Teknologi Bandung
id id-itb.:76871
spelling id-itb.:76871; 2023-08-19T22:12:52Z; RULE-BASED TEXT SIMPLIFICATION FOR EXTRACTIVE AUTOMATIC SUMMARIZATION OF NEWS ARTICLES IN INDONESIAN; Abdi Haryadi. H, M.; Indonesia; Final Project; text simplification, Indonesian text simplification, automatic summarization; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/76871; text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Research on Indonesian text simplification is scarce and does not yet cover many cases. A text simplification method that does cover many cases is the rule-based method of Siddharthan (2004), but it was not designed for Indonesian. In this final project, a text simplification module was built following that method; it handles relative clauses, appositions, complex sentences, and references. The module was then used as a preprocessing step for the extractive automatic summarization system of Gojali (2022), which includes a semantic role labeling (SRL) mechanism. The work began with the construction of a text simplification dataset. Next, the text simplification module was built from the following processes: POS tagging, noun chunking, grammatical function extraction, agreement extraction, third-person pronoun resolution, relative clause subject identification, clause and apposition boundary marking, transformation, and reference substitution. The dataset and the module were then evaluated, along with the performance of the SRL and the automatic summarization of Gojali (2022) after integration with text simplification. The results show that the text simplification module needs to be adjusted before it can be applied to Indonesian, for example for time and place adverbs that resemble appositions and for complex sentences whose clauses share sentence elements. In addition, the module did not yet handle third-person pronoun resolution properly. As an SRL preprocessing step, text simplification increased identification performance for some roles and decreased it for others. Text simplification also increased precision and decreased recall, measured with ROUGE metrics (Lin, 2004), for the automatic summarization developed by Gojali (2022).
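The precision and recall trade-off reported above can be illustrated with a minimal sketch of unigram overlap in the style of ROUGE-1. This is not the thesis's evaluation code or the official ROUGE package: the function name `rouge_1`, the whitespace tokenization, and the three example sentences are assumptions made for illustration only.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (a simplified ROUGE-1)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each word counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sentences for illustration only; not taken from the thesis data.
reference = "presiden meresmikan jembatan baru di bandung"
original = "presiden yang tiba kemarin meresmikan jembatan baru di bandung"
simplified = "presiden meresmikan jembatan baru"

for name, text in [("original", original), ("simplified", simplified)]:
    scores = rouge_1(text, reference)
    print(name, round(scores["precision"], 2), round(scores["recall"], 2))
```

In this toy case the shorter, simplified sentence contains only words that appear in the reference, so its precision rises, but it omits some reference words, so its recall falls. That is one plausible reading of the reported result, not a claim about the cause observed in the thesis.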
format Final Project
author Abdi Haryadi. H, M.
title RULE-BASED TEXT SIMPLIFICATION FOR EXTRACTIVE AUTOMATIC SUMMARIZATION OF NEWS ARTICLES IN INDONESIAN
url https://digilib.itb.ac.id/gdl/view/76871