RULE-BASED TEXT SIMPLIFICATION FOR EXTRACTIVE AUTOMATIC SUMMARIZATION OF NEWS ARTICLES IN INDONESIAN

Bibliographic Details
Main Author: Abdi Haryadi. H, M.
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76871
Institution: Institut Teknologi Bandung
Description
Summary: Research on Indonesian text simplification is scarce and does not yet cover many cases. The text simplification method that covers many cases is the rule-based method of Siddharthan (2004), but it was not designed for Indonesian. In this final project, a text simplification module was built with reference to that method; it handles relative clauses, appositions, complex sentences, and references. The module was then used as a preprocessing step for the extractive automatic summarization system of Gojali (2022), which includes a semantic role labeling (SRL) mechanism. The work began with the construction of a text simplification dataset. Next, the text simplification module was built from the following processes: POS tagging, noun chunking, grammatical function extraction, agreement extraction, third-person pronoun resolution, relative clause subject identification, clause and apposition boundary marking, transformation, and reference substitution. The dataset and the module were then evaluated, along with the performance of the SRL and automatic summarization of Gojali (2022) after integration with text simplification. The results show that the text simplification module needs adjustment before it can be applied to Indonesian, for example for adverbs of time and place that resemble appositions, and for complex sentences whose clauses share sentence elements. In addition, the module did not yet handle third-person pronoun resolution properly. As an SRL preprocessing step, text simplification increased identification performance for some roles and decreased it for others. Text simplification also increased precision and decreased recall for the automatic summarization developed by Gojali (2022), as measured with ROUGE metrics (Lin, 2004).
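The precision and recall trade-off reported in the summary can be illustrated with a minimal sketch of ROUGE-N scoring. This is not the evaluation code used in the final project: the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example sentences are all assumptions made for illustration, and the record does not state which ROUGE variant was used. The sketch only shows how a shorter candidate summary can score higher on precision while scoring lower on recall.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f1) for ROUGE-N with clipped n-gram overlap.

    Tokenization is plain lowercased whitespace splitting, which is an
    assumption of this sketch and not necessarily what the thesis used.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical reference summary and two hypothetical candidate summaries.
reference = "presiden meresmikan jembatan baru di bandung"
long_candidate = ("presiden yang baru terpilih meresmikan "
                  "jembatan baru di bandung kemarin")
short_candidate = "presiden meresmikan jembatan"

for name, text in [("long", long_candidate), ("short", short_candidate)]:
    p, r, f = rouge_n(text, reference, n=1)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case the longer candidate covers every reference unigram but includes extra words, while the shorter candidate contains only reference words but misses half of them, which mirrors the direction of the change described in the summary.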