RULE-BASED TEXT SIMPLIFICATION FOR EXTRACTIVE AUTOMATIC SUMMARIZATION OF NEWS ARTICLES IN INDONESIAN
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76871 |
Institution: | Institut Teknologi Bandung |
Summary: | Research on Indonesian text simplification is scarce and does not yet handle
many cases. A text simplification method that does handle many cases is the
rule-based method of Siddharthan (2004), but it was not designed for Indonesian.
In this final project, a text simplification module was built following this
method; it handles relative clauses, appositions, complex sentences, and
references. The module was then used as a preprocessing step for the extractive
automatic summarization of Gojali (2022), which includes a semantic role
labeling (SRL) mechanism.
The work began with the construction of a text simplification dataset. Next,
the text simplification module was built from the following processes: POS
tagging, noun chunking, grammatical function extraction, agreement extraction,
third-person pronoun resolution, relative clause subject identification, clause
and apposition boundary marking, transformation, and reference substitution.
Finally, the dataset and the module were evaluated, along with the performance
of the SRL and automatic summarization of Gojali (2022) after integration with
text simplification.
The results show that the text simplification module needs to be adjusted
before it can be applied to Indonesian, for example for time and place
adverbials that resemble appositions and for complex sentences with shared
sentence elements. In addition, the module does not yet handle third-person
pronoun resolution properly. As an SRL preprocessing step, text simplification
increased identification performance for some roles and decreased it for
others. Text simplification also increased precision and decreased recall for
the automatic summarization developed by Gojali (2022), as measured with ROUGE
metrics (Lin, 2004). |
---|
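
The summary reports precision and recall under ROUGE metrics (Lin, 2004). As a rough illustration of how those two figures can move in opposite directions, the following minimal sketch computes unigram (ROUGE-1) precision and recall with whitespace tokenization. The function name `rouge_1` and the example sentences are hypothetical and are not taken from the thesis, which may use a different ROUGE variant or tokenizer.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 (a simplified ROUGE-1).

    Assumption: lowercased whitespace tokenization, no stemming or
    stopword removal. Real ROUGE toolkits offer more options.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical example: a short candidate fully contained in the reference.
    reference = "presiden meresmikan jembatan baru di bandung"
    candidate = "presiden meresmikan jembatan"
    print(rouge_1(candidate, reference))
```

In this hypothetical case every candidate word appears in the reference, so precision is 1.0, while only three of the six reference words are covered, so recall is 0.5. This is only one way a precision gain can coincide with a recall loss; the thesis itself does not state the cause.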