INDONESIAN SEMANTIC ROLE LABELING FOR SINGLE DOCUMENT SUMMARIZATION

Bibliographic Details
Main Author: Gojali, Felicia
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/65820
Institution: Institut Teknologi Bandung
Description
Summary: The semantic role labeling (SRL) currently available for Indonesian is limited: it labels arguments for only one predicate per sentence and can identify only single-word predicates. The existing SRL corpus is also of poor quality, which confuses annotators who want to extend or validate it. These problems degrade the performance of SRL-based automatic summarization of Indonesian news articles. This final project therefore develops a span-based SRL model with biaffine scoring that removes the limitations described above, and applies it to an automatic summarization system for Indonesian news articles. The model is span-based and labels argument spans for multiple predicates within a single output structure. It also accepts spans as predicates, so it can identify predicates of more than one word. Biaffine scoring is used to compute the score of each predicate-argument pair and its label. Corpus construction began with analyzing and writing labeling guidelines for 200 predicates; the resulting SRL corpus contains 3681 sentences. The SRL model is then used in the automatic news summarization system. Experiments were carried out to find the SRL model configuration that gives the best labeling and to determine the most suitable summarization alternative once the SRL model's limitations were removed. The best SRL model achieved F1 scores of 0.798 and 0.698 on test data 1 and test data 2, respectively, and the best summarization configuration achieved ROUGE-1, ROUGE-2 and ROUGE-L F1 scores of 0.3213, 0.1526 and 0.2967, respectively.
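The biaffine scoring of predicate and argument spans mentioned in the summary can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the tiny two-dimensional span vectors, the hand-set weights, and the label set ("NONE", "ARG0") are all assumptions chosen for illustration. It uses a common form of biaffine scoring, in which each label l scores a pair as p^T U_l a + W_l [p; a] + b_l, and it shows how several predicates, including a multi-word predicate span, can be scored in the same output structure.

```python
def biaffine_score(pred_vec, arg_vec, U, W, b):
    """Score one (predicate span, argument span) pair for every label.

    U: [num_labels][d][d] bilinear weights (hypothetical values below)
    W: [num_labels][2*d]  linear weights over the concatenated vectors
    b: [num_labels]       per-label bias
    """
    concat = pred_vec + arg_vec  # list concatenation: [p; a]
    scores = []
    for U_l, W_l, b_l in zip(U, W, b):
        # Bilinear term p^T U_l a
        bilinear = sum(
            pred_vec[i] * U_l[i][j] * arg_vec[j]
            for i in range(len(pred_vec))
            for j in range(len(arg_vec))
        )
        # Linear term W_l [p; a]
        linear = sum(w * x for w, x in zip(W_l, concat))
        scores.append(bilinear + linear + b_l)
    return scores


def label_pairs(pred_spans, arg_spans, U, W, b, labels):
    """Pick the highest-scoring label for every predicate/argument span pair."""
    out = {}
    for p_span, p_vec in pred_spans.items():
        for a_span, a_vec in arg_spans.items():
            scores = biaffine_score(p_vec, a_vec, U, W, b)
            best = max(range(len(labels)), key=scores.__getitem__)
            out[(p_span, a_span)] = labels[best]
    return out


# Illustrative setup (all values are assumptions, not taken from the thesis).
LABELS = ["NONE", "ARG0"]
U = [
    [[0.0, 0.0], [0.0, 0.0]],  # "NONE": no bilinear interaction
    [[1.0, 0.0], [0.0, 1.0]],  # "ARG0": identity, i.e. a dot product
]
W = [[0.0] * 4, [0.0] * 4]
b = [0.5, 0.0]

# Spans are (start, end) token indices, inclusive; (1, 2) is a two-word predicate.
PRED_SPANS = {(1, 2): [1.0, 0.0], (6, 6): [0.0, 1.0]}
ARG_SPANS = {(0, 0): [1.0, 0.0], (7, 9): [0.0, 1.0]}

RESULT = label_pairs(PRED_SPANS, ARG_SPANS, U, W, b, LABELS)
```

In a trained model the span vectors would come from an encoder and the weights would be learned; here they are fixed by hand so that each predicate span pairs with the argument span it matches and every other pair falls back to "NONE".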