INDONESIAN SEMANTIC ROLE LABELING FOR SINGLE DOCUMENT SUMMARIZATION

Bibliographic Details
Main Author: Gojali, Felicia
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/65820
Institution: Institut Teknologi Bandung
id id-itb.:65820
spelling id-itb.:65820 2022-06-25T03:41:31Z
keywords span-based semantic role labeling, SRL labeling guidelines, automatic summary system
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Existing semantic role labeling (SRL) for Indonesian is limited in two ways: it labels the arguments of only one predicate per sentence, and it can identify only single-word predicates. The existing SRL corpus is also of poor quality, which confuses annotators who want to extend or validate it. These shortcomings degrade the performance of SRL-based automatic summarization of Indonesian news articles. This final project therefore develops a span-based SRL model with biaffine scoring that lifts these limitations, and applies it to an automatic summarization system for Indonesian news articles. The model labels argument spans for multiple predicates in a single output structure. It also accepts a span as a predicate, so it can identify predicates of more than one word. Biaffine scoring is used to score predicate-argument pairs and their labels. Corpus construction began with analyzing 200 predicates and writing labeling guidelines for them; the resulting SRL corpus contains 3681 sentences. The SRL model was then integrated into the news article summarization system. Experiments were run to find the SRL model configuration with the best labeling performance and to choose the most suitable summarization alternative once the SRL model's limitations were lifted. The best SRL model achieved F1 scores of 0.798 on test data 1 and 0.698 on test data 2, and the best summarization configuration achieved ROUGE-1, ROUGE-2, and ROUGE-L F1 scores of 0.3213, 0.1526, and 0.2967, respectively.
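The abstract does not give the exact form of the biaffine scorer, so the following is only a minimal sketch of the general technique, assuming the common formulation score = p^T U a + w . [p; a] + b with one parameter set per label. The function names, the label names ARG0 and ARG1, the two-dimensional span vectors, and all parameter values are hypothetical and chosen by hand for illustration; they are not taken from the thesis.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
# Per-label parameters: (bilinear matrix U, linear weights w, bias b).
LabelParams = Tuple[Matrix, Vector, float]


def biaffine_score(pred: Vector, arg: Vector, U: Matrix, w: Vector, b: float) -> float:
    """Return pred^T U arg + w . [pred; arg] + b for a single label."""
    bilinear = sum(
        pred[i] * U[i][j] * arg[j]
        for i in range(len(pred))
        for j in range(len(arg))
    )
    # The linear term acts on the concatenation of the two span vectors.
    linear = sum(wk * xk for wk, xk in zip(w, pred + arg))
    return bilinear + linear + b


def best_label(pred: Vector, arg: Vector, params: Dict[str, LabelParams]):
    """Score one (predicate span, argument span) pair under every label."""
    scores = {label: biaffine_score(pred, arg, *p) for label, p in params.items()}
    return max(scores, key=scores.get), scores


# Hypothetical, hand-picked parameters (a trained model would learn these).
params = {
    "ARG0": ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 0.0, 0.0], 0.0),
    "ARG1": ([[0.0, 1.0], [1.0, 0.0]], [0.0, 0.0, 0.0, 0.0], 0.0),
}
predicate_span = [1.0, 0.0]  # toy vector for a (possibly multi-word) predicate span
argument_span = [0.0, 2.0]   # toy vector for a candidate argument span

label, scores = best_label(predicate_span, argument_span, params)
print(label, scores)
```

Because both the predicate and the argument are represented as span vectors, the same scorer applies to multi-word predicates and can be evaluated for every predicate-argument pair in a sentence, which is the property the abstract highlights.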
format Final Project
author Gojali, Felicia
title INDONESIAN SEMANTIC ROLE LABELING FOR SINGLE DOCUMENT SUMMARIZATION