CROSS-LINGUAL TRANSFER FOR SEMANTIC ROLE LABELING IN INDONESIAN
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/82458 |
Institution: | Institut Teknologi Bandung |
Summary: | Semantic role labeling (SRL) is a semantic analysis task that identifies the semantic relationships in a sentence, such as who did what to whom, where, and when. Existing SRL models for Indonesian still struggle to achieve good results because far less annotated training data is available for Indonesian than for English. This thesis therefore develops an Indonesian SRL model by applying cross-lingual transfer.
Cross-lingual transfer addresses the poor SRL performance caused by the small Indonesian annotated corpus by leveraging the much larger English annotated corpus. The method requires a multilingual model and datasets in two different languages that share the same domain. The multilingual models used in this thesis are XLM-R and mT5, each in base and large sizes. The Indonesian datasets are Universal PropBank and Gojali's data, and the English dataset is CoNLL-2012.
The resulting SRL models were evaluated on test data from Universal PropBank Indonesia and Gojali's data. Of all the models produced, XLM-R large with cross-lingual transfer performs best, achieving an F1 score of 0.916 on Gojali's data alone and 0.858 on the combination of Universal PropBank Indonesia and Gojali's data. |
---|
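The summary reports results as F1 scores but this record does not say how they were computed. As a rough illustration only, the sketch below assumes the common convention of exact-match, micro-averaged F1 over labeled argument spans. The `Span` layout, the `span_f1` function, and the toy annotations are hypothetical and are not taken from the thesis.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span layout: (sentence_id, predicate_index, start, end, role).
Span = Tuple[int, int, int, int, str]


def span_f1(gold: Iterable[Span], predicted: Iterable[Span]) -> float:
    """Micro-averaged F1 where a span counts only if boundaries and role match."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(predicted)
    if not gold_set or not pred_set:
        return 0.0
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations for one predicate: the third predicted role is wrong.
gold = [(0, 1, 0, 0, "ARG0"), (0, 1, 2, 3, "ARG1"), (0, 1, 4, 5, "ARGM-LOC")]
pred = [(0, 1, 0, 0, "ARG0"), (0, 1, 2, 3, "ARG1"), (0, 1, 4, 5, "ARGM-TMP")]
score = span_f1(gold, pred)
print(round(score, 3))
```

In this toy case two of the three predicted spans match the gold spans, so precision and recall are both 2/3. Other scoring conventions, such as token-level F1 or the official CoNLL scorer, can give different numbers for the same predictions.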