AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION
Main Author: | Husada Daryanto, Taufiq |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/66591 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:66591 |
---|---|
spelling |
id-itb.:66591; 2022-06-29T08:21:16Z; Husada Daryanto, Taufiq; Final Project; keywords: text generation, Abstract Meaning Representation, summarization, fine-tuning, graph, supervised task adaptation, tree-level embedding |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
AMR-to-text generation is the process of generating text from an Abstract Meaning
Representation (AMR) graph. In Indonesian AMR-based text summarization, the
text generation step usually uses Simple NLG, which has the disadvantage that
the generated text is only a bag of words. Therefore, in this final project,
an AMR-to-text generation model for the Indonesian language is developed using
a pretrained language model approach.
In this final project, the AMR-to-text generation model is developed by
fine-tuning a pretrained language model. We also observe the effect of adding
supervised task adaptation and tree-level embedding on the performance of the
AMR-to-text generation model. The pretrained language models we evaluate are
IndoT5-base, mT5-base, and IndoBART. Based on the results, the best method is
fine-tuning the IndoT5 model with the linearized PENMAN representation as input,
with additional supervised task adaptation. This method achieves a BLEU score of
0.5048 on the AMR Simple Sentences Test Dataset and 0.3180 on the AMR News
Sentences Test Dataset.
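
The abstract names a linearized PENMAN representation as the model input but does not spell out the linearization scheme. As a minimal sketch, assuming the simplest possible scheme (drop metadata comment lines and collapse the graph onto one line), a source/target pair for sequence-to-sequence fine-tuning could be prepared as follows. The function names, the `"amr to text: "` prefix, and the example graph are hypothetical illustrations, not taken from the thesis.

```python
import re


def linearize_penman(penman_text: str) -> str:
    """Collapse a multi-line PENMAN string into a single line.

    Assumption: plain whitespace normalization; the thesis may use a
    different linearization (e.g. dropping variables or senses).
    """
    # Skip metadata comment lines such as "# ::snt ..." before the graph.
    lines = [ln for ln in penman_text.splitlines()
             if not ln.lstrip().startswith("#")]
    # Join the remaining lines and squeeze whitespace runs to one space.
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


def make_example(penman_text: str, sentence: str,
                 prefix: str = "amr to text: ") -> dict:
    """Build one source/target pair; the task prefix is hypothetical."""
    return {
        "source": prefix + linearize_penman(penman_text),
        "target": sentence.strip(),
    }


# Hypothetical AMR graph for the sentence "Anak itu ingin makan."
graph = """
# ::snt Anak itu ingin makan.
(i / ingin
   :ARG0 (a / anak)
   :ARG1 (m / makan
            :ARG0 a))
"""

example = make_example(graph, "Anak itu ingin makan.")
print(example["source"])
# amr to text: (i / ingin :ARG0 (a / anak) :ARG1 (m / makan :ARG0 a))
```

Pairs in this form can be tokenized and passed to any encoder-decoder fine-tuning loop; the sketch deliberately stops before model-specific code.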
As a case study, the AMR-to-text generation model is used to generate Indonesian
abstractive summaries from the summary graphs produced by the AMR-based
summarization system (Akhyar, 2021). On the XLSum Indonesian test dataset, the
system achieves a ROUGE-1 score of 0.2123 and a ROUGE-2 score of 0.0496.
The system produces abstractive summaries with a higher ROUGE-1 score than
the previous AMR-based summarization system by Akhyar (2021), but
a lower ROUGE-2 score.
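
For readers unfamiliar with the metrics reported above, ROUGE-N measures n-gram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration, assuming whitespace tokenization and lowercasing; the abstract does not say which ROUGE variant (recall or F1) or which toolkit was used, so the numbers it produces are not comparable to the reported scores. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list, n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Compute ROUGE-N precision, recall, and F1 for one summary pair.

    Assumption: tokens are lowercased and split on whitespace.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram,
    # i.e. the clipped number of matches.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical candidate and reference summaries.
candidate_summary = "pemerintah menaikkan harga bahan bakar"
reference_summary = "pemerintah resmi menaikkan harga bahan bakar minyak"

print(rouge_n(candidate_summary, reference_summary, n=1))
print(rouge_n(candidate_summary, reference_summary, n=2))
```

In this toy pair every candidate unigram appears in the reference, yet one of the four candidate bigrams does not. Because ROUGE-2 also depends on word order, a system can score higher on ROUGE-1 and lower on ROUGE-2 than another system.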
|
format |
Final Project |
author |
Husada Daryanto, Taufiq |
title |
AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION |
url |
https://digilib.itb.ac.id/gdl/view/66591 |