AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION

AMR-to-text generation is the process of generating text from an Abstract Meaning Representation (AMR) graph. In Indonesian AMR-based text summarization, the text generation step usually uses Simple NLG, which has the disadvantage that the generated text is only a bag of words. Therefor...

Bibliographic Details
Main Author:	Husada Daryanto, Taufiq
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/66591
Institution:	Institut Teknologi Bandung

id	id-itb.:66591
spelling	id-itb.:66591 2022-06-29T08:21:16Z AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION Husada Daryanto, Taufiq Indonesia Final Project text generation, Abstract Meaning Representation, summarization, fine-tuning, graph, supervised task adaptation, tree-level embedding INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/66591 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	AMR-to-text generation is the process of generating text from an Abstract Meaning Representation (AMR) graph. In Indonesian AMR-based text summarization, the text generation step usually uses Simple NLG, which has the disadvantage that the generated text is only a bag of words. Therefore, this final project develops an AMR-to-text generation model for Indonesian by fine-tuning a pretrained language model. We also observe the effect of adding supervised task adaptation and tree-level embedding on the performance of the AMR-to-text generation model. The pretrained language models we evaluate are IndoT5-base, mT5-base, and IndoBART. Based on the results, the best method is fine-tuning the IndoT5 model with the linearized PENMAN representation as input and additional supervised task adaptation. This method achieves a BLEU score of 0.5048 on the AMR Simple Sentences Test Dataset and 0.3180 on the AMR News Sentences Test Dataset. As a case study, the AMR-to-text generation model is used to generate Indonesian abstractive summaries from the summary graphs produced by the AMR-based summarization system of Akhyar (2021). On the XLSum Indonesian test dataset, the system obtains a ROUGE-1 score of 0.2123 and a ROUGE-2 score of 0.0496. It achieves a higher ROUGE-1 score than the previous AMR-based summarization system by Akhyar (2021) but a lower ROUGE-2 score.
format	Final Project
author	Husada Daryanto, Taufiq
spellingShingle	Husada Daryanto, Taufiq AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION
author_facet	Husada Daryanto, Taufiq
author_sort	Husada Daryanto, Taufiq
title	AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION
title_sort	amr-to-text generation for indonesian language using pretrained language model and its application for text summarization
url	https://digilib.itb.ac.id/gdl/view/66591
_version_	1823648938661511168
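
The description names a "linearized PENMAN representation" as the model input. The record does not say how the thesis linearizes its graphs, so the following is only a minimal sketch of one common approach: drop metadata comment lines and collapse the indented PENMAN notation onto a single line. The function name linearize_penman and the example graph (for the hypothetical sentence "Anak itu makan nasi.") are assumptions for illustration and are not taken from the thesis.

```python
import re

# Hypothetical example graph, not taken from the thesis or its datasets.
EXAMPLE_PENMAN = """\
# ::snt Anak itu makan nasi.
(m / makan
   :ARG0 (a / anak
            :mod (i / itu))
   :ARG1 (n / nasi))
"""


def linearize_penman(penman_text: str) -> str:
    """Collapse a multi-line PENMAN string into one whitespace-normalized line.

    Assumption: metadata lines start with '#', as in the common AMR file layout.
    """
    # Keep only graph lines; skip metadata comments such as "# ::snt ...".
    graph_lines = [
        line for line in penman_text.splitlines()
        if not line.lstrip().startswith("#")
    ]
    # Join the lines and squeeze every run of whitespace into a single space.
    return re.sub(r"\s+", " ", " ".join(graph_lines)).strip()


if __name__ == "__main__":
    print(linearize_penman(EXAMPLE_PENMAN))
```

A string in this form could then be passed to a sequence-to-sequence tokenizer as the source text. Whether the thesis also removes variable names or adds a task prefix is not stated in this record.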

Similar Items