AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION

Bibliographic Details
Main Author: Husada Daryanto, Taufiq
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/66591
Institution: Institut Teknologi Bandung
Description
Summary: AMR-to-text generation is the process of generating text from an Abstract Meaning Representation (AMR) graph. In Indonesian AMR-based text summarization, text generation usually relies on Simple NLG, which has the disadvantage that the generated text is only a bag of words. This final project therefore develops an AMR-to-text generation model for Indonesian by fine-tuning a pretrained language model. We also examine how adding supervised task adaptation and tree-level embeddings affects the performance of the AMR-to-text generation model. The pretrained language models evaluated are IndoT5-base, mT5-base, and IndoBART. Based on the results, the best method is fine-tuning the IndoT5 model with a linearized PENMAN representation as input and additional supervised task adaptation. This method achieves a BLEU score of 0.5048 on the AMR Simple Sentences Test Dataset and 0.3180 on the AMR News Sentences Test Dataset. As a case study, the AMR-to-text generation model is used to generate Indonesian abstractive summaries from the summary graphs produced by the AMR-based summarization system of Akhyar (2021). On the XLSum Indonesian test dataset, the system achieves a ROUGE-1 score of 0.2123 and a ROUGE-2 score of 0.0496. It produces abstractive summaries with a higher ROUGE-1 score but a lower ROUGE-2 score than the previous AMR-based summarization system of Akhyar (2021).
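
The summary mentions a linearized PENMAN representation as the model input but does not spell out the linearization scheme. The following is only a minimal sketch of one common approach, assuming linearization means flattening the multi-line PENMAN notation into a single whitespace-normalized string. The function name linearize_penman and the example graph are hypothetical and are not taken from the thesis.

```python
import re


def linearize_penman(penman_text: str) -> str:
    """Collapse a multi-line PENMAN graph into one whitespace-normalized line.

    Assumption: the thesis may use a different scheme (for example, dropping
    variable names); this sketch only flattens the notation.
    """
    # Drop metadata lines such as "# ::snt ..." that often accompany AMR files.
    lines = [ln for ln in penman_text.splitlines()
             if not ln.lstrip().startswith("#")]
    # Join the remaining lines and squeeze whitespace runs into single spaces.
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


# Hypothetical AMR graph for an Indonesian sentence, for illustration only.
example = """
# ::snt Anak itu makan nasi.
(m / makan
   :ARG0 (a / anak)
   :ARG1 (n / nasi))
"""

linearized = linearize_penman(example)
print(linearized)
```

A string of this form could then be tokenized and passed to a sequence-to-sequence model as the source side of a fine-tuning pair, with the reference sentence as the target.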