AMR-TO-TEXT GENERATION FOR INDONESIAN LANGUAGE USING PRETRAINED LANGUAGE MODEL AND ITS APPLICATION FOR TEXT SUMMARIZATION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/66591 |
Institution: | Institut Teknologi Bandung |
Summary: | AMR-to-text generation is the process of generating text from an Abstract Meaning
Representation (AMR) graph. In Indonesian AMR-based text summarization, the
text generation step usually relies on Simple NLG, which has the disadvantage that
the generated text is only in the form of a bag of words. Therefore, in this final project,
an AMR-to-text generation model for the Indonesian language is developed using
a pretrained language model approach.
In this final project, the AMR-to-text generation model is developed
by fine-tuning a pretrained language model. We also examine the effect of adding
supervised task adaptation and tree-level embedding on the performance of the
AMR-to-text generation model. The pretrained language models evaluated are
IndoT5-base, mT5-base, and IndoBART. Based on the results, the best method is
fine-tuning the IndoT5 model with a linearized PENMAN representation as input,
combined with supervised task adaptation. This method achieves a BLEU score of
0.5048 on the AMR Simple Sentences Test Dataset and 0.3180 on the AMR News
Sentences Test Dataset.
As a case study, the AMR-to-text generation model is used to generate Indonesian
abstractive summaries from the summary graphs produced by the AMR-based
summarization system of Akhyar (2021). On the XLSum-Indonesian test dataset,
the system obtains a ROUGE-1 score of 0.2123 and a ROUGE-2 score of 0.0496.
It produces abstractive summaries with a higher ROUGE-1 score than
the previous AMR-based summarization system of Akhyar (2021), but
a lower ROUGE-2 score.
|
---|
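The summary names a linearized PENMAN representation as the model input but does not describe the exact linearization. The sketch below shows one common way such a step can look: the multi-line PENMAN string is flattened into a single space-separated line that a sequence-to-sequence model could consume. The function name `linearize_penman`, the `keep_variables` option, and the example graph are illustrative assumptions, not taken from the thesis.

```python
import re

# Hypothetical example graph (roughly "saya makan nasi", "I eat rice").
# It is for illustration only and does not come from the thesis datasets.
EXAMPLE_GRAPH = """
(m / makan
   :ARG0 (s / saya)
   :ARG1 (n / nasi))
"""


def linearize_penman(penman_text: str, keep_variables: bool = True) -> str:
    """Flatten a PENMAN-notation AMR string into one space-separated line.

    This is a simplified sketch: it does not special-case quoted string
    literals that contain parentheses or slashes, and re-entrant variable
    references (a bare variable with no concept) are left untouched.
    """
    # Collapse newlines and indentation into single spaces.
    flat = " ".join(penman_text.split())
    if not keep_variables:
        # Drop the "variable /" prefix of each node: "(m / makan" -> "(makan".
        flat = re.sub(r"\(\s*\S+\s*/\s*", "(", flat)
    # Surround parentheses with spaces so they become separate symbols.
    flat = re.sub(r"([()])", r" \1 ", flat)
    # Normalize any doubled spaces introduced by the previous step.
    return " ".join(flat.split())


if __name__ == "__main__":
    print(linearize_penman(EXAMPLE_GRAPH))
    # ( m / makan :ARG0 ( s / saya ) :ARG1 ( n / nasi ) )
    print(linearize_penman(EXAMPLE_GRAPH, keep_variables=False))
    # ( makan :ARG0 ( saya ) :ARG1 ( nasi ) )
```

Whether variables are kept or dropped is a design choice: keeping them preserves re-entrancy information, while dropping them shortens the input sequence. Which variant the thesis used is not stated in this record.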