INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS

Bibliographic Details
Main Author: Anthony
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76390
Institution: Institut Teknologi Bandung
Description
Summary: A summary presents a piece of writing in a shorter form while retaining its important information. Reading an article together with its summary can help readers understand the article's content efficiently and effectively. However, writing an abstractive summary of an article takes time and requires people with a certain level of skill. This final project aims to build an abstractive summarization program for Indonesian news using PEGASUS. PEGASUS is a deep learning model with a transformer encoder-decoder architecture that is used for abstractive text summarization. The transformer encoder-decoder architecture is well suited to processing text because it uses the attention mechanism, a data-processing mechanism inspired by human attention.

A total of 5,670,868 Indonesian news articles without summaries and 357,499 Indonesian news articles with summaries, drawn from various sources, were used to train the PEGASUS model. The model is pre-trained with a dedicated pre-training objective, Gap Sentence Generation, which helps the model learn to identify the important sentences in an article.

Model performance was evaluated quantitatively using the ROUGE-1, ROUGE-2, and ROUGE-L metrics. The PEGASUS model in this final project achieved ROUGE-1 F1/ROUGE-2 F1/ROUGE-L F1 scores of 52.43/41.23/48.18 on the IndoSum test set, 38.27/20.22/31.26 on the Liputan6 test set, and 26.97/9.99/21.70 on the XLSum test set. The model was also evaluated qualitatively by reviewing its summaries of sample articles from the datasets, as well as its summaries of recent news articles and of non-news articles. The evaluation shows that the PEGASUS model produces coherent and concise article summaries. Overall, its ability to generate summaries is broadly on par with that of humans.
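The Gap Sentence Generation objective can be illustrated with a small sketch. It follows the sentence-selection strategy described in the original PEGASUS paper: score each sentence by ROUGE-1 F1 against the rest of the document, replace the top-scoring sentences with a mask token in the input, and use those sentences as the generation target. The abstract does not state which selection variant or gap ratio this project used, so the `<mask_1>` token, `gap_ratio=0.3`, the regex tokenizer, and the function names `tokenize`, `rouge1_f1`, and `gap_sentence_generation` are illustrative assumptions. The `rouge1_f1` helper also shows how ROUGE-1 F1 is defined at the unigram level, although reported scores are normally computed with a reference ROUGE implementation and its own tokenization.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical placeholder for the token that replaces a masked sentence.
MASK_TOKEN = "<mask_1>"


def tokenize(text: str) -> List[str]:
    """Lowercase and keep word characters only (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def rouge1_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    if not candidate or not reference:
        return 0.0
    # Clipped overlap: each unigram counts at most min(count in candidate, count in reference) times.
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def gap_sentence_generation(sentences: List[str], gap_ratio: float = 0.3) -> Tuple[str, str]:
    """Build one (input, target) pre-training pair from a document's sentences."""
    if not sentences:
        return "", ""
    tokens = [tokenize(s) for s in sentences]
    scores = []
    for i, sent in enumerate(tokens):
        # Score each sentence independently against all other sentences in the document.
        rest = [tok for j, other in enumerate(tokens) if j != i for tok in other]
        scores.append(rouge1_f1(sent, rest))
    num_gaps = min(len(sentences), max(1, round(gap_ratio * len(sentences))))
    # Highest scores first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    selected = sorted(ranked[:num_gaps])
    selected_set = set(selected)
    # Input: the document with selected sentences masked. Target: the selected sentences in order.
    source = " ".join(MASK_TOKEN if i in selected_set else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in selected)
    return source, target


if __name__ == "__main__":
    doc = [
        "Banjir melanda Jakarta pada hari Senin.",
        "Banjir di Jakarta merendam ratusan rumah warga.",
        "Warga Jakarta mengungsi ke masjid.",
    ]
    source, target = gap_sentence_generation(doc)
    print("input: ", source)
    print("target:", target)
```

In this three-sentence example, the second sentence shares the most unigrams with the rest of the document, so it becomes the target and is replaced by the mask token in the input. The model must then reconstruct it from the surrounding context, which mirrors how an abstractive summary is generated from an article.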