INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS

A summary presents a text in shorter form while retaining its important information. Reading an article together with its summary helps readers understand the article's content efficiently and effectively. However, writing an abstractive summary of an article requires...

Bibliographic Details
Main Author: Anthony
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/76390
Institution: Institut Teknologi Bandung
id id-itb.:76390
spelling id-itb.:76390; 2023-08-15T08:52:50Z; INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS; Anthony; Indonesia; Final Project; Transformer, Abstractive Summarization, PEGASUS, Deep Learning, Attention Mechanism; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/76390; text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description A summary presents a text in shorter form while retaining its important information. Reading an article together with its summary helps readers understand the article's content efficiently and effectively. However, writing an abstractive summary of an article takes time and requires people with sufficient skill. This final project builds an abstractive summarization program for Indonesian news using PEGASUS. PEGASUS is a deep learning model with a transformer encoder-decoder architecture designed for abstractive text summarization. The transformer encoder-decoder is well suited to text data because it uses the attention mechanism, a data processing mechanism inspired by human attention. A total of 5,670,868 Indonesian news articles without summaries and 357,499 Indonesian news articles with summaries, drawn from various sources, were used to train the PEGASUS model. The model is trained with a dedicated pre-training objective, Gap Sentence Generation, which helps it identify the important sentences in an article. Model performance was evaluated quantitatively using the ROUGE-1, ROUGE-2, and ROUGE-L metrics. The model achieved ROUGE-1 F1/ROUGE-2 F1/ROUGE-L F1 scores of 52.43/41.23/48.18 on the IndoSum test dataset, 38.27/20.22/31.26 on the Liputan6 test dataset, and 26.97/9.99/21.70 on the XLSum test dataset. Performance was also evaluated qualitatively by reviewing sample summaries of articles in the datasets and summaries generated for recent news articles and non-news articles. The evaluation results show that the PEGASUS model produces coherent and concise article summaries. Overall, the model's ability to generate summaries is on par with general human capability.
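The Gap Sentence Generation objective and the ROUGE-1 F1 metric mentioned in the description can be illustrated with a minimal Python sketch. This is not the project's code. The record does not say how gap sentences were selected, so the sketch assumes the strategy of scoring each sentence by ROUGE-1 F1 against the rest of the article, and it assumes whitespace tokenization. The names `rouge1_f1`, `mask_gap_sentences`, and the `<mask_1>` placeholder are illustrative.

```python
from collections import Counter

MASK = "<mask_1>"  # illustrative placeholder for a removed sentence (assumption)


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def mask_gap_sentences(sentences: list[str], n_gaps: int = 1) -> tuple[str, str]:
    """Build one (source, target) pre-training pair from an article.

    Each sentence is scored against the remaining sentences; the
    highest-scoring ones are masked in the source and become the target.
    """
    scores = []
    for i, sentence in enumerate(sentences):
        rest = " ".join(sentences[:i] + sentences[i + 1:])
        scores.append(rouge1_f1(sentence, rest))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:n_gaps])  # keep document order in the target
    source = " ".join(MASK if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in chosen)
    return source, target


article = [
    "banjir melanda jakarta",
    "banjir melanda jakarta dan bekasi",
    "warga bekasi mengungsi",
]
source, target = mask_gap_sentences(article, n_gaps=1)
print(source)
print(target)
```

In this toy article the second sentence shares the most words with the other two, so it is the one masked. The model would then be trained to generate the target from the masked source.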
format Final Project
author Anthony
title INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS
url https://digilib.itb.ac.id/gdl/view/76390
_version_ 1822007969952301056