INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS
A summary presents a text in shorter form while retaining its important information. Reading an article together with its summary helps readers understand its content efficiently and effectively. However, writing an abstractive summary of an article requires...
Main Author: | Anthony |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76390 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:76390 |
---|---|
spelling |
id-itb.:76390 2023-08-15T08:52:50Z INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS Anthony Indonesia Final Project Transformer, Abstractive Summarization, PEGASUS, Deep Learning, Attention Mechanism INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/76390 text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
A summary presents a text in shorter form while retaining its important
information. Reading an article together with its summary helps readers
understand its content efficiently and effectively. However, writing an
abstractive summary of an article takes time and requires people with
sufficient skill.
This final project aims to build an abstractive summarization program for Indonesian
news using PEGASUS. PEGASUS is a deep learning model with a transformer encoder-decoder
architecture designed for abstractive text summarization. The transformer encoder-decoder
is well suited to text data because it uses the attention mechanism, a
data processing mechanism inspired by human attention.
A total of 5,670,868 Indonesian news articles without summaries and 357,499 Indonesian
news articles with summaries, drawn from various sources, were used to train the PEGASUS
model. The model was trained with a dedicated pre-training objective,
Gap Sentence Generation, which helps the model
identify important sentences in an article.
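As a rough illustration of Gap Sentence Generation, the sketch below builds one pre-training pair from a list of sentences: each sentence is scored by its unigram overlap with the rest of the document, the top-scoring sentences are replaced by a mask token in the input, and the masked sentences become the target. The abstract does not say which selection strategy, mask token, or gap ratio the project used, so `MASK_TOKEN`, `gap_ratio`, `unigram_f1`, and `make_gsg_example` are all hypothetical names and values chosen for illustration.

```python
from collections import Counter

MASK_TOKEN = "<mask_1>"  # assumption: placeholder string for a masked sentence


def unigram_f1(candidate, reference):
    """Unigram-overlap F1 between two token lists (a simplified ROUGE-1 F1)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(sentences, gap_ratio=0.3):
    """Build one (input, target) pair by masking the highest-scoring sentences."""
    tokenized = [s.lower().split() for s in sentences]
    scores = []
    for i, tokens in enumerate(tokenized):
        # Score each sentence against all other sentences in the document.
        rest = [t for j, other in enumerate(tokenized) if j != i for t in other]
        scores.append(unigram_f1(tokens, rest))
    n_gaps = max(1, round(len(sentences) * gap_ratio))  # assumed gap ratio
    # Highest score first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    selected = sorted(ranked[:n_gaps])
    source = " ".join(
        MASK_TOKEN if i in selected else s for i, s in enumerate(sentences)
    )
    target = " ".join(sentences[i] for i in selected)
    return source, target
```

Calling `make_gsg_example` on a tokenized-by-sentence article returns the masked article as the encoder input and the removed sentences as the decoder target, which is the shape of data a sequence-to-sequence model would be pre-trained on under this objective.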
Model performance was evaluated quantitatively using the ROUGE-1, ROUGE-2, and
ROUGE-L metrics. The PEGASUS model in this final project achieved
ROUGE-1/ROUGE-2/ROUGE-L F1 scores of 52.43/41.23/48.18 on the IndoSum
test set, 38.27/20.22/31.26 on the Liputan6 test set, and 26.97/9.99/21.70 on
the XLSum test set. Model performance was also evaluated qualitatively
by reviewing sample summaries from the datasets and summaries generated
for recent news articles and non-news articles. The evaluation results show that
the PEGASUS model produces coherent and concise article summaries.
Overall, the model's ability to generate summaries is on par with that of humans in general. |
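To make the reported metrics concrete, the following is a minimal sketch of how ROUGE-1, ROUGE-2, and ROUGE-L F1 scores can be computed for a single candidate and reference summary. It assumes lowercased whitespace tokenization with no stemming, and it returns values in the range 0 to 1 (the scores above are on a 0 to 100 scale). The record does not state which ROUGE implementation or preprocessing the project used, so this is an illustration of the metric definitions, not the project's evaluation code; all function names are hypothetical.

```python
from collections import Counter


def _ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(match, cand_total, ref_total):
    """Harmonic mean of precision (match/cand_total) and recall (match/ref_total)."""
    if match == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = match / cand_total
    recall = match / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1 based on clipped n-gram overlap."""
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    match = sum((cand & ref).values())
    return _f1(match, sum(cand.values()), sum(ref.values()))


def _lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))
```

For example, `rouge_n_f1(candidate, reference, 1)` and `rouge_n_f1(candidate, reference, 2)` give the unigram and bigram scores, and `rouge_l_f1(candidate, reference)` rewards words that appear in the same order even when they are not adjacent. Corpus-level figures such as those reported above are typically averages of per-summary scores multiplied by 100.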
format |
Final Project |
author |
Anthony |
title |
INDONESIAN NEWS ABSTRACTIVE SUMMARIZATION USING PEGASUS |
url |
https://digilib.itb.ac.id/gdl/view/76390 |
_version_ |
1822007969952301056 |