APPLICATION OF SENTENCE-BERT FOR IMPROVING DENSITY PEAKS CLUSTERING-BASED EXTRACTIVE TEXT SUMMARIZATION
Main Author: | Setiawan Suryadjaja, Paulus |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/55435 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:55435 |
---|---|
timestamp |
2021-06-17T17:15:50Z |
keywords |
text summarization, density peaks clustering, sentence-BERT, topic modeling, cluster-based text summarization, extractive text summarization, DUC 2004 |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
collection |
Digital ITB |
description |
Given the limitations of human reading abilities and the massive amount of text
data available today, there is a need for automatic text summarization systems.
One automatic text summarization method that produces satisfactory summaries is
extractive text summarization based on density peaks clustering. A previous study
that applied this method achieved state-of-the-art results on the DUC 2004
dataset. However, there is still room for improvement, specifically by replacing
the vector space model embedding and LDA topic modeling used in that study with a
neural network-based sentence embedding technique. This research proposes a
cluster-based automatic text summarization system that uses Sentence-BERT (SBERT)
for both sentence embedding and topic modeling, as an improvement on the
summarization technique proposed in previous research. SBERT was chosen because
it achieves state-of-the-art performance on sentence embedding tasks, so it is
expected to represent the semantic meaning of sentences better than the
techniques used in previous studies. This research is the first to apply SBERT
to text summarization. This study also proposes several improvements to the
sentence selection techniques used in previous studies. According to an
evaluation with the ROUGE toolkit, the summarization system built in this study
produces better summaries than the previous method. On the DUC 2004 dataset, the
best configuration of the proposed method achieves a ROUGE-1 score about 0.067
points higher than that of the summary generated by the previous method. |
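The abstract describes clustering sentence vectors with density peaks clustering and then selecting sentences from the resulting clusters. The following is a minimal, hypothetical sketch of the standard density peaks formulation (a local density rho for each point and a distance delta to its nearest denser point) applied to precomputed sentence vectors. It assumes cosine distance, a Gaussian density kernel, and illustrative parameters d_c and k. The toy 3-dimensional vectors merely stand in for SBERT embeddings; in practice they could come from an SBERT model, for example through the sentence-transformers library. Picking each cluster center as a summary sentence is a deliberate simplification and is not the sentence selection technique proposed in this thesis.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def density_peaks(vectors: List[List[float]], d_c: float, k: int) -> Tuple[List[int], List[int]]:
    """Cluster vectors with density peaks clustering; return (centers, labels). Assumes k >= 1."""
    n = len(vectors)
    dist = [[cosine_distance(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Local density rho_i: Gaussian kernel with cutoff distance d_c (an assumed choice).
    rho = [sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
           for i in range(n)]
    # Points sorted by decreasing density; ties broken by index so "denser" is strict.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Convention: the densest point gets its largest distance as delta.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda h: dist[i][h])
        delta[i] = dist[i][j]
        nearest_denser[i] = j
    # Centers combine high density with a large delta (gamma = rho * delta).
    gamma = [rho[i] * delta[i] for i in range(n)]
    densest = order[0]
    others = sorted((i for i in range(n) if i != densest), key=lambda i: (-gamma[i], i))
    centers = [densest] + others[: k - 1]  # the densest point is always a center
    labels = [-1] * n
    for c, idx in enumerate(centers):
        labels[idx] = c
    # Remaining points inherit the label of their nearest denser neighbour,
    # which is already labeled because points are visited by decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return centers, labels


def summarize(sentences: List[str], vectors: List[List[float]],
              d_c: float = 0.1, k: int = 2) -> List[str]:
    """Simplified selection: return the cluster-center sentences in document order."""
    centers, _ = density_peaks(vectors, d_c, k)
    return [sentences[i] for i in sorted(centers)]


# Hypothetical toy data: two topics, with 3-d vectors standing in for SBERT embeddings.
SENTENCES = [
    "The storm reached the coast on Monday.",
    "Heavy winds from the storm damaged houses.",
    "Officials said the storm caused flooding.",
    "The central bank raised interest rates.",
    "Analysts expected the rate increase.",
    "Markets reacted to the bank's decision.",
]
VECTORS = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0], [0.0, 1.0, 0.0],
]

if __name__ == "__main__":
    for sentence in summarize(SENTENCES, VECTORS):
        print(sentence)
```

With the toy data, the sketch places the three storm sentences and the three bank sentences in separate clusters and returns one center sentence from each. Forcing the densest point to be a center guarantees that every other point can follow its chain of nearest denser neighbours back to a labeled center.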
author |
Setiawan Suryadjaja, Paulus |