APPLICATION OF SENTENCE-BERT FOR IMPROVING DENSITY PEAKS CLUSTERING-BASED EXTRACTIVE TEXT SUMMARIZATION

Given the limitations of human reading abilities and the massive amount of text data available in modern times, there is a need for an automatic text summarization system. One automatic text summarization method that produces a satisfactory summary is extractive text summarization based on den...

Bibliographic Details
Main Author: Setiawan Suryadjaja, Paulus
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/55435
Institution: Institut Teknologi Bandung
id id-itb.:55435
timestamp 2021-06-17T17:15:50Z
keywords text summarization, density peaks clustering, sentence-BERT, topic modeling, cluster-based text summarization, extractive text summarization, DUC 2004
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Human reading capacity is limited and the volume of available text is massive, so automatic text summarization systems are needed. One method that produces satisfactory summaries is extractive text summarization based on density peaks clustering; previous research applying this method achieved state-of-the-art results on the DUC 2004 dataset. There is still room for improvement, specifically by replacing the vector space model embedding and LDA topic modeling used previously with a neural network-based sentence embedding technique. This research proposes a cluster-based automatic text summarization system that uses Sentence-BERT (SBERT) for sentence embedding and topic modeling, as an improvement on the summarization technique proposed by previous research. SBERT was chosen because it has state-of-the-art performance on sentence embedding tasks, so it is expected to represent the semantic meaning of sentences better than the techniques used in previous studies. This research is the first to apply SBERT to text summarization. This study also proposes several improvements to the sentence selection techniques used in previous studies. As assessed with the ROUGE toolkit, the system built in this study produces better summaries than the previous research: on the DUC 2004 dataset, the best configuration of the proposed method achieves a ROUGE-1 score about 0.067 points higher than that of the previous method.
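
The description above names density peaks clustering over sentence embeddings but does not spell out the scoring, so the following is only a minimal sketch of the general density peaks idea (local density, plus distance to the nearest denser point) applied to sentence vectors. The toy two-dimensional vectors stand in for SBERT embeddings, and the function names, the cutoff value, and the use of the product of density and distance as a ranking score are assumptions made for illustration, not the configuration used in the thesis.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def density_peaks_scores(embeddings, cutoff):
    """Return (scores, rho, delta) for a list of sentence vectors.

    rho[i]   : number of other sentences closer than `cutoff` (local density)
    delta[i] : distance to the nearest sentence with higher density; for a
               sentence with no denser neighbour, the largest distance from it
    score[i] : rho[i] * delta[i] (assumed ranking score for this sketch)
    """
    n = len(embeddings)
    dist = [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    scores = [r * d for r, d in zip(rho, delta)]
    return scores, rho, delta


# Hypothetical stand-ins for sentence embeddings: two loose groups of vectors.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0], [0.1, 0.9]]

scores, rho, delta = density_peaks_scores(toy_embeddings, cutoff=0.02)
# Sentence indices ordered from most to least representative.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this toy run the two highest-ranked vectors come from different groups, which is the behaviour an extractive summarizer wants: representative sentences that do not repeat one another. In practice the vectors would come from an SBERT model and the cutoff would need tuning.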