APPLICATION OF SENTENCE-BERT FOR IMPROVING DENSITY PEAKS CLUSTERING-BASED EXTRACTIVE TEXT SUMMARIZATION

Bibliographic Details
Main Author: Setiawan Suryadjaja, Paulus
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/55435
Institution: Institut Teknologi Bandung
Description
Summary: Human reading capacity is limited and the volume of available text data is massive, so there is a need for automatic text summarization systems. One method that produces satisfactory summaries is extractive text summarization based on density peaks clustering. Previous research that applied this method achieved state-of-the-art results on the DUC 2004 dataset. However, there is still room for further development, specifically by replacing the vector space model embedding and LDA topic modeling used previously with a sentence embedding technique based on artificial neural networks. This research proposes a cluster-based automatic text summarization system that uses Sentence-BERT (SBERT) for the sentence embedding and topic modeling steps, as an improvement on the summarization technique proposed by previous research. SBERT was chosen because it has state-of-the-art performance on sentence embedding tasks and is therefore expected to represent the semantic meaning of sentences better than the techniques used in previous studies. This research is the first to apply SBERT to text summarization. The study also proposes several improvements to the sentence selection techniques used in previous studies. Based on an assessment with the ROUGE toolkit, the text summarization system built in this study produces better summaries than the previous research. When tested on the DUC 2004 dataset, the best configuration of the proposed method produces summaries whose ROUGE-1 score is about 0.067 points higher than that of the summaries generated by the previous method.
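The general idea of ranking sentences by density peaks over sentence embeddings can be illustrated with a minimal sketch. This is not the thesis's implementation: the Gaussian density kernel, the cutoff `d_c`, the `rho * delta` score, and the function names `density_peaks_rank` and `summarize` are all assumptions made for illustration. The embeddings here are toy 2-dimensional vectors; in practice they would presumably come from an SBERT model (for example through the `encode` method of a `SentenceTransformer` object in the sentence-transformers library).

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two vectors (1.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def density_peaks_rank(embeddings, d_c=0.5):
    """Rank sentence indices by rho * delta (illustrative scoring, an assumption).

    rho:   local density, a Gaussian-kernel sum over distances to other sentences.
    delta: distance to the nearest sentence with higher density; for the
           densest sentence, the distance to the farthest sentence instead.
    """
    n = len(embeddings)
    dist = [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        if denser:
            delta.append(min(denser))
        else:
            delta.append(max((dist[i][j] for j in range(n) if j != i), default=1.0))
    scores = [rho[i] * delta[i] for i in range(n)]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


def summarize(sentences, embeddings, k=2):
    """Pick the k top-ranked sentences and return them in document order."""
    top = density_peaks_rank(embeddings)[:k]
    return [sentences[i] for i in sorted(top)]


# Toy data: three sentences about one topic, two about another.
sentences = ["A1", "A2", "A3", "B1", "B2"]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
summary = summarize(sentences, embeddings, k=2)
```

Because a sentence that sits next to a denser neighbor gets a small `delta`, the ranking tends to favor one representative per topic rather than several near-duplicates from the largest topic.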