MULTI-DOCUMENT SUMMARIZATION USING SEMANTIC ROLE LABELING AND LINEAR REGRESSION FOR INDONESIAN NEWS ARTICLE

Automatic summarization of Indonesian news articles needs further development as the volume of online news grows. An extractive summarization system for Indonesian news articles was previously developed using semantic role labeling (SRL) to produce predicate argument...

Bibliographic Details
Main Author: Yumna Khairunnisa, Nisrina
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/56248
Institution: Institut Teknologi Bandung
id id-itb.:56248
spelling id-itb.:56248 2021-06-21T16:40:13Z PAS-to-document feature, PAS-to-document set feature, linear regression, sentence fusion. text
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description Automatic summarization of Indonesian news articles needs further development as the volume of online news grows. An extractive summarization system for Indonesian news articles was previously developed using semantic role labeling (SRL) to produce predicate argument structures (PAS) and a decision tree model to predict each sentence's salience score. However, inconsistent sentence labels were found in the decision tree's training dataset. As an alternative to the decision tree, a linear regression model can be trained with each sentence's ROUGE score against the reference summary as the target, so the training dataset can be annotated automatically. In addition, a sentence-fusion-based summarization system for Indonesian news articles was developed to produce semi-abstractive summaries. This thesis investigates the impact of PAS-to-document and PAS-to-document-set features, and of linear regression trained on automatically annotated data, on the SRL and semantic-graph-based summarization system. It also examines the effect of sentence fusion on summary quality. The summarization system in this final project uses a decision tree or linear regression to predict sentence salience scores, together with sentence fusion. The decision tree is trained with additional manually annotated data. The linear regression is trained with data annotated automatically from ROUGE scores. Both models use 13 features derived from PAS-to-document and PAS-to-document-set relationships. Sentence fusion generates new sentences from groups of similar sentences identified by clustering.
The experiments aim to investigate the impact of increasing the training data size for the decision tree, determine the best model for predicting sentence salience scores, determine the best linkage configuration for the PAS-to-document and PAS-to-document-set similarity features, determine the optimal feature set, determine the effect of the title feature, and determine the clustering parameter. The best model achieves an average ROUGE-2 recall of 0.2471 and 0.3026 for summaries of 100 and 200 words, respectively.
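The automatic annotation described above can be illustrated with a minimal sketch. It assumes whitespace tokenization, lowercasing, and ROUGE-2 recall as the per-sentence target; the record does not state which ROUGE variant or preprocessing the thesis used for labeling, and the function names `bigrams`, `rouge2_recall`, and `label_sentences` are hypothetical.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate, reference):
    """ROUGE-2 recall: clipped bigram overlap divided by the number of
    bigrams in the reference. Tokenization here is a plain whitespace
    split on lowercased text, which is an assumption of this sketch."""
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0  # a reference shorter than two tokens has no bigrams
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


def label_sentences(sentences, reference_summary):
    """Pair each sentence with its ROUGE-2 recall against the reference,
    giving (sentence, target) rows a regression model could be fit on."""
    return [(s, rouge2_recall(s, reference_summary)) for s in sentences]
```

Each resulting score could then serve as the regression target for that sentence's feature vector, which avoids manual salience labels.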
format Final Project
author Yumna Khairunnisa, Nisrina
url https://digilib.itb.ac.id/gdl/view/56248
_version_ 1822930141360160768