MULTI-DOCUMENT SUMMARIZATION USING SEMANTIC ROLE LABELING AND LINEAR REGRESSION FOR INDONESIAN NEWS ARTICLE
Main Author: | Yumna Khairunnisa, Nisrina |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/56248 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:56248 |
keywords |
PAS-to-document feature, PAS-to-document set feature, linear regression, sentence fusion |
date |
2021-06-21T16:40:13Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
collection |
Digital ITB |
description |
Automatic summarization systems for Indonesian news articles need further
development as the amount of news on the internet grows. An extractive
summarization system for Indonesian news articles was previously developed that
uses semantic role labeling (SRL) to produce predicate argument structures (PAS)
and a decision tree model to predict each sentence’s salience score. However,
inconsistent sentence labels were found in the decision tree’s training dataset.
As an alternative to the decision tree, a linear regression model can be trained
with each sentence’s ROUGE score against the reference summary as its target,
which allows the training dataset to be annotated automatically. In addition, a
sentence-fusion-based summarization system for Indonesian news articles was
developed to produce semi-abstractive summaries. This thesis investigates the
impact of PAS-to-document and PAS-to-document-set features, as well as linear
regression trained on automatically annotated data, on the SRL- and
semantic-graph-based summarization system. It also examines the effect of
sentence fusion on summary quality.
The summarization system in this final project uses either a decision tree or
linear regression to predict sentence salience scores, combined with sentence
fusion. The decision tree is trained with additional manually annotated data,
while the linear regression model is trained with data annotated automatically
from ROUGE scores. Both models use 13 features derived from PAS-to-document and
PAS-to-document-set relationships. Sentence fusion generates new sentences from
groups of similar sentences obtained by clustering.
The experiments investigate the impact of increasing the training data size for
the decision tree and determine the best model for predicting sentence salience
scores, the best linkage configuration for the PAS-to-document similarity and
PAS-to-document-set similarity features, the optimal feature set, the effect of
the title feature, and the clustering parameter. The best model achieves an
average ROUGE-2 recall of 0.2471 for 100-word summaries and 0.3026 for 200-word
summaries. |
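The record states that the linear regression model is trained on data annotated automatically with each sentence's ROUGE score against the reference summary. The Python sketch below illustrates that idea under stated assumptions: sentences are pre-tokenized word lists, the annotation target is ROUGE-2 recall (chosen here only because it is the reported evaluation metric; the thesis may use a different ROUGE variant), and least squares is solved with a small hand-written normal-equation solver rather than any particular library. The helper names (`rouge2_recall`, `fit_linear_regression`, `predict`) and the toy feature values are hypothetical and do not represent the thesis's 13 PAS features.

```python
from collections import Counter


def bigrams(tokens):
    """Multiset of adjacent token pairs in a tokenized sentence."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate, reference):
    """ROUGE-2 recall: clipped bigram overlap divided by the reference's bigram count."""
    ref = bigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum((bigrams(candidate) & ref).values())  # & keeps the minimum counts
    return overlap / total


def fit_linear_regression(features, targets):
    """Ordinary least squares by solving the normal equations (X^T X) w = X^T y.

    A bias column of 1s is prepended, so the result is [bias, w1, ..., wk].
    Assumes X^T X is non-singular (enough distinct samples for the parameters).
    """
    rows = [[1.0] + [float(v) for v in f] for f in features]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, feature_vector):
    """Predicted salience score for one sentence."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_vector))


if __name__ == "__main__":
    reference = "polisi menangkap tersangka pencurian di jakarta".split()
    sentences = [
        "polisi menangkap tersangka pencurian kemarin".split(),
        "cuaca jakarta cerah".split(),
    ]
    # Automatic annotation: each sentence's ROUGE-2 recall becomes its regression target.
    print([rouge2_recall(s, reference) for s in sentences])  # [0.6, 0.0]
    # Toy one-feature data (hypothetical); a real system would use its PAS feature vectors.
    weights = fit_linear_regression([[0], [1], [2], [3]], [1, 3, 5, 7])
    print(predict(weights, [4]))
```

In practice a library regressor would likely replace the hand-written solver; it is spelled out here only to keep the example dependency-free.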
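The experiments tune a linkage configuration for the PAS-to-document and PAS-to-document-set similarity features. One plausible reading, sketched below as an assumption rather than the thesis's exact definition, is that a PAS's similarity to a document aggregates its pairwise similarities to every PAS in that document: single linkage takes the maximum, complete linkage the minimum, and average linkage the mean. The token-overlap (Jaccard) similarity, the pooling of all documents for the document-set variant, and every function name here are hypothetical.

```python
from statistics import mean


def jaccard(a, b):
    """Token-set overlap between two PAS, each given as a list of tokens (assumed representation)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Linkage: how pairwise PAS similarities are aggregated into one score.
LINKAGES = {"single": max, "complete": min, "average": mean}


def pas_to_document_similarity(pas, document_pas, linkage="average", similarity=jaccard):
    """Similarity of one PAS to a document, given the document's list of PAS."""
    scores = [similarity(pas, other) for other in document_pas]
    return LINKAGES[linkage](scores) if scores else 0.0


def pas_to_document_set_similarity(pas, documents, linkage="average", similarity=jaccard):
    """Similarity of one PAS to a document set (assumption: PAS from all documents are pooled)."""
    pooled = [p for document in documents for p in document]
    return pas_to_document_similarity(pas, pooled, linkage, similarity)


if __name__ == "__main__":
    pas = ["polisi", "menangkap", "tersangka"]
    document = [["polisi", "menangkap", "tersangka"], ["harga", "beras", "naik"]]
    for name in LINKAGES:
        print(name, pas_to_document_similarity(pas, document, name))
```

Passing `similarity` as a parameter keeps the linkage choice separate from the pairwise measure, so a different PAS similarity could be swapped in without touching the aggregation.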
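Sentence fusion operates on groups of similar sentences produced by clustering, and the clustering parameter is one of the tuned settings. The record does not state which clustering algorithm or parameter is used, so the sketch below assumes agglomerative clustering with average linkage and a similarity threshold standing in for that parameter. Jaccard token overlap and the function names are illustrative assumptions, and the fusion step itself is not shown.

```python
def jaccard(a, b):
    """Token-set overlap between two tokenized sentences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_sentences(sentences, threshold, similarity=jaccard):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two most similar clusters until no pair reaches
    `threshold` (a hypothetical stand-in for the clustering parameter).
    Returns clusters as lists of sentence indices.
    """
    clusters = [[i] for i in range(len(sentences))]

    def average_linkage(c1, c2):
        scores = [similarity(sentences[i], sentences[j]) for i in c1 for j in c2]
        return sum(scores) / len(scores)

    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = average_linkage(clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_score < threshold:
            break
        x, y = best_pair
        clusters[x].extend(clusters[y])
        del clusters[y]  # y > x, so index x is unaffected
    return clusters


if __name__ == "__main__":
    sentences = [
        "polisi menangkap tersangka pencurian".split(),
        "polisi menangkap tersangka".split(),
        "harga beras naik".split(),
    ]
    print(cluster_sentences(sentences, threshold=0.5))  # [[0, 1], [2]]
```

In this sketch, each cluster with two or more sentences would be a candidate input for fusion, while singleton clusters would presumably pass through as extracted sentences.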
author |
Yumna Khairunnisa, Nisrina |