MULTI-DOCUMENT SUMMARIZATION USING SEMANTIC ROLE LABELING AND LINEAR REGRESSION FOR INDONESIAN NEWS ARTICLE
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/56248 |
Institution: | Institut Teknologi Bandung |
Summary: | Automatic summarization systems for Indonesian news articles need further
development as the amount of news on the internet grows. An extractive
summarization system for Indonesian news articles was previously developed using
semantic role labeling (SRL) to produce predicate-argument structures (PAS) and a
decision tree model to predict each sentence’s salience score. However, sentence
label inconsistencies were found in the decision tree’s training dataset. As an
alternative to the decision tree, a linear regression model can be trained with
each sentence’s ROUGE score against the reference summary as the target. Because
this target is computed rather than judged by hand, the training dataset can be
annotated automatically. In addition, a sentence fusion-based summarization system
for Indonesian news articles has been developed to produce semi-abstractive
summaries. This thesis investigates the impact of PAS-to-document and
PAS-to-document-set features, and of linear regression trained on automatically
annotated data, on the SRL- and semantic-graph-based summarization system. It also
examines the effect of sentence fusion on summary quality.
The summarization system in this final project uses either a decision tree or linear
regression to predict sentence salience scores, combined with sentence fusion. The
decision tree is trained with additional manually annotated data, while linear
regression is trained with data annotated automatically from ROUGE scores. Both
models use 13 features derived from PAS-to-document and PAS-to-document-set
relationships. Sentence fusion generates new sentences from groups of similar
sentences obtained by clustering.
The experiments investigate the impact of increasing the data size for the decision
tree and determine the best model for predicting sentence salience scores, the best
linkage configuration for the PAS-to-document and PAS-to-document-set similarity
features, the optimal feature set, the effect of the title feature, and the
clustering parameter. The best model achieves an average ROUGE-2 recall of 0.2471
and 0.3026 for 100-word and 200-word summaries, respectively. |
---|
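The abstract states that the linear regression training data is annotated automatically from each sentence's ROUGE score against the reference summary, and it reports results as ROUGE-2 recall. The following is a minimal Python sketch of ROUGE-2 recall and of that labeling step. It assumes lowercased whitespace tokenization with no stemming or stopword removal. The function names `rouge2_recall` and `annotate_sentences` are hypothetical; this is neither the official ROUGE toolkit nor the author's implementation.

```python
from collections import Counter


def bigrams(text: str) -> Counter:
    """Count word bigrams after lowercasing and whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate: str, reference: str) -> float:
    """Clipped bigram overlap divided by the number of bigrams in the reference."""
    ref = bigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    cand = bigrams(candidate)
    # Each reference bigram is credited at most as often as it appears in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


def annotate_sentences(sentences, reference):
    """Hypothetical auto-annotation: label each sentence with its ROUGE-2 recall."""
    return [rouge2_recall(s, reference) for s in sentences]
```

For example, against the reference "the minister opened the new bridge" (5 bigrams), the sentence "the minister opened a school" shares 2 bigrams and gets a label of 0.4, while "a new bridge" shares 1 and gets 0.2.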
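Once each sentence carries a ROUGE-based target, a linear regression model can map its feature vector to a salience score. The sketch below fits ordinary least squares with an intercept by solving the normal equations in plain Python. It then greedily fills a word budget, such as the 100- or 200-word limits mentioned in the abstract, with the highest-scoring sentences. The abstract does not list the 13 PAS features, so the feature vectors here are placeholders. The greedy selection rule and all function names are assumptions, not the author's method.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, solved via the normal equations.

    Returns weights [b0, b1, ..., bk], where b0 is the intercept.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("singular normal equations: features are collinear")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(k):
            if r != col:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    """Predicted salience score for one feature vector."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


def select_summary(sentences, features, weights, word_limit):
    """Hypothetical selection: take top-scoring sentences that still fit the word budget."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: predict(weights, features[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        n = len(sentences[i].split())
        if used + n <= word_limit:
            chosen.append(i)
            used += n
    # Keep the original document order in the output summary.
    return [sentences[i] for i in sorted(chosen)]
```

Skipping a sentence that overflows the budget and continuing down the ranking lets shorter, lower-ranked sentences fill the remaining space. A real system would also need a redundancy check, which this sketch omits.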
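The abstract says that sentence fusion operates on groups of similar sentences produced by clustering, and that a clustering parameter and linkage configurations were tuned. It does not name the algorithm. As one possible reading, the sketch below uses agglomerative clustering over bag-of-words cosine similarity. It supports single (maximum), complete (minimum), or average linkage and stops merging once the best linkage similarity falls below a threshold. The algorithm choice, the `threshold` parameter, and all names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def linkage_similarity(c1, c2, vecs, method):
    """Aggregate pairwise similarities between two clusters of sentence indices."""
    sims = [cosine(vecs[i], vecs[j]) for i in c1 for j in c2]
    if method == "single":
        return max(sims)
    if method == "complete":
        return min(sims)
    if method == "average":
        return sum(sims) / len(sims)
    raise ValueError(f"unknown linkage: {method}")


def cluster_sentences(sentences, threshold=0.5, method="average"):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    clusters = [[i] for i in range(len(sentences))]
    while len(clusters) > 1:
        (a, b), best = max(
            ((pair, linkage_similarity(clusters[pair[0]], clusters[pair[1]], vecs, method))
             for pair in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best < threshold:
            break
        # combinations() yields a < b, so deleting index b leaves index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters
```

For example, with `threshold=0.5` the sentences "gempa mengguncang lombok pagi ini" and "gempa kuat mengguncang lombok" (cosine similarity about 0.67) end up in one cluster, and "harga beras naik di jakarta" stays alone, giving `[[0, 1], [2]]`. Raising the threshold to 0.7 leaves all three sentences separate.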