ABSTRACTIVE SUMMARIZATION USING GENETIC SEMANTIC GRAPH FOR INDONESIAN NEWS ARTICLES
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/39405 |
Institution: | Institut Teknologi Bandung |
Summary: | Only a few abstractive summarization systems have been developed for
Indonesian news articles. One abstractive summarization system that performs
well on English news articles is based on a genetic semantic graph. In this
final project, an abstractive summarization model for Indonesian news articles
is developed using predicate argument structure (PAS) extraction and a semantic
graph whose features are weighted with values obtained from a genetic
algorithm (GA).
The summarization system based on the genetic semantic graph has seven
components. Three of them are language dependent and therefore have to be
modified to work for Indonesian news articles: the PAS extraction component,
which is replaced by modifying and adding rules to the SVOA extraction
component; the semantic similarity matrix component, which is replaced with a
cosine similarity algorithm based on word embeddings; and the abstractive
summary generation component, which is replaced with heuristic rules.
Experiments are conducted to find the best pretrained word embedding, the best
mutation probability for the GA, and the best combination of selection,
crossover, and mutation algorithms for the GA for each summary length, i.e. 100
words and 200 words. The best summarization model developed in this final
project outperforms the previous abstractive summarization system, achieving
average ROUGE-2 recall of 0.320 and 0.394 for 100-word and 200-word summaries
respectively.
The best GA combination is rank-based roulette wheel (RRW) selection,
simple crossover, and nonuniform mutation for 100-word summaries, and rank-based
stochastic universal sampling (RSUS) selection, simple crossover, and
random mutation for 200-word summaries. |
---|
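
The summary reports results as average ROUGE-2 recall. As a rough illustration of what that metric measures, the sketch below computes ROUGE-2 recall for a single candidate/reference pair: the fraction of the reference summary's bigrams that also appear in the generated summary. The function names, the lowercasing, and the whitespace tokenization are assumptions made for this example; the record does not describe the evaluation setup used in the final project, and standard ROUGE tooling applies its own preprocessing.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate, reference):
    """ROUGE-2 recall: overlapping bigrams divided by bigrams in the reference.

    Assumption: lowercased whitespace tokenization, which is a simplification
    and not necessarily the preprocessing used in the final project.
    """
    cand = bigrams(candidate.lower().split())
    ref = bigrams(reference.lower().split())
    total = sum(ref.values())
    if total == 0:
        return 0.0  # no bigrams in the reference, recall is defined as 0 here
    # Clip each bigram's match count by how often it occurs in the candidate.
    overlap = sum(min(count, cand[bg]) for bg, count in ref.items())
    return overlap / total


# Three of the five reference bigrams are recovered by the candidate.
score = rouge2_recall("the cat lay on the mat", "the cat sat on the mat")
print(round(score, 3))
```

An average score like the ones quoted in the summary would be the mean of this value over all evaluated articles.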