ABSTRACTIVE SUMMARIZATION USING GENETIC SEMANTIC GRAPH FOR INDONESIAN NEWS ARTICLES

Bibliographic Details
Main Author: Sidney Devianti, Rachel
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/39405
Institution: Institut Teknologi Bandung
Description
Summary: Only a few abstractive summarization systems have been developed for Indonesian news articles. For English news articles, one abstractive summarization system with good performance is based on a genetic semantic graph. In this final project, an abstractive summarization model for Indonesian news articles is developed using predicate argument structure (PAS) extraction and a semantic graph whose features are weighted with weights obtained from a genetic algorithm (GA). The genetic-semantic-graph summarization system has seven components. Three of them are language dependent, so they had to be modified to work on Indonesian news articles: the PAS extraction component is replaced by modifying the SVOA extraction component and adding rules to it; the semantic similarity matrix component is replaced by cosine similarity based on word embeddings; and the abstractive summary generation component is replaced by heuristic rules. Experiments are conducted to find the best pretrained word embedding, the best GA mutation probability, and the best combination of GA selection, crossover, and mutation algorithms for each summary type, i.e. 100-word and 200-word summaries. The best summarization model developed in this final project outperforms the previous abstractive summarization system, with an average ROUGE-2 recall of 0.320 for 100-word summaries and 0.394 for 200-word summaries. The best GA combination is rank-based roulette wheel (RRW) selection, simple crossover, and nonuniform mutation for 100-word summaries, and rank-based stochastic universal sampling (RSUS) selection, simple crossover, and random mutation for 200-word summaries.
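The abstract says only that the semantic similarity matrix is computed with cosine similarity based on word embeddings. The sketch below is an illustration under assumptions, not the project's code: each PAS is flattened to its tokens and represented by the average of their vectors, the toy `emb` table stands in for a pretrained Indonesian embedding, and the names `unit_vector` and `similarity_matrix` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Embeddings = Dict[str, List[float]]


def unit_vector(tokens: List[str], emb: Embeddings) -> Optional[List[float]]:
    """Average the embeddings of known tokens; None if no token is in the vocabulary."""
    vectors = [emb[t] for t in tokens if t in emb]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(units: List[List[str]], emb: Embeddings) -> List[List[float]]:
    """Pairwise cosine similarity; pairs involving an out-of-vocabulary unit score 0."""
    vecs = [unit_vector(u, emb) for u in units]
    n = len(units)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if vecs[i] is not None and vecs[j] is not None:
                matrix[i][j] = cosine(vecs[i], vecs[j])
    return matrix


# Hypothetical 3-dimensional vectors standing in for a pretrained embedding.
emb = {
    "presiden": [0.9, 0.1, 0.0],
    "menteri": [0.8, 0.2, 0.1],
    "meresmikan": [0.1, 0.9, 0.2],
    "membuka": [0.2, 0.8, 0.3],
    "banjir": [0.0, 0.1, 0.9],
}
# Each PAS flattened to its tokens (predicate plus arguments); assumed representation.
pas_units = [["presiden", "meresmikan"], ["menteri", "membuka"], ["banjir"]]
m = similarity_matrix(pas_units, emb)
```

With these toy vectors the first unit scores about 0.99 against the second and about 0.22 against the third. Averaging is only one way to turn word vectors into a unit vector; the project may have used a different composition.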
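Results are reported as average ROUGE-2 recall. As a reminder of what that metric counts, here is a minimal sketch: clipped bigram overlap divided by the number of bigrams in the reference summaries. It uses plain whitespace tokens with no stemming or stopword handling, so it only approximates the official ROUGE toolkit; the function names and example sentences are made up for illustration.

```python
from collections import Counter
from typing import List


def bigrams(tokens: List[str]) -> Counter:
    """Multiset of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate: List[str], references: List[List[str]]) -> float:
    """Clipped bigram matches divided by the total bigram count of the references."""
    cand = bigrams(candidate)
    matched = total = 0
    for ref in references:
        ref_bigrams = bigrams(ref)
        # A reference bigram matches at most as often as it appears in the candidate.
        matched += sum(min(count, cand[bg]) for bg, count in ref_bigrams.items())
        total += sum(ref_bigrams.values())
    return matched / total if total else 0.0


reference = "presiden joko widodo meresmikan jalan tol baru".split()
candidate = "presiden joko widodo meresmikan tol baru".split()
score = rouge2_recall(candidate, [reference])  # 4 of 6 reference bigrams matched
```

An "average ROUGE-2 recall" of the kind quoted in the abstract would be this per-article score averaged over the test articles, assuming the usual macro-averaging.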
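The GA operators named in the abstract can be sketched as follows. This assumes a chromosome is a vector of real-valued feature weights bounded in [0, 1]; the abstract does not give the encoding, bounds, or fitness function. Rank-based probabilities use linear ranking, nonuniform mutation follows Michalewicz's formulation with a hypothetical shape parameter `b`, `pm` is the per-gene mutation probability, and every function name here is hypothetical. The `sum`-based fitness in the usage lines is a placeholder, not the project's.

```python
import random
from typing import List, Sequence, Tuple

Chromosome = List[float]  # hypothetical encoding: one weight per graph feature
LOW, HIGH = 0.0, 1.0      # assumed bounds for every weight


def rank_probabilities(fitness: Sequence[float]) -> List[float]:
    """Linear ranking: the worst individual gets rank 1, the best gets rank n."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    ranks = [0] * len(fitness)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = sum(ranks)
    return [r / total for r in ranks]


def rank_roulette_wheel(pop, fitness, rng: random.Random):
    """RRW: one independent spin of the rank-based wheel per slot."""
    return rng.choices(pop, weights=rank_probabilities(fitness), k=len(pop))


def rank_sus(pop, fitness, rng: random.Random):
    """RSUS: one spin, then n equally spaced pointers over the same wheel."""
    probs = rank_probabilities(fitness)
    n = len(pop)
    step = 1.0 / n
    start = rng.uniform(0.0, step)
    selected, i, cumulative = [], 0, probs[0]
    for k in range(n):
        pointer = start + k * step
        while pointer > cumulative and i < n - 1:  # bound guards float drift
            i += 1
            cumulative += probs[i]
        selected.append(pop[i])
    return selected


def simple_crossover(p1: Chromosome, p2: Chromosome,
                     rng: random.Random) -> Tuple[Chromosome, Chromosome]:
    """Single cut point; the tails after the cut are swapped."""
    k = rng.randint(1, len(p1) - 1)
    return p1[:k] + p2[k:], p2[:k] + p1[k:]


def nonuniform_mutation(x: Chromosome, gen: int, max_gen: int, pm: float,
                        rng: random.Random, b: float = 2.0) -> Chromosome:
    """Each gene mutates with probability pm by a step that shrinks as gen -> max_gen."""
    child = list(x)
    shrink = (1 - gen / max_gen) ** b
    for j, g in enumerate(child):
        if rng.random() >= pm:
            continue
        factor = 1 - rng.random() ** shrink  # in [0, 1]; exactly 0 when gen == max_gen
        if rng.random() < 0.5:
            g += (HIGH - g) * factor
        else:
            g -= (g - LOW) * factor
        child[j] = min(HIGH, max(LOW, g))  # guard against floating-point drift
    return child


def random_mutation(x: Chromosome, pm: float, rng: random.Random) -> Chromosome:
    """Each gene is replaced by a uniform draw in [LOW, HIGH] with probability pm."""
    return [rng.uniform(LOW, HIGH) if rng.random() < pm else g for g in x]


rng = random.Random(42)
pop = [[rng.uniform(LOW, HIGH) for _ in range(4)] for _ in range(6)]
fitness = [sum(c) for c in pop]  # placeholder fitness, not the project's
parents = rank_roulette_wheel(pop, fitness, rng)
c1, c2 = simple_crossover(parents[0], parents[1], rng)
c1 = nonuniform_mutation(c1, gen=3, max_gen=50, pm=0.1, rng=rng)
c2 = random_mutation(c2, pm=0.1, rng=rng)
```

RRW and RSUS share the same rank-based probabilities and differ only in how they sample. SUS uses a single random offset, so each individual's number of copies stays within one of its expected value, while RRW's independent spins can drift further from it.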