AUTOMATED SUMMARIZATION FOR INDONESIAN NEWS ARTICLE USING ABSTRACT MEANING REPRESENTATION
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/55526 |
Institution: | Institut Teknologi Bandung |
Summary: | As online news sources grow, summaries are increasingly needed to
obtain important information in less reading time. Summarization with
Abstract Meaning Representation (AMR) was first done for Indonesian using a
rule-based AMR parser. However, that parser has a limitation: it generates
nodes containing phrases, which cause problems in the concept merging
process of the summarization system.
In this research, a machine learning-based AMR parser for Indonesian is used to
represent news article sentences from the IndoSum dataset. This AMR parser
generates only nodes containing single words. Concepts from the generated AMR
graphs are merged when they share the same word or are synonyms, forming a
source graph. A subgraph of the source graph (the summary graph) is then
selected and converted into a word set using Simple Natural Language Generation
(Simple NLG). Using this word set, the system extracts the three news article
sentences with the highest matching-word scores, normalized by sentence
length.
The results show that AMR graphs generated by the machine learning-based AMR
parser pass through the concept merging process well. As a baseline, the three
most similar news article sentences are extracted based on cosine similarity,
using a retrained Word2Vec representation. The proposed system does not yet
outperform the baseline. The analysis indicates that the system tends to
choose nodes whose original words appear in the initial sentence. |
---|
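
The sentence extraction step described in the summary (scoring each sentence by its matching words, normalized by sentence length, and keeping the top three) might be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the function names `tokenize`, `score_sentence`, and `select_sentences`, the whitespace tokenization, and the example sentences and word set are all assumptions made here for illustration.

```python
from typing import List, Set


def tokenize(sentence: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization with edge punctuation stripped.
    tokens = (tok.strip(".,;:!?\"'()") for tok in sentence.lower().split())
    return [tok for tok in tokens if tok]


def score_sentence(sentence: str, word_set: Set[str]) -> float:
    # Number of tokens found in the word set, divided by the sentence length.
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    matches = sum(1 for tok in tokens if tok in word_set)
    return matches / len(tokens)


def select_sentences(sentences: List[str], word_set: Set[str], k: int = 3) -> List[str]:
    # Rank sentences by normalized score, highest first; ties keep article order
    # because Python's sort is stable.
    ranked = sorted(sentences, key=lambda s: score_sentence(s, word_set), reverse=True)
    return ranked[:k]


# Hypothetical article sentences and a hypothetical word set from the summary graph.
sentences = [
    "Gempa mengguncang kota Bandung pagi ini.",
    "Cuaca cerah.",
    "Warga kota Bandung mengungsi setelah gempa.",
    "Pemerintah mengirim bantuan ke Bandung.",
]
word_set = {"gempa", "bandung", "warga", "mengungsi", "bantuan"}
summary = select_sentences(sentences, word_set)
```

Normalizing by length keeps long sentences from winning merely because they contain more words; whether the thesis counts repeated words or unique words is not stated in the summary, so counting every matching token is an assumption of this sketch.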