WORD EMBEDDING IN MULTI-DOCUMENT NEWS SUMMARIZATION USING SENTENCE FUSION

Nowadays, the demand for and supply of publicly available information are very large. Many online news websites also regularly post similar new information on the same topics. This produces a great deal of recurring, duplicated information, thus mo...


Bibliographic Details
Main Author:	CHRISTIE - NIM: 23516083 , FELICIA
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/27271
Institution:	Institut Teknologi Bandung

id	id-itb.:27271
spelling	id-itb.:27271 2018-09-28T09:08:21Z WORD EMBEDDING IN MULTI-DOCUMENT NEWS SUMMARIZATION USING SENTENCE FUSION CHRISTIE - NIM: 23516083 , FELICIA Indonesia Theses INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/27271 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Nowadays, the demand for and supply of publicly available information are very large. Many online news websites also regularly post similar new information on the same topics. This produces a great deal of recurring, duplicated information, so more time is needed to process all the relevant data on a topic. This creates a need for summarization systems as a way to reduce processing time. This thesis discusses a summarization system with minimal dependencies on natural language processing resources: the system uses only an Indonesian POS tagger, word embedding models obtained through unsupervised learning, and a list of Indonesian stopwords. Our method creates a summary in seven main steps: tokenization, POS tagging, term weighting with TF-IDF and word embedding, clustering, sentence fusion by word graphs, extraction of the fused sentences, and finally sentence selection with an integer linear programming algorithm. Evaluation is conducted with ROUGE 2, focusing mainly on ROUGE-1 and ROUGE-2. Using several datasets for tuning, we obtain the optimal configuration, which is then used on 5 test sets. In the experiments, the best score is obtained with an Indonesian Word2Vec model for term weighting in clustering. Finally, we obtain an average ROUGE-2 value of 0.231 for 100-word documents and 0.319 for 200-word documents.
format	Theses
author	CHRISTIE - NIM: 23516083 , FELICIA
title	WORD EMBEDDING IN MULTI-DOCUMENT NEWS SUMMARIZATION USING SENTENCE FUSION
url	https://digilib.itb.ac.id/gdl/view/27271
_version_	1821934329790464000
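
The description field above reports ROUGE-1 and ROUGE-2 scores. As a rough illustration of what such a score measures, the following is a minimal sketch of ROUGE-N recall in Python. It is not the evaluation tool used in the thesis: the function names `ngrams` and `rouge_n_recall` are hypothetical, tokenization is simplified to lowercased whitespace splitting, and only a single reference summary is handled. The record does not state whether the reported values are recall, precision, or F-scores, so recall is assumed here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """ROUGE-N recall: clipped n-gram overlap divided by the reference n-gram count.

    Assumption: both summaries are plain strings, tokenized by lowercasing
    and splitting on whitespace.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        # Reference is shorter than n tokens, so there is nothing to match.
        return 0.0
    # Each reference n-gram is credited at most as often as it occurs
    # in the candidate summary.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, 1))
    print(rouge_n_recall(candidate, reference, 2))
```

In the usage example, five of the six reference unigrams and three of the five reference bigrams also occur in the candidate, which is why ROUGE-2 is typically the lower and stricter of the two scores.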


Similar Items