Automatic document summarization

Text summarization, an important branch of Natural Language Processing (NLP), has attracted an increasing amount of research and engineering interest due to the explosion of available information. Most summarization applications to date have targeted social media and structured reports, w...

Bibliographic Details
Main Author:	Xu, Hengjie
Other Authors:	Mao Kezhi
Format:	Final Year Project
Language:	English
Published:	2017
Subjects:	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access:	http://hdl.handle.net/10356/70900
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-70900
record_format	dspace
spelling	sg-ntu-dr.10356-70900 2023-07-07T16:09:39Z Automatic document summarization Xu, Hengjie Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering 2017-05-12T03:21:13Z 2017-05-12T03:21:13Z 2017 Final Year Project (FYP) http://hdl.handle.net/10356/70900 en Nanyang Technological University 68 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering::Computer hardware, software and systems Xu, Hengjie Automatic document summarization
description	Text summarization, an important branch of Natural Language Processing (NLP), has attracted an increasing amount of research and engineering interest due to the explosion of available information. Most summarization applications to date have targeted social media and structured reports, with little attention paid to news-article analytics. This project aims to summarize a large number of news articles automatically using a few key sentences. It is a pipelined system consisting of text representation models and clustering algorithms, with cluster centroids taken as key sentences. Eight summarization techniques were evaluated at both the article level and the sentence level. We chose Bag of Words (BoW) with Latent Semantic Analysis (LSA) and Spherical K-Means because this combination stood out among the eight. In particular, at the article level, the combination produces a score of 0.94, a 17.5% boost over our baseline from the literature, which suggests that the proposed clustering technique is fairly robust and accurate. The project is consolidated into a single web application. The user interface lets users retrieve relevant news articles based on their input, such as subject names, date range, and sources. As a preliminary analysis of these articles, a Named Entity Recognition (NER) algorithm is refined and applied to extract major entities, such as places, persons, and organizations. Finally, the news articles are summarized in sentences using our best-performing summarization model.
author2	Mao Kezhi
author_facet	Mao Kezhi Xu, Hengjie
format	Final Year Project
author	Xu, Hengjie
author_sort	Xu, Hengjie
title	Automatic document summarization
publishDate	2017
url	http://hdl.handle.net/10356/70900
_version_	1772825522894536704
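
Illustrative note: the description above outlines a pipeline in which sentences are turned into vectors, clustered, and the sentences nearest the cluster centroids are taken as the summary. The following is only a minimal sketch of that idea, not the project's implementation. It assumes plain Bag-of-Words counts, omits the LSA step entirely, and initializes the centroids with the first k sentences for reproducibility. All function names (`bow_vectors`, `spherical_kmeans`, `key_sentences`) are hypothetical.

```python
import math
import re
from collections import Counter


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product; for unit vectors this equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def bow_vectors(sentences):
    """Return unit-length Bag-of-Words count vectors and the sorted vocabulary."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = float(count)
        vectors.append(normalize(vec))
    return vectors, vocab


def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity, re-normalizing each centroid."""
    if not 1 <= k <= len(vectors):
        raise ValueError("k must be between 1 and the number of vectors")
    # Assumption: deterministic initialization from the first k vectors.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins the most similar centroid.
        labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # Update step: centroid = normalized sum of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels, centroids


def key_sentences(sentences, k):
    """Pick, for each cluster, the sentence closest to its centroid."""
    vectors, _ = bow_vectors(sentences)
    labels, centroids = spherical_kmeans(vectors, k)
    picked = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.append(max(members, key=lambda i: dot(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(picked)]
```

Calling `key_sentences(sentences, k)` on a list of sentence strings returns up to k sentences in their original order, one per non-empty cluster. A fuller version would apply a dimensionality reduction such as LSA to the vectors before clustering and would use a less naive initialization.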
