Automatic summarization of web documents

Bibliographic Details
Main Author: Jodihardja, Marcellus Reinaldo
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: 2013
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/54319
Institution: Nanyang Technological University
School: School of Electrical and Electronic Engineering
Degree: Bachelor of Engineering
Physical Description: 57 p.
File Format: application/pdf
Date Available: 2013-06-19
Collection: DR-NTU (NTU Library)
Country: Singapore

Description
Today we face information overload, driven by rapid development in R&D and technology. Although this means that a wide range of information is available on any given topic, it has become increasingly difficult to retrieve all the information needed within a limited time. The objective of this project is to build an automatic summarization program that produces a good summary of one or more documents in a matter of seconds, so that the essential information is condensed into a short, dense document. The program is based on Latent Semantic Analysis (LSA). TF-IDF (Term Frequency–Inverse Document Frequency) weighting assigns an importance value to each term, and Singular Value Decomposition (SVD) is used to select the sentences that best represent the information in a document. Several modifications were applied to the algorithm to improve its efficiency and reduce its time complexity. In addition, a “meta” summarization method was implemented, which creates a single summary from the summaries of several input documents. The project successfully implemented all the required algorithms and produces good summaries of the input documents.
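The description names the pipeline (TF-IDF weighting, SVD-based sentence selection, and "meta" summarization) but not its details. The following is a minimal, generic LSA summarizer sketched in Python with NumPy. It is not the project's code. It assumes plain-text input, a naive regex sentence splitter, and that each sentence is treated as the "document" for TF-IDF. It also assumes a common selection rule: for each of the top right singular vectors, take the sentence with the largest absolute weight. The function names `split_sentences`, `tokenize`, `tfidf_matrix`, `lsa_summary` and `meta_summary` are hypothetical. The project's own efficiency modifications are not reproduced.

```python
import math
import re
from collections import Counter

import numpy as np


def split_sentences(text):
    # Assumption: a sentence ends with ".", "!" or "?" followed by whitespace.
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def tokenize(sentence):
    # Lower-cased alphanumeric tokens; no stemming or stop-word removal (simplification).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_matrix(sentences):
    """Term-by-sentence TF-IDF matrix; each sentence plays the role of a 'document'."""
    token_lists = [tokenize(s) for s in sentences]
    vocab = sorted({t for toks in token_lists for t in toks})
    row = {t: i for i, t in enumerate(vocab)}
    n = len(sentences)
    # Document frequency: in how many sentences each term occurs.
    df = Counter(t for toks in token_lists for t in set(toks))
    a = np.zeros((len(vocab), n))
    for j, toks in enumerate(token_lists):
        for term, tf in Counter(toks).items():
            a[row[term], j] = tf * math.log(n / df[term])  # raw tf times idf
    return a, vocab


def lsa_summary(text, k=2):
    """Pick up to k sentences, one per latent topic, returned in original order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    a, _ = tfidf_matrix(sentences)
    # a = U @ diag(s) @ vt; row r of vt weights every sentence on latent topic r,
    # and rows are ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    chosen = []
    for topic in vt:
        if len(chosen) == k:
            break
        # Singular-vector signs are arbitrary, so rank sentences by |weight|.
        for j in np.argsort(-np.abs(topic)):
            if int(j) not in chosen:
                chosen.append(int(j))
                break
    return [sentences[j] for j in sorted(chosen)]


def meta_summary(documents, k_per_doc=2, k_final=2):
    """Hypothetical 'meta' step: summarize each document, then summarize the joined summaries."""
    partial = [" ".join(lsa_summary(doc, k_per_doc)) for doc in documents]
    return lsa_summary(" ".join(partial), k_final)
```

For example, `meta_summary([report_a, report_b], k_per_doc=3, k_final=3)` returns at most three sentences drawn from the two per-document summaries, where `report_a` and `report_b` are any plain-text strings. Here the meta step simply reruns the same summarizer on the concatenated summaries. That is one plausible reading of the description, not a confirmed detail of the project.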