Automatic summarizer for web documents

As the world globalizes, the internet is being used around the world. As a result, the volume of text documents on the web is growing exponentially. It is not practical to read through all the text information online just to find and sift out what you need. Using unsupervised clustering algorithms,...

Bibliographic Details
Main Author: Chia, Pei Qi
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: 2014
Subjects: DRNTU::Engineering
Online Access: http://hdl.handle.net/10356/61087
Institution: Nanyang Technological University
id sg-ntu-dr.10356-61087
record_format dspace
spelling sg-ntu-dr.10356-61087 2023-07-07T17:09:44Z Automatic summarizer for web documents Chia, Pei Qi Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering Bachelor of Engineering 2014-06-04T08:04:22Z 2014-06-04T08:04:22Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/61087 en Nanyang Technological University 77 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering
spellingShingle DRNTU::Engineering
Chia, Pei Qi
Automatic summarizer for web documents
description As the world globalizes, the internet is being used around the world. As a result, the volume of text documents on the web is growing exponentially. It is not practical to read through all the text information online just to find and sift out what you need. Using unsupervised clustering algorithms, the author has created an automatic summarizer that condenses long documents into short summaries. This thesis discusses the natural language processing techniques and data mining concepts used within the software, with a primary focus on lemmatization, which allows words of similar meaning to be gathered together, as well as the clustering algorithms Hierarchical Agglomerative Clustering and K-means. The methodology uses a top-down, incremental approach to design and build a reliable and functional summarizer. This thesis also explains the functionalities of the summarizer, with different tests implemented for greater confidence. The summarizer is then observed and evaluated on its flexibility with different text inputs and on the logical coherence of its output summaries. The thesis concludes by suggesting increased use of natural language processing to aid computers in 'understanding' text information, and the possibility of using a soft clustering approach. All in all, the objective of the project is met, and the thesis provides the reader with the necessary knowledge to develop a summarizer using the clustering process depicted.
author2 Mao Kezhi
author_facet Mao Kezhi
Chia, Pei Qi
format Final Year Project
author Chia, Pei Qi
author_sort Chia, Pei Qi
title Automatic summarizer for web documents
title_short Automatic summarizer for web documents
title_full Automatic summarizer for web documents
title_fullStr Automatic summarizer for web documents
title_full_unstemmed Automatic summarizer for web documents
title_sort automatic summarizer for web documents
publishDate 2014
url http://hdl.handle.net/10356/61087
_version_ 1772825618360041472