Automatic summarizer for web documents

Bibliographic Details
Main Author: Chia, Pei Qi
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: 2014
Subjects:
Online Access: http://hdl.handle.net/10356/61087
Institution: Nanyang Technological University
Description
Summary: As the world globalizes, the internet is used everywhere, and the volume of text in web documents has grown exponentially. Reading through all of the text available online just to find and sift out what one needs is impractical. Using unsupervised clustering algorithms, the author created an automatic summarizer that condenses long documents into short summaries. This thesis discusses the natural language processing techniques and data mining concepts used within the software, with a primary focus on lemmatization, which allows words of similar meaning to be grouped together, and on the clustering algorithms Hierarchical Agglomerative Clustering and K-means. The methodology uses a top-down, incremental approach to design and build a reliable and functional summarizer. The thesis also explains the functionality of the summarizer and the tests implemented to give greater confidence in it. The summarizer is then observed and evaluated on its flexibility with different text inputs and on the logical coherence of its output summaries. The thesis concludes by suggesting greater use of natural language processing to help computers 'understand' text information, and the possibility of using a soft clustering approach. All in all, the objective of the project is met, and the thesis gives the reader the knowledge needed to develop a summarizer using the clustering process described.
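
The general idea of clustering-based extractive summarization mentioned in the summary can be illustrated with a minimal sketch. This is not the author's implementation: the function names (`normalize`, `vectorize`, `summarize`), the crude suffix stripping used as a stand-in for real lemmatization, the cosine-similarity variant of K-means, and the evenly spaced seeding are all assumptions made for illustration. The sketch groups sentences into k clusters and keeps the sentence closest to each cluster centre.

```python
import math
import re
from collections import Counter


def normalize(word):
    """Crude stand-in for lemmatization (assumption): strip common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vectorize(sentence):
    """Bag-of-words vector (term -> count) over normalized tokens."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return Counter(normalize(t) for t in tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return {term: count / len(vectors) for term, count in total.items()}


def summarize(text, k=2, iterations=10):
    """Return up to k representative sentences, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return []
    vectors = [vectorize(s) for s in sentences]
    k = min(k, len(sentences))

    # Deterministic seeding (assumption): evenly spaced sentences as centres.
    centres = [dict(vectors[i * len(vectors) // k]) for i in range(k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each sentence joins its most similar centre.
        new_assignment = [
            max(range(k), key=lambda c: cosine(vec, centres[c])) for vec in vectors
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: recompute each centre as the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centres[c] = centroid(members)

    # Keep the sentence closest to each cluster centre.
    chosen = set()
    for c in range(k):
        members = [i for i, a in enumerate(assignment) if a == c]
        if members:
            chosen.add(max(members, key=lambda i: cosine(vectors[i], centres[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(document_text, k=3)` on a longer document would return up to three sentences, one per cluster. A fuller system along the lines the summary describes would presumably replace `normalize` with a dictionary-based lemmatizer and could swap the K-means loop for Hierarchical Agglomerative Clustering.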