Discovery of interesting phrases from text streams

The rapid adoption of blogs and tweets in recent years generates a large and diverse volume of information feeds every day. To take advantage of this vast knowledge, these timely data must be organized automatically and efficiently into useful information. These text stre...

Bibliographic Details
Main Author: Pang, Jeffrey Jian Hao
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2011
Online Access: http://hdl.handle.net/10356/46465
Institution: Nanyang Technological University
id sg-ntu-dr.10356-46465
record_format dspace
spelling sg-ntu-dr.10356-46465 2023-03-03T20:29:13Z Discovery of interesting phrases from text streams Pang, Jeffrey Jian Hao Sun Aixin School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Bachelor of Engineering (Computer Science) 2011-12-06T04:55:23Z 2011-12-06T04:55:23Z 2011 2011 Final Year Project (FYP) http://hdl.handle.net/10356/46465 en Nanyang Technological University 63 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
spellingShingle DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Pang, Jeffrey Jian Hao
Discovery of interesting phrases from text streams
description The rapid adoption of blogs and tweets in recent years generates a large and diverse volume of information feeds every day. To take advantage of this vast knowledge, these timely data must be organized automatically and efficiently into useful information. These text streams usually contain interesting phrases that summarize the content of the text. In this project, we extract interesting phrases, consolidate them, and transform them into meaningful statistics, such as the amount of media coverage of a certain event during a specific time period, by making use of temporal information such as “date published”. This report explores the various methodologies and algorithms used in keyphrase extraction. It also documents the development and implementation of a search engine titled “Interesting Phrases Analysis Program (IPAP)” designed for this project. IPAP is capable of retrieving interesting phrases from a large collection of blog entries. It indexes the entries and allows users to perform a series of useful analyses on the search results. The trend of phrases, the relationships between phrases, the niche of each blog, and other useful information can be obtained from the analysis. IPAP can also be extended to work with tweets. The applications and future development potential are also discussed in this report. IPAP demonstrates that the analysis of interesting phrases from text streams such as blogs can generate an unexpectedly large amount of useful information.
author2 Sun Aixin
author_facet Sun Aixin
Pang, Jeffrey Jian Hao
format Final Year Project
author Pang, Jeffrey Jian Hao
author_sort Pang, Jeffrey Jian Hao
title Discovery of interesting phrases from text streams
title_sort discovery of interesting phrases from text streams
publishDate 2011
url http://hdl.handle.net/10356/46465
_version_ 1759854604774801408