Discovery of interesting phrases from text streams

Bibliographic Details
Main Author: Pang, Jeffrey Jian Hao
Other Authors: Sun Aixin
Format: Final Year Project
Language: English
Published: 2011
Online Access: http://hdl.handle.net/10356/46465
Institution: Nanyang Technological University
Description
Summary: The rapid adoption of blogs and tweets in recent years generates a large and diverse volume of information feeds every day. To take advantage of this knowledge, these timely data must be organized automatically and efficiently into useful information. Such text streams usually contain interesting phrases that give summarized insight into the content of the text. This project extracts interesting phrases, consolidates them, and uses their temporal information, such as the “date published”, to transform them into meaningful statistics, for example the amount of media coverage an event received during a specific time period. This report explores the methodologies and algorithms used in keyphrase extraction. It also documents the development and implementation of a search engine, the “Interesting Phrases Analysis Program (IPAP)”, designed for this project. IPAP can retrieve interesting phrases from a large collection of blog entries. It indexes the entries and lets users perform a range of analyses on the search results. These analyses reveal the trend of phrases, the relationships between phrases, the niche of each blog, and other useful information. IPAP can also be extended to work with tweets. The report also discusses applications and potential future development. IPAP demonstrates that analyzing interesting phrases from text streams such as blogs can yield an unexpectedly large amount of useful information.
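
The idea of turning phrases into time-based coverage statistics can be illustrated with a minimal Python sketch. It is not IPAP's implementation, which this record does not describe; the sample posts, the `coverage_by_month` helper, and the plain substring matching are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data: (date published, text) pairs standing in for blog entries.
POSTS = [
    (date(2011, 3, 1), "Everyone is talking about the solar eclipse today."),
    (date(2011, 3, 9), "Photos from the solar eclipse are now online."),
    (date(2011, 4, 2), "Looking back at last month's solar eclipse."),
    (date(2011, 4, 5), "A quiet week with nothing much to report."),
]


def coverage_by_month(posts, phrase):
    """Count the posts that mention `phrase` (case-insensitive) per (year, month)."""
    needle = phrase.lower()
    counts = Counter()
    for published, text in posts:
        # Simple substring match; a real system would use extracted keyphrases.
        if needle in text.lower():
            counts[(published.year, published.month)] += 1
    return dict(counts)


print(coverage_by_month(POSTS, "solar eclipse"))
```

Grouping by the publication month gives a rough measure of how much coverage a phrase received in each period; a finer or coarser time bucket could be substituted in the same way.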