Discovery of interesting phrases from text streams
Main Author: | Pang, Jeffrey Jian Hao |
---|---|
Other Authors: | Sun Aixin |
Format: | Final Year Project |
Language: | English |
Published: | 2011 |
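Subjects: | DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |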
Online Access: | http://hdl.handle.net/10356/46465 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-46465 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-46465; 2023-03-03T20:29:13Z; Discovery of interesting phrases from text streams; Pang, Jeffrey Jian Hao; Sun Aixin; School of Computer Engineering; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing; Bachelor of Engineering (Computer Science); 2011-12-06T04:55:23Z; 2011; Final Year Project (FYP); http://hdl.handle.net/10356/46465; en; Nanyang Technological University; 63 p.; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing |
description |
The rapid adoption of blogs and tweets in recent years has produced a large and diverse volume of information feeds every day. To take advantage of this vast body of knowledge, these timely data need to be organized automatically and efficiently into useful information. These text streams usually contain interesting phrases that provide summarized insights into the content of the text.
In this project, we extract interesting phrases, consolidate them, and transform them into meaningful statistics, such as the amount of media coverage an event receives during a specific time period, by making use of their temporal information such as the “date published”.
This report explores various methodologies and algorithms used in keyphrase extraction. It also documents the development and implementation of a search engine, the “Interesting Phrases Analysis Program (IPAP)”, designed for this project. IPAP retrieves interesting phrases from a large collection of blog entries. It provides indexing and allows users to perform a series of analyses on the search results. These analyses reveal the trends of phrases, the relationships between phrases, the niche of each blog, and other useful information. IPAP could also be extended to work with tweets.
The applications and future development potential are also discussed in this report.
IPAP demonstrates that analysing interesting phrases from text streams such as blogs can yield a surprisingly large amount of useful information. |
author2 |
Sun Aixin |
format |
Final Year Project |
author |
Pang, Jeffrey Jian Hao |
title |
Discovery of interesting phrases from text streams |
publishDate |
2011 |
url |
http://hdl.handle.net/10356/46465 |
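The abstract above describes extracting interesting phrases and turning them into time-based statistics using each entry's publication date. The sketch below illustrates that idea under stated assumptions. It is not IPAP's implementation, which is documented in the report itself. It assumes entries arrive as (date published, text) pairs. It uses a naive stopword-bounded n-gram filter as a hypothetical stand-in for a real keyphrase extractor, and it approximates "media coverage" as the number of entries that mention a phrase in each calendar month. The names `candidate_phrases`, `phrase_trend`, `top_phrases` and `STOPWORDS`, and the sample posts, are illustrative only.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical, deliberately tiny stopword list (an assumption, not from the report).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "or", "to",
             "for", "is", "was", "at", "by", "with"}


def candidate_phrases(text, max_len=3):
    """Yield lowercase word n-grams (1..max_len words) that neither start
    nor end with a stopword; a naive stand-in for keyphrase extraction."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def phrase_trend(entries, phrase):
    """Count entries mentioning `phrase` per (year, month) of publication.

    Each entry adds at most 1 to its month, so the result approximates
    coverage (number of posts) rather than raw mention frequency.
    `phrase` must be lowercase and at most max_len words long.
    """
    counts = Counter()
    for published, text in entries:
        if phrase in set(candidate_phrases(text)):
            counts[(published.year, published.month)] += 1
    return dict(sorted(counts.items()))


def top_phrases(entries, k=5, min_words=2):
    """Return the k multi-word candidates that occur in the most entries."""
    doc_freq = Counter()
    for _, text in entries:
        doc_freq.update({p for p in candidate_phrases(text)
                         if len(p.split()) >= min_words})
    return doc_freq.most_common(k)


# Hypothetical sample data.
posts = [
    (date(2011, 6, 3), "Fans react to the World Cup draw."),
    (date(2011, 6, 20), "World Cup tickets sold out in hours."),
    (date(2011, 7, 1), "A quiet month after the World Cup."),
]
print(top_phrases(posts, k=1))              # [('world cup', 3)]
print(phrase_trend(posts, "world cup"))     # {(2011, 6): 2, (2011, 7): 1}
```

Counting each entry at most once per phrase keeps a single repetitive post from inflating a month's figure. A fuller system would probably replace the stopword filter with a proper tokenizer and a scoring scheme (for example TF-IDF) and would index the collection rather than scanning it on every query.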