Topical analysis of text streams

Topic detection (TD) is an important area of research whose primary goal is to detect retrospective or new topics from a stream of news articles. It could be extremely useful in many applications, including news aggregation portals, news alert systems, event search engines, terrorist activity tracking...

Bibliographic Details
Main Author: He, Qi
Other Authors: Lim Ee Peng
Format: Theses and Dissertations
Language: English
Published: 2009
Subjects:
Online Access: https://hdl.handle.net/10356/17764
Institution: Nanyang Technological University
id sg-ntu-dr.10356-17764
record_format dspace
spelling sg-ntu-dr.10356-17764 2023-03-04T00:42:40Z Topical analysis of text streams He, Qi Lim Ee Peng Chang Kuiyu School of Computer Engineering DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing DOCTOR OF PHILOSOPHY (SCE) 2009-06-15T01:27:56Z 2009-06-15T01:27:56Z 2009 2009 Thesis He, Q. (2009). Topical analysis of text streams. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/17764 10.32657/10356/17764 en 200 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computing methodologies::Document and text processing
description Topic detection (TD) is an important area of research whose primary goal is to detect retrospective or new topics from a stream of news articles. It could be extremely useful in many applications, including news aggregation portals, news alert systems, event search engines, and terrorist activity tracking. However, specialists who analyze news articles have a hard time separating the wheat from the chaff because of the overwhelming number of news streams (over 10,000 as of 2008). For many years, the TDT (Topic Detection and Tracking) research community has tackled topic detection as a clustering task. However, time, which plays a pivotal role in news articles, has never been given due consideration. In this research we present a thorough study of various temporal topic detection models that explicitly incorporate the element of time. We further find that bursty temporal word features play an important role in improving topic detection performance, and we provide an in-depth analysis and systematic categorization of all word features into five general types using techniques from signal processing. Armed with a small set of bursty features extracted from historical or online news streams, we propose a number of effective algorithms to detect topics from a news stream in both offline and online modes. Our algorithms are mathematically elegant, simple, and extremely practical when benchmarked against some of the best topic detection models, including spherical k-means, Latent Dirichlet Allocation (LDA), and von Mises-Fisher mixtures. Finally, we present a case study of a personalized news alert application, where subscribers can specify anticipated events of interest, and show how a simple supervised event transition classifier can be used to effectively identify user-anticipated events.
Our research is one of the most comprehensive studies of both offline and online topic detection, the latter of which has been an open research problem for many years. In fact, our online topic detection model can be viewed as a significant advancement in the field, one that paves the way for further improvements by other TDT experts.
author2 Lim Ee Peng
author_facet Lim Ee Peng
He, Qi
format Theses and Dissertations
author He, Qi
author_sort He, Qi
title Topical analysis of text streams
publishDate 2009
url https://hdl.handle.net/10356/17764
_version_ 1759854421613740032