Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/106850 http://hdl.handle.net/10220/49683 https://doi.org/10.32657/10220/49683 |
Institution: | Nanyang Technological University |
Summary: | With the widespread use of social media such as Facebook and Twitter, large amounts
of data with both spatial and temporal information have become available. Topic modelling has been a useful tool for uncovering latent information from such data. This thesis
considers a specific type of topic-model computational problem called topic-range queries,
where the topic model of interest is restricted to the data records that fall within a dynamically specified geographic region and time period. One naive
approach is to apply a range query to retrieve the data items falling within the
specified spatio-temporal range, then derive the topic model from the retrieved data
using a known algorithm such as LDA (Latent Dirichlet Allocation). When dealing with
large volumes of data, however, each of the two steps of this naive approach can take a substantial amount of time. Novel algorithms for expediting topic-range queries have been
designed, including the fast topic-combining algorithm FSS (Fast Set Sampling), which
indexes the dataset with a tree and pre-computes the topic model of the subset of data
associated with each node of the tree. To answer a topic-range query, the tree nodes
covered by the range query are identified, and the pre-computed topic models associated
with these tree nodes are merged to produce an approximate result. Compared to the
naive approach, this approximation of the topic model can substantially reduce runtime. In
the original design of the FSS algorithm, Cube trees are used as the indexing structure
to support spatio-temporal range queries. In the literature, however, Range Trees offer
a better worst-case query time guarantee for a range query. This master's thesis therefore
considers a new combination of Range Trees and FSS (called Topic Ranger) to support
topic-range queries. The thesis presents the design and implementation of several versions of Topic Ranger that trade off execution time against memory space. It also
documents experiments comparing the execution time and the quality of the
resulting approximate topic models against those of the original FSS scheme. |
---|
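
The index-then-merge idea described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it makes several simplifying assumptions: only the time dimension is indexed (no spatial range), the tree is a plain balanced binary tree over time-sorted documents rather than a Range Tree or Cube tree, and each node stores a word-count `Counter` as a stand-in for a pre-computed topic model. Merging by adding counts is likewise a placeholder and is not Fast Set Sampling. All names (`Node`, `build`, `covering_nodes`, `query`, `naive`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class Node:
    """Tree node covering documents with indices in [lo, hi)."""

    def __init__(self, lo, hi, summary, left=None, right=None):
        self.lo, self.hi = lo, hi
        self.summary = summary  # stand-in for a pre-computed topic model
        self.left, self.right = left, right


def build(docs, lo, hi):
    """Build the tree over docs[lo:hi]; docs is a list of
    (timestamp, words) pairs sorted by timestamp."""
    if hi - lo == 1:
        return Node(lo, hi, Counter(docs[lo][1]))
    mid = (lo + hi) // 2
    left, right = build(docs, lo, mid), build(docs, mid, hi)
    # Pre-compute the summary of every internal node at build time.
    return Node(lo, hi, left.summary + right.summary, left, right)


def covering_nodes(node, lo, hi):
    """Return the maximal nodes whose index range lies inside [lo, hi)."""
    if hi <= node.lo or node.hi <= lo:
        return []                      # disjoint from the query
    if lo <= node.lo and node.hi <= hi:
        return [node]                  # fully covered: reuse its summary
    return covering_nodes(node.left, lo, hi) + covering_nodes(node.right, lo, hi)


def query(root, docs, t_start, t_end):
    """Merge the pre-computed summaries for timestamps in [t_start, t_end]."""
    times = [t for t, _ in docs]
    lo, hi = bisect_left(times, t_start), bisect_right(times, t_end)
    merged = Counter()
    for node in covering_nodes(root, lo, hi):
        merged.update(node.summary)    # placeholder for a real model merge
    return merged


def naive(docs, t_start, t_end):
    """Two-step baseline: retrieve the documents, then summarise them."""
    merged = Counter()
    for t, words in docs:
        if t_start <= t <= t_end:
            merged.update(words)
    return merged


docs = [(1, ["a", "b"]), (2, ["b"]), (3, ["c"]), (4, ["a"]), (5, ["d", "a"])]
root = build(docs, 0, len(docs))
result = query(root, docs, 2, 4)
```

Because word counts add exactly, this toy version returns the same result as the two-step baseline. With real topic models the merge is approximate, which is why the thesis compares result quality as well as execution time.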