Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents
Main Author: | Lu, You |
---|---|
Other Authors: | Cong Gao |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2019 |
Subjects: | Engineering::Computer science and engineering::Data |
Online Access: | https://hdl.handle.net/10356/106850 http://hdl.handle.net/10220/49683 https://doi.org/10.32657/10220/49683 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-106850 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-106850; 2019-12-06T22:19:42Z; Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents; Lu, You; Cong Gao; School of Computer Science and Engineering; Engineering::Computer science and engineering::Data; Master of Engineering; 2019-08-20T00:18:42Z; 2019-12-06T22:19:42Z; 2019-08-20T00:18:42Z; 2019-12-06T22:19:42Z; 2019; Thesis; Lu, Y. (2019). Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents. Master's thesis, Nanyang Technological University, Singapore.; https://hdl.handle.net/10356/106850 http://hdl.handle.net/10220/49683 https://doi.org/10.32657/10220/49683; en; 169 p.; application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Data |
description |
With the widespread use of social media such as Facebook and Twitter, large amounts of data carrying both spatial and temporal information have become available. Topic modelling has proven useful for uncovering latent information in such data. This thesis considers a specific topic-modelling problem called the topic-range query, in which the topic model of interest is restricted to the data records that fall within a dynamically specified geographic region and time period. A naive approach first applies a range query to retrieve the data items falling within the specified spatio-temporal range, then derives the topic model from the retrieved data with a known algorithm such as LDA (Latent Dirichlet Allocation). On large volumes of data, however, each of these two steps can take a substantial amount of time. Novel algorithms have been designed to speed up topic-range queries, including the fast topic-combining algorithm FSS (Fast Set Sampling), which indexes the dataset with a tree and pre-computes the topic model of the subset of data associated with each tree node. To answer a topic-range query, the tree nodes covered by the range query are identified, and the pre-computed topic models associated with these nodes are merged to produce an approximate result. Compared with the naive approach, this approximation can substantially reduce runtime. The original design of FSS uses Cube trees as the indexing structure for spatio-temporal range queries. Range Trees, however, are known in the literature to offer a better worst-case query-time guarantee for range queries. This master's thesis therefore considers a new combination of Range Trees and FSS, called Topic Ranger, to support topic-range queries. The thesis presents the design and implementation of several versions of Topic Ranger that trade off execution time against memory space. It also documents experiments comparing the execution time and the quality of the resulting approximate topic models against those of the original FSS scheme. |
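The query strategy summarised in the abstract (index the documents with a tree, pre-compute statistics per tree node, then merge the statistics of the nodes that cover the query range) can be illustrated with a small sketch. It is not the thesis's Topic Ranger or FSS implementation. It assumes that every document already carries a vector of per-topic token counts from some previously trained model, that "merging" simply means summing those counts (FSS's own approximate merging procedure is not reproduced), and that space is a single coordinate `x` rather than a 2-D region. All names (`Doc`, `build_range_tree`, `range_topic_counts`, `naive_topic_counts`, `topic_mixture`) are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical document record: (x, t, topic_counts), where topic_counts[k] is
# the number of tokens assigned to topic k by some previously trained model.
Doc = Tuple[float, float, Sequence[int]]


@dataclass
class Node:
    x_min: float                 # smallest x among this node's documents
    x_max: float                 # largest x among this node's documents
    ts: List[float]              # t values of this node's documents, sorted
    prefix: List[List[int]]      # prefix[i] = summed topic counts of the first i docs in t order
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_range_tree(docs: Sequence[Doc], num_topics: int) -> Optional[Node]:
    """Primary balanced tree on x; each node keeps a t-sorted array with prefix sums."""
    by_x = sorted(docs, key=lambda d: d[0])

    def build(lo: int, hi: int) -> Optional[Node]:
        if lo >= hi:
            return None
        by_t = sorted(by_x[lo:hi], key=lambda d: d[1])
        prefix = [[0] * num_topics]
        for _, _, counts in by_t:
            prefix.append([a + b for a, b in zip(prefix[-1], counts)])
        node = Node(by_x[lo][0], by_x[hi - 1][0], [d[1] for d in by_t], prefix)
        if hi - lo > 1:
            mid = (lo + hi) // 2
            node.left, node.right = build(lo, mid), build(mid, hi)
        return node

    return build(0, len(by_x))


def range_topic_counts(root: Optional[Node], x1: float, x2: float,
                       t1: float, t2: float, num_topics: int) -> List[int]:
    """Merge pre-aggregated counts of the canonical nodes covering [x1, x2] x [t1, t2]."""
    total = [0] * num_topics
    stack = [root] if root is not None else []
    while stack:
        node = stack.pop()
        if node.x_max < x1 or node.x_min > x2:
            continue  # no document of this node lies in the x-range
        if x1 <= node.x_min and node.x_max <= x2:
            # Canonical node: every document is inside the x-range, so only the
            # t-range remains, answered by two binary searches on the sorted ts.
            i, j = bisect_left(node.ts, t1), bisect_right(node.ts, t2)
            if i < j:  # guard against an empty or inverted t-range
                for k in range(num_topics):
                    total[k] += node.prefix[j][k] - node.prefix[i][k]
            continue
        # Partial overlap: descend into both children.
        stack.extend(c for c in (node.left, node.right) if c is not None)
    return total


def naive_topic_counts(docs: Sequence[Doc], x1: float, x2: float,
                       t1: float, t2: float, num_topics: int) -> List[int]:
    """Baseline: scan every document and sum the counts of those in range."""
    total = [0] * num_topics
    for x, t, counts in docs:
        if x1 <= x <= x2 and t1 <= t <= t2:
            total = [a + b for a, b in zip(total, counts)]
    return total


def topic_mixture(counts: Sequence[int]) -> List[float]:
    """Normalise merged counts into an approximate topic distribution for the range."""
    s = sum(counts)
    return [c / s for c in counts] if s else [0.0] * len(counts)


if __name__ == "__main__":
    docs = [(1.0, 10.0, [3, 0, 1]), (2.0, 20.0, [0, 2, 2]),
            (3.0, 15.0, [1, 1, 0]), (4.0, 30.0, [0, 0, 5])]
    root = build_range_tree(docs, num_topics=3)
    print(range_topic_counts(root, 1.5, 4.0, 12.0, 25.0, 3))  # [1, 3, 2]
```

With a balanced primary tree, a query visits O(log n) partially overlapping nodes and O(log n) canonical nodes, each answered with two binary searches, so the search work is O(log^2 n), plus O(K) per canonical node to add the K-topic count vectors. The price is O(K n log n) storage for the per-node prefix sums, which echoes the execution-time versus memory trade-off mentioned in the abstract.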
author2 |
Cong Gao |
format |
Theses and Dissertations |
author |
Lu, You |
title |
Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents |
publishDate |
2019 |
url |
https://hdl.handle.net/10356/106850 http://hdl.handle.net/10220/49683 https://doi.org/10.32657/10220/49683 |