Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents

With the widespread use of social media such as Facebook and Twitter, large amounts of data with both spatial and temporal information have become available. Topic modelling has been a useful tool for uncovering latent information in such data. This thesis considers a specific type of topic model...


Bibliographic Details
Main Author: Lu, You
Other Authors: Cong Gao
Format: Theses and Dissertations
Language: English
Published: 2019
Subjects: Engineering::Computer science and engineering::Data
Online Access:https://hdl.handle.net/10356/106850
http://hdl.handle.net/10220/49683
https://doi.org/10.32657/10220/49683
Institution: Nanyang Technological University
School: School of Computer Science and Engineering
Degree: Master of Engineering
Date accessioned: 2019-08-20T00:18:42Z
Last modified: 2019-12-06T22:19:42Z
Citation: Lu, Y. (2019). Topic Ranger : a tool for topic exploration and analysis of spatio-temporal documents. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/106850
Extent: 169 p.
File format: application/pdf
Building: NTU Library
Country: Singapore
Collection: DR-NTU
Topic: Engineering::Computer science and engineering::Data
Description: With the widespread use of social media such as Facebook and Twitter, large amounts of data with both spatial and temporal information have become available. Topic modelling has been a useful tool for uncovering latent information in such data. This thesis considers a specific topic-model computation problem called topic-range queries, in which the topic model of interest is restricted to the data records that fall within a dynamically specified geographic region and time period. One naive approach is to apply a range query directly to retrieve the data items within the specified spatio-temporal range, and then derive the topic model from the retrieved data with a known algorithm such as LDA (Latent Dirichlet Allocation). For large volumes of data, however, each of the two steps in this naive approach can take a substantial amount of time. Novel algorithms have been designed to expedite topic-range queries, including the fast topic-combining algorithm FSS (Fast Set Sampling), which indexes the dataset with a tree and precomputes the topic model of the subset of data associated with each tree node. To answer a topic-range query, the tree nodes covered by the range query are identified, and the precomputed topic models associated with these nodes are merged to produce an approximate result. Compared with the naive approach, this approximation can substantially reduce runtime. In the original design of the FSS algorithm, Cube trees are used as the indexing structure to support spatio-temporal range queries. In the literature, however, Range Trees offer a better worst-case query-time guarantee for range queries.
This master's thesis therefore considers a new combination of Range Trees and FSS, called Topic Ranger, to support topic-range queries. It presents the design and implementation of several versions of Topic Ranger that trade off execution time against memory space, and it reports experiments comparing their execution time and the quality of the resulting approximate topic models with those of the original FSS scheme.
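The description above answers a topic-range query by identifying the tree nodes covered by the range and merging their precomputed topic models. The following is a minimal Python sketch of that idea under simplifying assumptions, not the thesis's implementation and not the FSS merging procedure: it indexes only one dimension (time) instead of space and time, assumes each document already carries topic-word counts assigned by a global topic model, and "merges" nodes by simply summing those counts. The names `TopicRangeTree`, `naive_query`, `topic_word_distribution` and the smoothing parameter `beta` are hypothetical, chosen for illustration.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


class TopicRangeTree:
    """Hypothetical 1-D range tree over document timestamps.

    Each node stores the summed topic-word counts of the documents below it,
    so a query merges O(log n) precomputed aggregates instead of rescanning
    every document in the range.
    """

    def __init__(self, docs):
        # docs: iterable of (timestamp, Counter mapping (topic, word) -> count)
        docs = sorted(docs, key=lambda d: d[0])
        self.keys = [t for t, _ in docs]
        self.size = 1
        while self.size < len(docs):
            self.size *= 2
        # Array layout of a complete binary tree: leaves occupy [size, 2 * size)
        self.node = [Counter() for _ in range(2 * self.size)]
        for i, (_, counts) in enumerate(docs):
            self.node[self.size + i] = Counter(counts)
        # Precompute each internal node as the sum of its two children
        for i in range(self.size - 1, 0, -1):
            self.node[i] = self.node[2 * i] + self.node[2 * i + 1]

    def query(self, t_lo, t_hi):
        """Merge the aggregates of the canonical nodes covering [t_lo, t_hi]."""
        lo = bisect_left(self.keys, t_lo) + self.size
        hi = bisect_right(self.keys, t_hi) + self.size  # exclusive bound
        merged = Counter()
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and step past it
                merged.update(self.node[lo])
                lo += 1
            if hi & 1:  # the node just left of hi is a left child: take it
                hi -= 1
                merged.update(self.node[hi])
            lo //= 2
            hi //= 2
        return merged


def naive_query(docs, t_lo, t_hi):
    """Brute-force scan: sum the counts of every document inside [t_lo, t_hi]."""
    merged = Counter()
    for t, counts in docs:
        if t_lo <= t <= t_hi:
            merged.update(counts)
    return merged


def topic_word_distribution(counts, topic, vocab, beta=0.01):
    """Smoothed estimate phi_k(w) = (n_kw + beta) / (n_k + V * beta)."""
    n_k = sum(c for (k, _), c in counts.items() if k == topic)
    denom = n_k + len(vocab) * beta
    return {w: (counts[(topic, w)] + beta) / denom for w in vocab}


# Toy corpus: (timestamp, topic-word counts already assigned by a global model)
docs = [
    (1, Counter({(0, "rain"): 2, (1, "game"): 1})),
    (2, Counter({(0, "rain"): 1, (0, "flood"): 1})),
    (5, Counter({(1, "game"): 3})),
    (7, Counter({(1, "goal"): 2, (0, "rain"): 1})),
]
tree = TopicRangeTree(docs)
window = tree.query(2, 7)  # merges documents with timestamps 2, 5 and 7
phi_weather = topic_word_distribution(window, 0, ["rain", "flood", "game", "goal"])
```

In this sketch the tree stores O(n) aggregates and a query merges at most about 2 log2 n of them, while `naive_query` touches every document. A spatio-temporal version would follow the standard multi-dimensional range tree, in which each node of the outer tree also holds a tree over the next coordinate; that nesting is where the extra memory cost comes from.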