Effective and efficient topic mining and exploration from geo-textual data

With the prevalence of online social media (e.g., Facebook, Twitter), location-based services (e.g., Foursquare, Yelp, Flickr), and GPS-enabled devices, a huge number of documents with spatial information are being generated. Such documents are associated with either points of interest (e.g., restaur...

Bibliographic Details
Main Author: Zhao, Kaiqi
Other Authors: Cong Gao
Format: Theses and Dissertations
Language: English
Published: 2018
Subjects: DRNTU::Engineering::Computer science and engineering::Data
Online Access: http://hdl.handle.net/10356/73632
Institution: Nanyang Technological University
id sg-ntu-dr.10356-73632
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Data
description With the prevalence of online social media (e.g., Facebook, Twitter), location-based services (e.g., Foursquare, Yelp, Flickr), and GPS-enabled devices, a huge number of documents with spatial information are being generated. Such documents are associated with either points of interest (e.g., restaurants) or latitude-longitude coordinates. We call these documents geo-textual documents. Geo-textual documents often contain information that indicates public or individual views and interests. It is of great interest to mine and explore topics from geo-textual documents to support various practical tasks, e.g., business analytics, point-of-interest (POI) recommendation, user recommendation, and topic exploration. There are two types of studies on mining topics from geo-textual data: (1) discovering topics of individuals from POI-associated posts (e.g., check-ins); and (2) mining and exploring topics of regions from geo-tagged microblogs. However, both types of studies have several limitations. Firstly, the topics of individuals mined from geo-textual data have been successfully applied to POI recommendation, location prediction, etc. However, most existing methods mine topics from Foursquare check-in datasets. Because each check-in often contains limited textual information, and most users share only a few check-ins, it is difficult to discover meaningful topics of individuals from check-in data. Moreover, the existing methods cannot capture topical aspects, e.g., the “environment” of a restaurant, and thus fail to tell users why a POI is recommended to them. Worse still, the existing methods are frequency-based (the more often a user mentions a topic, the more likely the user is assumed to prefer it) and ignore the user’s sentiment. A user may hold negative opinions on some topics even though he/she mentions them many times.
Secondly, the existing studies on learning topics of regions only allow users to explore the topics in predefined regions and time spans, whereas a user may want to query topics within a region and time span of their own choosing. For example, a social scientist may want to find breaking events by submitting regions and time spans in an exploratory manner. Some studies propose to learn geographical topic models to uncover latent regions and geographical topics. However, training these models is time-consuming: it often takes months to train a model of moderate size (e.g., thousands of topics and thousands of regions) on millions of documents, and there exists no distributed solution for training geographical topic models. To overcome the limitations in mining topics of individuals, we address two research challenges. First, we propose an approach to associating POIs with geo-tagged microblogs to compose a complementary “check-in” data source for topic mining of individuals. Second, we propose a unified model for learning topical aspects and regions of individuals with consideration of sentiment. The proposed model is able to improve the effectiveness of many downstream applications, e.g., POI recommendation, user recommendation, and aspect satisfaction analysis. To overcome the limitations in mining topics of regions, we consider two research problems. First, we develop a framework for exploring topics within a user-specified region and time span. The framework can return the topics that fall within a spatio-temporal query to a user within seconds. Second, to allow efficient training of geographical topic models, we propose a distributed solution that supports learning large geographical topic models with millions of parameters from tens of gigabytes of geo-textual documents within 20 hours on a small cluster of 20 machines.
author2 Cong Gao
author_facet Cong Gao
Zhao, Kaiqi
format Theses and Dissertations
author Zhao, Kaiqi
author_sort Zhao, Kaiqi
title Effective and efficient topic mining and exploration from geo-textual data
publishDate 2018
url http://hdl.handle.net/10356/73632
_version_ 1759857708431835136
spelling sg-ntu-dr.10356-73632 2023-03-04T00:51:37Z Effective and efficient topic mining and exploration from geo-textual data Zhao, Kaiqi Cong Gao School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Data Doctor of Philosophy (SCE) 2018-04-02T07:10:09Z 2018-04-02T07:10:09Z 2018 Thesis Zhao, K. (2018). Effective and efficient topic mining and exploration from geo-textual data. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/73632 10.32657/10356/73632 en 154 p. application/pdf