Geo-tagged data retrieval and mining from Foursquare and Twitter

Bibliographic Details
Main Author: Chen, Wei
Other Authors: Cong Gao
Format: Final Year Project
Language: English
Published: 2015
Online Access: http://hdl.handle.net/10356/62860
Institution: Nanyang Technological University
Description
Summary: Large quantities of user-generated content (UGC) are produced every moment owing to the popularity of social media. This UGC reflects users' daily lives and, when properly analyzed, can benefit many fields. One valuable research area is identifying the Point-of-Interest (POI) a user visited, based on geo-tagged tweets on Twitter and venue information on Foursquare. This problem is challenging because the location information in a tweet is incomplete; worse, it can be misleading or entirely incorrect. To address this problem, a model was built to retrieve information from Twitter and Foursquare and to combine attributes from the two sources. A prediction model was then designed to predict the POI that a user visited from his or her geo-tagged tweet. The model is trained on both tweet text from Twitter and venue information from Foursquare. To improve accuracy in urban POI identification, it uses tweets that carry geo-tag (GPS) attributes: the GPS location narrows the candidate set to the POIs nearest the tweet. The prediction model then uses user tips (comment text) on Foursquare venues (POIs) to evaluate the relevance of a tweet to each candidate POI. A distinguishing feature of this model is that it uses human-written comment text to evaluate human-written tweets. As a result, the model delivered excellent performance in both accuracy and efficiency.
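
The two-stage idea in the summary (narrow the candidates by GPS distance, then score tweet text against venue tips) can be illustrated with a minimal Python sketch. This is not the project's implementation: the summary does not say which distance cutoff or text-similarity measure was used, so the sketch assumes the k nearest venues and a bag-of-words cosine similarity. The function names, the venue dictionary layout (`name`, `lat`, `lon`, `tips`), and the parameter `k` are all hypothetical.

```python
import math
import re
from collections import Counter

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def tokens(text):
    """Lower-cased bag of words as a term-count mapping."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count mappings (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_pois(tweet_text, tweet_lat, tweet_lon, venues, k=3):
    """Keep the k venues nearest the tweet, then rank them by tip-text similarity.

    `venues` is an assumed layout: dicts with "name", "lat", "lon", "tips".
    Returns a list of (venue name, similarity score), best match first.
    """
    # Stage 1 (assumption: fixed k): restrict candidates to the nearest venues.
    nearest = sorted(
        venues,
        key=lambda v: haversine_m(tweet_lat, tweet_lon, v["lat"], v["lon"]),
    )[:k]
    # Stage 2 (assumption: cosine similarity): compare tweet text with venue tips.
    tweet_vec = tokens(tweet_text)
    scored = [(v["name"], cosine(tweet_vec, tokens(" ".join(v["tips"])))) for v in nearest]
    return sorted(scored, key=lambda s: s[1], reverse=True)
```

Calling `rank_pois(tweet_text, lat, lon, venues, k=2)` with a small hand-made venue list returns the two nearest venues ordered by how closely their tips match the tweet. Filtering by distance first keeps a distant venue with similar tips from outranking a nearby one, which is the role the summary assigns to the GPS data.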