Exploiting user and venue characteristics for fine-grained tweet geolocation
Main Authors: | , |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2018 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/4077 https://ink.library.smu.edu.sg/context/sis_research/article/5080/viewcontent/Exploit_Fine_Grained_Tweet_Geolocation_2017_afv.pdf |
Institution: | Singapore Management University |
Summary: | Which venue is a tweet posted from? We call this a fine-grained geolocation problem. Given an observed tweet, the task is to infer its discrete posting venue, e.g., a specific restaurant. This recovers the venue context and differs from prior work, which geolocates tweets to location coordinates or cities/neighborhoods. First, we conduct empirical analysis to uncover venue and user characteristics for improving geolocation. For venues, we observe spatial homophily, in which venues near each other have more similar tweet content (i.e., text representations) than venues farther apart. For users, we observe that they are spatially focused and more likely to visit venues near their previous visits. We also find that a substantial proportion of users post one or more geocoded tweets, thus providing their location history data. We then propose geolocation models that exploit the spatial homophily and spatial focus characteristics, plus posting time information. Our models rank the candidate venues of each test tweet such that the actual posting venue is ranked high. To better tune model parameters, we introduce a learning-to-rank framework. Our best model significantly outperforms state-of-the-art baselines. Furthermore, we show that tweets without any location-indicative words can be geolocated meaningfully as well. |
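The ranking idea in the summary can be illustrated with a toy sketch. This is not the authors' model: it only combines a bag-of-words content match with a spatial-focus term that favors venues near a user's previous visits, and it omits spatial homophily, posting time, and learning-to-rank. The function names, the mixing weight `alpha`, the decay `scale_km`, and the sample venues and coordinates are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_venues(tweet, venues, history, alpha=0.5, scale_km=1.0):
    """Rank venue names, best first, by a weighted content + spatial-focus score.

    venues:  {name: ((lat, lon), text seen at that venue)}   (hypothetical format)
    history: list of (lat, lon) from the user's earlier geocoded tweets
    """
    words = Counter(tweet.lower().split())
    scores = {}
    for name, (coords, text) in venues.items():
        content = cosine(words, Counter(text.lower().split()))
        if history:
            # Assumed form of spatial focus: decay with distance to nearest past visit.
            nearest = min(haversine_km(coords, h) for h in history)
            focus = math.exp(-nearest / scale_km)
        else:
            focus = 0.0
        scores[name] = alpha * content + (1 - alpha) * focus
    return sorted(scores, key=scores.get, reverse=True)


# Made-up example data.
venues = {
    "cafe": ((1.3000, 103.8000), "coffee latte brunch"),
    "gym": ((1.3500, 103.9000), "workout weights treadmill"),
}
history = [(1.3001, 103.8001)]

print(rank_venues("great latte this morning", venues, history))
print(rank_venues("so tired today", venues, history))
```

In the second call the tweet shares no words with either venue, so the ordering comes entirely from the location history. That mirrors, in a very reduced form, the summary's point that tweets without location-indicative words can still be ranked meaningfully.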