Event identification and analysis on Twitter

Bibliographic Details
Main Author: DIAO, Qiming
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2015
Online Access: https://ink.library.smu.edu.sg/etd_coll/126
https://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=1124&context=etd_coll
Institution: Singapore Management University
Subjects: event identification; event detection; twitter analysis; topic model; event summarization; bursty topic detection; Databases and Information Systems; Social Media

Description

With the rapid growth of social media, Twitter has become one of the most widely adopted platforms for people to post short, instant messages. Because Twitter is so widely adopted, events such as breaking news and the release of popular videos can easily capture people’s attention and spread rapidly on Twitter. As a result, the popularity and importance of an event can be approximately gauged by the volume of tweets covering it. Moreover, the relevant tweets also reflect the public’s opinions of and reactions to events. It is therefore important to identify and analyze events on Twitter. In this dissertation, we introduce our work, which aims to (1) identify events from the Twitter stream, (2) analyze personal topics, events and users on Twitter, and (3) summarize the events identified from Twitter.

First, we focus on event identification on Twitter. We observe that the textual content of tweets, coupled with their temporal patterns, provides important insight into the general public’s attention and interests. A sudden increase in topically similar tweets usually indicates a burst of attention to some event that has happened offline (such as a product launch or a natural disaster) or online (such as the spread of a viral video). Based on these observations, we propose two models to identify events on Twitter: one extending LDA and the other extending a non-parametric model. The two models share two assumptions: (1) similar tweets that emerge around the same time are more likely to be about some event, and (2) similar tweets published by the same user over a long period are more likely to be about the user’s personal background and interests. These two assumptions help separate event-driven tweets from the large proportion of tweets driven by personal interests. The first model requires the number of events to be predefined, a limitation inherited from topic models. However, events emerge and die out quickly over time, and their number can be countably infinite.
Our non-parametric model overcomes this challenge.

In the first task described above, we aim to identify the events underlying the Twitter stream, without considering the relation between events and users’ personal interest topics. However, events and users’ personal interest topics are distinct yet related concepts: many events fall under certain topics. For example, concerts fall under the topic of music. Furthermore, because Twitter is a social medium, its users play important roles in forming topics and events. Each user has her own topic interests, which influence the content of her tweets. Whether a user publishes a tweet related to an event also largely depends on whether her topic interests match the nature of the event. Modeling the interplay between topics, events and users can deepen our understanding of Twitter content and potentially aid many prediction and recommendation tasks. For the second task, we aim to construct a unified model of topics, events and users on Twitter. The unified model combines a topic model, a dynamic non-parametric model and matrix factorization: the topic model learns users’ personal interest topics, the dynamic non-parametric model identifies events from the tweet stream, and matrix factorization models the interaction between topics and events.

Finally, we aim to summarize the events identified on Twitter. In the previous two tasks, we use topic models and a dynamic non-parametric model to identify events from the tweet stream. In both methods, events are learned as clusters of tweets characterized by multinomial word distributions, so users must read either the tweet clusters or the word distributions to interpret the events. However, the former is time-consuming and the latter cannot accurately represent the events. To address this, we propose a novel graph-based summarization method that generates concise abstractive summaries for the events.
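The abstract does not spell out the summarization algorithm. As a hedged illustration of how a graph over an event's tweets can yield a short abstractive summary, the sketch below uses word-graph sentence fusion (shortest-path multi-sentence compression), a well-known technique swapped in here and not necessarily the method proposed in the dissertation. It assumes tweets are already tokenized and lowercased; the names `build_word_graph` and `summarize_cluster`, the inverse-count edge weights, and the `min_words` parameter are all hypothetical.

```python
import heapq
from collections import defaultdict

START, END = "<s>", "</s>"


def build_word_graph(tweets):
    """Hypothetical helper: directed word graph for one event cluster.

    One node per distinct word (simplifying assumption: identical words always
    share a node) and one edge per adjacent word pair. Edge weight is
    1 / co-occurrence count, so transitions shared by many tweets are cheaper.
    """
    edge_counts = defaultdict(int)
    for tokens in tweets:
        path = [START] + tokens + [END]
        for a, b in zip(path, path[1:]):
            edge_counts[(a, b)] += 1
    graph = defaultdict(list)
    for (a, b), count in edge_counts.items():
        graph[a].append((b, 1.0 / count))
    return graph


def summarize_cluster(tweets, min_words=3):
    """Hypothetical helper: cheapest cycle-free START -> END path with at
    least min_words words, found by uniform-cost search (costs are >= 0,
    so the first qualifying END path popped is the cheapest one)."""
    graph = build_word_graph(tweets)
    heap = [(0.0, [START])]  # entries are (path cost, path including START)
    while heap:
        cost, path = heapq.heappop(heap)
        node = path[-1]
        if node == END:
            if len(path) - 2 >= min_words:
                return path[1:-1]  # strip START and END markers
            continue  # too short; an END path cannot be extended
        for nxt, weight in graph[node]:
            if nxt not in path:  # skip cycles
                heapq.heappush(heap, (cost + weight, path + [nxt]))
    return []


if __name__ == "__main__":
    tweets = [
        "apple launches new iphone today".split(),
        "apple launches iphone 6 today".split(),
        "new iphone launches today".split(),
    ]
    print(summarize_cluster(tweets, min_words=4))
    # ['apple', 'launches', 'iphone', 'today']
```

The fused output in the usage example does not appear verbatim in any input tweet, which is what makes the summary abstractive rather than extractive. A realistic system would also need word-sense disambiguation of nodes, stop-word handling and a grammaticality check, none of which this sketch attempts.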

Overall, this dissertation first presents our work on event identification. We then further analyze events, users and personal interest topics on Twitter, which helps us better understand users’ tweeting behavior on events. Finally, we propose a summarization method that generates abstractive summaries for the events on Twitter.
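The description notes that the number of events can be countably infinite, which is why a non-parametric model is preferred over one with a predefined event count. The record does not specify that model, so as a hedged illustration of the property only, the sketch below simulates a Chinese restaurant process, a standard non-parametric prior under which the number of clusters (here, events) is not fixed in advance. The function name `crp_assignments` and the concentration parameter `alpha` are assumptions for illustration, not taken from the dissertation.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Hypothetical helper: sample cluster (event) labels for n_items under a
    Chinese restaurant process with concentration parameter alpha.

    Item i joins existing cluster k with probability counts[k] / (i + alpha)
    or opens a new cluster with probability alpha / (i + alpha), so the
    number of clusters is not set beforehand.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        # Unnormalized weights: existing cluster sizes, then alpha for "new".
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    for n in (10, 100, 1000):
        labels, counts = crp_assignments(n, alpha=1.0)
        print(n, "items ->", len(counts), "clusters")
```

Under this prior the expected number of clusters grows roughly logarithmically with the number of items (about alpha * log(1 + n / alpha)), so new events can keep appearing as the stream grows without a preset cap.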

License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Dissertations and Theses Collection (Open Access)