Twevent : segment-based event detection from tweets

Event detection from tweets is an important task for understanding the current events/topics attracting a large number of common users. However, the unique characteristics of tweets (e.g., short and noisy content, diverse and fast-changing topics, and large data volume) make event detection a challenging...

Bibliographic Details
Main Authors: Li, Chenliang, Sun, Aixin, Datta, Anwitaman
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/97953
http://hdl.handle.net/10220/12305
Institution: Nanyang Technological University
id sg-ntu-dr.10356-97953
record_format dspace
spelling sg-ntu-dr.10356-97953
2020-05-28T07:41:34Z
International conference on Information and knowledge management (21st : 2012 : Maui, USA)
2013-07-25T08:19:36Z
2019-12-06T19:48:43Z
2012
Conference Paper
Li, C., Sun, A., & Datta, A. (2012). Twevent: Segment-based event detection from tweets. Proceedings of the 21st ACM international conference on Information and knowledge management.
10.1145/2396761.2396785
en
© 2012 ACM.
building NTU Library
country Singapore
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering
description Event detection from tweets is an important task for understanding the current events/topics attracting a large number of common users. However, the unique characteristics of tweets (e.g., short and noisy content, diverse and fast-changing topics, and large data volume) make event detection a challenging task. Most existing techniques proposed for well-written documents (e.g., news articles) cannot be directly adopted. In this paper, we propose a segment-based event detection system for tweets, called Twevent. Twevent first detects bursty tweet segments as event segments and then clusters the event segments into events, considering both their frequency distribution and content similarity. More specifically, each tweet is split into non-overlapping segments (i.e., phrases that possibly refer to named entities or semantically meaningful information units). The bursty segments are identified within a fixed time window based on their frequency patterns, and each bursty segment is described by the set of tweets containing the segment published within that time window. The similarity between a pair of bursty segments is computed using their associated tweets. After clustering bursty segments into candidate events, Wikipedia is exploited to identify the realistic events and to derive the most newsworthy segments to describe the identified events. We evaluate Twevent and compare it with the state-of-the-art method using 4.3 million tweets published by Singapore-based users in June 2010. In our experiments, Twevent outperforms the state-of-the-art method by a large margin in terms of both precision and recall. More importantly, the events detected by Twevent can be easily interpreted with little background knowledge because of the newsworthy segments. We also show that Twevent is efficient and scalable, leading to a desirable solution for event detection from tweets.
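The pipeline outlined in the description (bursty segments per time window, then clustering by the tweets they share) can be illustrated with a rough sketch. This is not the authors' implementation. The record does not give the formulas, so the burstiness test (a z-score against a binomial baseline), the Jaccard similarity over tweet sets, the connected-component clustering, and all function names and thresholds below are assumptions chosen for illustration. Tweet segmentation and the Wikipedia-based filtering step are omitted; tweets are assumed to arrive already split into segments.

```python
from collections import defaultdict
from math import sqrt


def bursty_segments(windows, target, z_threshold=2.0):
    """Return {segment: set of tweet ids} for segments bursty in window `target`.

    `windows` maps a window id to a list of (tweet_id, segments) pairs.
    Burstiness here is an assumed z-score test, not the paper's formula.
    """
    total_tweets = 0
    seg_count_all = defaultdict(int)      # tweets containing the segment, all windows
    seg_tweets_target = defaultdict(set)  # tweet ids containing it in the target window
    for window_id, tweets in windows.items():
        total_tweets += len(tweets)
        for tweet_id, segments in tweets:
            for seg in set(segments):
                seg_count_all[seg] += 1
                if window_id == target:
                    seg_tweets_target[seg].add(tweet_id)

    n_target = len(windows[target])
    bursty = {}
    for seg, tweet_ids in seg_tweets_target.items():
        p = seg_count_all[seg] / total_tweets  # baseline probability of the segment
        expected = n_target * p
        std = sqrt(n_target * p * (1.0 - p))
        if std > 0 and (len(tweet_ids) - expected) / std >= z_threshold:
            bursty[seg] = tweet_ids
    return bursty


def jaccard(a, b):
    """Overlap of two tweet-id sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_segments(bursty, sim_threshold=0.5):
    """Group bursty segments into candidate events via connected components."""
    segs = sorted(bursty)
    parent = {s: s for s in segs}

    def find(s):
        # Union-find lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for i, a in enumerate(segs):
        for b in segs[i + 1:]:
            if jaccard(bursty[a], bursty[b]) >= sim_threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for s in segs:
        clusters[find(s)].append(s)
    return sorted(clusters.values())
```

In this sketch, each returned cluster is a candidate event described by its segments; a real system would still need a newsworthiness filter such as the Wikipedia-based step the description mentions.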