Twevent : segment-based event detection from tweets
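Event detection from tweets is an important task for understanding the current events/topics attracting a large number of common users. However, the unique characteristics of tweets (e.g. short and noisy content, diverse and fast-changing topics, and large data volume) make event detection a challenging task. Most existing techniques proposed for well-written documents (e.g. news articles) cannot be directly adopted. In this paper, we propose a segment-based event detection system for tweets, called Twevent. Twevent first detects bursty tweet segments as event segments and then clusters the event segments into events, considering both their frequency distribution and content similarity. More specifically, each tweet is split into non-overlapping segments (i.e. phrases that possibly refer to named entities or semantically meaningful information units). The bursty segments are identified within a fixed time window based on their frequency patterns, and each bursty segment is described by the set of tweets containing the segment published within that time window. The similarity between a pair of bursty segments is computed using their associated tweets. After clustering bursty segments into candidate events, Wikipedia is exploited to identify the realistic events and to derive the most newsworthy segments to describe the identified events. We evaluate Twevent and compare it with the state-of-the-art method using 4.3 million tweets published by Singapore-based users in June 2010. In our experiments, Twevent outperforms the state-of-the-art method by a large margin in terms of both precision and recall. More importantly, the events detected by Twevent can be easily interpreted with little background knowledge because of the newsworthy segments. We also show that Twevent is efficient and scalable, leading to a desirable solution for event detection from tweets.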
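The abstract states that bursty segments are identified within a fixed time window from their frequency patterns. A minimal sketch of one such test follows, assuming each tweet is already split into segments and that a historical per-tweet probability is known for each segment. The `historical_prob` mapping and the `k` and `min_count` thresholds are hypothetical. A segment is flagged when its count in the window exceeds the binomial expectation by more than `k` standard deviations; this is a generic statistical heuristic, not necessarily the exact criterion used by Twevent.

```python
import math
from collections import Counter


def bursty_segments(window_tweets, historical_prob, k=2.0, min_count=2):
    """Return {segment: z-score} for segments that burst in one time window.

    window_tweets: tweets in the window, each already split into segments.
    historical_prob: hypothetical mapping segment -> probability that a tweet
        contains the segment, estimated from earlier windows.
    k, min_count: assumed thresholds (standard deviations above the expected
        count, and minimum observed count).
    """
    n = len(window_tweets)
    # Count each segment at most once per tweet.
    counts = Counter(seg for tweet in window_tweets for seg in set(tweet))
    bursty = {}
    for seg, observed in counts.items():
        p = historical_prob.get(seg)
        if observed < min_count or p is None or not 0.0 < p < 1.0:
            continue  # too rare, or no usable historical baseline
        # Binomial model: expected count n*p, standard deviation sqrt(n*p*(1-p)).
        expected = n * p
        std = math.sqrt(n * p * (1.0 - p))
        if observed > expected + k * std:
            bursty[seg] = (observed - expected) / std
    return bursty


if __name__ == "__main__":
    window = [
        ["world cup", "kick off"],
        ["world cup", "tonight"],
        ["world cup"],
        ["lunch"],
    ]
    history = {"world cup": 0.05, "kick off": 0.01, "tonight": 0.1, "lunch": 0.3}
    print(bursty_segments(window, history))
```

In this toy window only "world cup" is flagged: it appears in 3 of 4 tweets against an expected 0.2, while the other segments occur once and fall below `min_count`. The minimum count guards against the normal approximation over-reacting to single occurrences of very rare segments.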
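The abstract states that the similarity between two bursty segments is computed from their associated tweets and that bursty segments are then clustered into candidate events. A minimal sketch follows. It assumes each segment is represented by the word counts of its associated tweets and compared with cosine similarity. For clustering, it uses connected components of a thresholded similarity graph as a simple stand-in. `cluster_segments` and its `threshold` value are hypothetical, and Twevent's own similarity measure and clustering algorithm may differ.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster_segments(segment_tweets, threshold=0.3):
    """Group bursty segments into candidate events.

    segment_tweets: mapping bursty segment -> list of tweet texts that contain
        it in the current time window.
    threshold: assumed minimum similarity for linking two segments.
    """
    # Represent each segment by the word counts of all its associated tweets.
    vectors = {
        seg: Counter(word for tweet in tweets for word in tweet.lower().split())
        for seg, tweets in segment_tweets.items()
    }
    parent = {seg: seg for seg in vectors}  # union-find forest

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Link every pair of segments whose tweet vectors are similar enough.
    for s1, s2 in combinations(vectors, 2):
        if cosine(vectors[s1], vectors[s2]) >= threshold:
            parent[find(s1)] = find(s2)

    groups = {}
    for seg in vectors:
        groups.setdefault(find(seg), []).append(seg)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    segment_tweets = {
        "world cup": ["world cup final tonight spain", "spain wins world cup"],
        "spain": ["spain wins world cup", "go spain final"],
        "mrt delay": ["mrt delay at city hall", "another mrt delay this morning"],
    }
    print(cluster_segments(segment_tweets))
```

Here "world cup" and "spain" share most of their tweet vocabulary and end up in one candidate event, while "mrt delay" shares no words with either and stays on its own.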
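The abstract also states that Wikipedia is used to identify realistic events and to derive the most newsworthy segments for describing them. A minimal sketch follows, assuming a hypothetical `anchor_prob` table that gives, for each phrase, the fraction of its Wikipedia occurrences that appear as link anchor text; frequently linked phrases are treated as more likely to name real-world entities. The scoring rule here (an event scores the highest anchor probability among its segments) and the `min_score` and `top_n` parameters are illustrative assumptions, not the paper's formula.

```python
def newsworthy_events(events, anchor_prob, min_score=0.2, top_n=3):
    """Keep candidate events that look realistic and describe each one.

    events: list of candidate events, each a list of bursty segments.
    anchor_prob: hypothetical mapping phrase -> probability that the phrase is
        used as link anchor text in Wikipedia (missing means 0.0).
    min_score: assumed cut-off for treating an event as realistic.
    top_n: number of most newsworthy segments kept to describe an event.
    Returns a list of (score, describing_segments) tuples.
    """
    kept = []
    for segments in events:
        # Rank the event's segments by how often Wikipedia links them.
        ranked = sorted(segments, key=lambda s: anchor_prob.get(s, 0.0), reverse=True)
        score = anchor_prob.get(ranked[0], 0.0) if ranked else 0.0
        if score >= min_score:
            kept.append((score, ranked[:top_n]))
    return kept


if __name__ == "__main__":
    events = [["tonight", "world cup", "spain"], ["lol", "so tired"]]
    anchor_prob = {"world cup": 0.6, "spain": 0.4, "tonight": 0.001}
    print(newsworthy_events(events, anchor_prob, top_n=2))
```

The chatter-only event ("lol", "so tired") has no linked phrases and is dropped; the remaining event is described by its two most frequently linked segments.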
Main Authors: | Li, Chenliang; Sun, Aixin; Datta, Anwitaman |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2013 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/97953 http://hdl.handle.net/10220/12305 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-97953 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-97953 2020-05-28T07:41:34Z; Twevent : segment-based event detection from tweets; Li, Chenliang; Sun, Aixin; Datta, Anwitaman; School of Computer Engineering; International conference on Information and knowledge management (21st : 2012 : Maui, USA); DRNTU::Engineering::Computer science and engineering; 2013-07-25T08:19:36Z; 2019-12-06T19:48:43Z; 2012; Conference Paper; Li, C., Sun, A., & Datta, A. (2012). Twevent: Segment-based event detection from tweets. Proceedings of the 21st ACM international conference on Information and knowledge management; https://hdl.handle.net/10356/97953; http://hdl.handle.net/10220/12305; 10.1145/2396761.2396785; en; © 2012 ACM. |
institution | Nanyang Technological University |
building | NTU Library |
country | Singapore |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering |
author2 | School of Computer Engineering |
format | Conference or Workshop Item |
author | Li, Chenliang; Sun, Aixin; Datta, Anwitaman |
title | Twevent : segment-based event detection from tweets |
publishDate | 2013 |
url | https://hdl.handle.net/10356/97953 http://hdl.handle.net/10220/12305 |