TwiNER : named entity recognition in targeted twitter stream



Bibliographic Details
Main Authors: Li, Chenliang, Weng, Jianshu, He, Qi, Yao, Yuxia, Datta, Anwitaman, Sun, Aixin, Lee, Bu-Sung
Other Authors: School of Computer Engineering
Format: Conference or Workshop Item
Language: English
Published: 2013
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access:https://hdl.handle.net/10356/97637
http://hdl.handle.net/10220/12087
Institution: Nanyang Technological University
id sg-ntu-dr.10356-97637
record_format dspace
Conference: International conference on Research and development in information retrieval (35th : 2012)
Type: Conference Paper
Date issued: 2012
Record dates: 2013-07-24T02:45:36Z, 2019-12-06T19:44:49Z, 2020-05-28T07:41:34Z
Citation: Li, C., Weng, J., He, Q., Yao, Y., Datta, A., Sun, A., et al. (2012). TwiNER: named entity recognition in targeted twitter stream. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR '12.
DOI: 10.1145/2348283.2348380
Rights: © 2012 ACM.
building NTU Library
country Singapore
collection DR-NTU
topic DRNTU::Engineering::Computer science and engineering
description Many private and/or public organizations have been reported to create and monitor targeted Twitter streams to collect and understand users' opinions about the organizations. A targeted Twitter stream is usually constructed by filtering tweets with user-defined selection criteria, e.g., tweets published by users from a selected region, or tweets that match one or more predefined keywords. There is an emerging need for early crisis detection and response with such targeted streams. Such applications require a good named entity recognition (NER) system for Twitter, one that can automatically discover emerging named entities potentially linked to the crisis. In this paper, we present TwiNER, a novel two-step unsupervised NER system for targeted Twitter streams. In the first step, it leverages the global context obtained from Wikipedia and the Web N-Gram corpus to partition tweets into valid segments (phrases) using a dynamic programming algorithm. Each such tweet segment is a candidate named entity. We observe that the named entities in a targeted stream usually exhibit a gregarious property, owing to the way the stream is constructed. In the second step, TwiNER constructs a random walk model to exploit this gregarious property in the local context derived from the Twitter stream. Highly ranked segments have a higher chance of being true named entities. We evaluated TwiNER on two sets of real-life tweets simulating two targeted streams. Evaluated against labeled ground truth, TwiNER achieves performance comparable to conventional approaches in both streams. We also examined various settings of TwiNER to verify the idea of combining global and local context.
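The two steps summarized in the description can be illustrated with a minimal Python sketch. It is not TwiNER's implementation: the PHRASE_SCORES table and the score constants are hypothetical stand-ins for statistics a real system would derive from Wikipedia and the Web N-Gram corpus, and segment_score, segment_tweet and random_walk_rank are illustrative names. Step one is a standard dynamic program that selects the segmentation with the highest total segment score. Step two swaps in a generic PageRank-style random walk over a segment co-occurrence graph for the paper's random walk model, on the assumption that gregarious entities tend to co-occur across tweets in the stream.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical "global context" scores standing in for statistics that a real
# system would derive from Wikipedia and a Web N-Gram corpus (assumed values).
PHRASE_SCORES = {"steve jobs": 2.0, "apple store": 1.8, "new york": 1.9}
SINGLE_WORD_SCORE = 0.5      # assumed score for any one-word segment
UNKNOWN_PHRASE_SCORE = -1.0  # assumed penalty for unseen multi-word segments
MAX_SEG_LEN = 3              # assumed maximum number of words per segment


def segment_score(words):
    """Score one candidate segment, given as a tuple of words."""
    if len(words) == 1:
        return SINGLE_WORD_SCORE
    return PHRASE_SCORES.get(" ".join(words), UNKNOWN_PHRASE_SCORE)


def segment_tweet(tweet):
    """Step 1 (sketch): dynamic programming over word positions to find the
    segmentation with the highest total segment score."""
    words = tweet.lower().split()
    n = len(words)
    best = [float("-inf")] * (n + 1)  # best[i]: best total score for words[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last segment
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(0, i - MAX_SEG_LEN), i):
            score = best[j] + segment_score(tuple(words[j:i]))
            if score > best[i]:
                best[i], back[i] = score, j
    # Follow the back-pointers from the end to recover the segments.
    segments, i = [], n
    while i > 0:
        segments.append(" ".join(words[back[i]:i]))
        i = back[i]
    return segments[::-1]


def random_walk_rank(segmented_tweets, damping=0.85, iters=50):
    """Step 2 (sketch): PageRank-style random walk over a graph whose edge
    weights count the tweets in which two segments co-occur."""
    nodes = sorted({s for segs in segmented_tweets for s in segs})
    if not nodes:
        return []
    weight = defaultdict(float)
    for segs in segmented_tweets:
        for a, b in combinations(sorted(set(segs)), 2):
            weight[(a, b)] += 1.0
            weight[(b, a)] += 1.0
    out_sum = defaultdict(float)
    for (a, _), w in weight.items():
        out_sum[a] += w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by segments with no neighbors is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_sum[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for (a, b), w in weight.items():
            new[b] += damping * rank[a] * w / out_sum[a]
        rank = new
    return sorted(nodes, key=lambda v: rank[v], reverse=True)


if __name__ == "__main__":
    tweets = [
        "Steve Jobs opens the Apple Store",
        "Apple Store queue in New York",
        "Steve Jobs keynote today",
    ]
    segmented = [segment_tweet(t) for t in tweets]
    print(segmented[0])                     # ['steve jobs', 'opens', 'the', 'apple store']
    print(random_walk_rank(segmented)[:2])  # ['apple store', 'steve jobs']
```

In this toy run the known phrases survive segmentation intact, and "apple store" and "steve jobs" rank highest because they co-occur with the most other segments across the three tweets. A practical system would also filter stop words such as "the" and "in" before ranking; the sketch keeps them to stay short.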