NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model

Twitter is one of the most widely used social media platforms today. It enables users to share and view short 140-character messages called 'tweets'. About 284 million active users generate close to 500 million tweets per day. Such rapid generation of user-generated content in large magnit...

Bibliographic Details
Main Authors:	SHARMA, Abishek; TIAN, Yuan; LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2015
Subjects:	Computer Sciences; Databases and Information Systems; Social Media
Online Access:	https://ink.library.smu.edu.sg/sis_research/3194 https://ink.library.smu.edu.sg/context/sis_research/article/4195/viewcontent/Nirmal_SANER_2015_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4195
record_format	dspace
spelling	sg-smu-ink.sis_research-4195 2020-02-14T03:19:34Z NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model SHARMA, Abishek; TIAN, Yuan; LO, David 2015-03-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3194 info:doi/10.1109/SANER.2015.7081855 https://ink.library.smu.edu.sg/context/sis_research/article/4195/viewcontent/Nirmal_SANER_2015_av.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Computer Sciences; Databases and Information Systems; Social Media
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Computer Sciences; Databases and Information Systems; Social Media
description	Twitter is one of the most widely used social media platforms today. It enables users to share and view short 140-character messages called 'tweets'. About 284 million active users generate close to 500 million tweets per day. Such rapid generation of user-generated content at this scale results in information overload. Users who are interested in information related to a particular domain have limited means to filter out irrelevant tweets and tend to get lost in the huge amount of data they encounter. A recent study by Singer et al. found that software developers use Twitter to stay aware of industry trends, to learn from others, and to network with other developers. However, Singer et al. also reported that developers often find Twitter streams to contain too much noise, which is a barrier to the adoption of Twitter. In this paper, to help developers cope with noise, we propose a novel approach named NIRMAL, which automatically identifies software-relevant tweets from a collection or stream of tweets. Our approach is based on language modeling, which learns a statistical model from a training corpus (i.e., a set of documents). We use a subset of posts from StackOverflow, a programming question-and-answer site, as the training corpus to learn a language model. A corpus of tweets was then used to test the effectiveness of the trained language model. The tweets were sorted based on the rank the model assigned to each individual tweet. The top 200 tweets were then manually analyzed to verify whether they were software-related, and an accuracy score was calculated. The results show that decent accuracy scores can be achieved by various variants of NIRMAL, which indicates that NIRMAL can effectively identify software-related tweets from a huge corpus of tweets.
format	text
author	SHARMA, Abishek; TIAN, Yuan; LO, David
author_facet	SHARMA, Abishek; TIAN, Yuan; LO, David
author_sort	SHARMA, Abishek
title	NIRMAL: Automatic Identification of Software Relevant Tweets Leveraging Language Model
title_sort	nirmal: automatic identification of software relevant tweets leveraging language model
publisher	Institutional Knowledge at Singapore Management University
publishDate	2015
url	https://ink.library.smu.edu.sg/sis_research/3194 https://ink.library.smu.edu.sg/context/sis_research/article/4195/viewcontent/Nirmal_SANER_2015_av.pdf
_version_	1770572975022538752
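
The ranking procedure summarized in the description (train a language model on programming posts, score each tweet, sort, inspect the top of the list) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the bigram order, the add-one smoothing, the use of perplexity as the score, the names `BigramModel` and `rank_tweets`, and the toy posts and tweets. It is not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep word-like tokens (assumed tokenization, kept simple).
    return re.findall(r"[a-z0-9#+]+", text.lower())


class BigramModel:
    """Bigram language model with add-one smoothing (illustrative assumption)."""

    def __init__(self, documents):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for doc in documents:
            tokens = ["<s>"] + tokenize(doc) + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        # One extra slot stands in for any word never seen in training.
        self.vocab_size = len(self.unigrams) + 1

    def perplexity(self, text):
        # Lower perplexity means the text looks more like the training corpus.
        tokens = ["<s>"] + tokenize(text) + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.unigrams[prev] + self.vocab_size
            log_prob += math.log(numerator / denominator)
        return math.exp(-log_prob / (len(tokens) - 1))


def rank_tweets(model, tweets, top_k=200):
    # Sort tweets from most to least similar to the training corpus.
    return sorted(tweets, key=model.perplexity)[:top_k]


# Hypothetical stand-ins for StackOverflow posts and a tweet corpus.
training_posts = [
    "How do I fix a null pointer exception in Java?",
    "Python list comprehension throws a syntax error",
    "How to merge two branches in git without conflicts",
]
tweets = [
    "lovely sunny day at the beach with friends",
    "got a null pointer exception in java again",
]

model = BigramModel(training_posts)
for tweet in rank_tweets(model, tweets):
    print(tweet)
```

In this toy run the tweet that shares vocabulary with the training posts is listed first. In a real evaluation the top-ranked tweets would still need manual labeling to compute an accuracy score, as the description states.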
