A study on real-time low-quality content detection on Twitter from the users’ perspective

Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from th...

Bibliographic Details
Main Authors:	Chen, Weiling; Yeo, Chai Kiat; Lau, Chiew Tong; Lee, Bu Sung
Other Authors:	Suleman, Hussein
Format:	Article
Language:	English
Published:	2017
Subjects:	Adult; Adolescent
Online Access:	https://hdl.handle.net/10356/86664 http://hdl.handle.net/10220/44184
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-86664
record_format	dspace
spelling	sg-ntu-dr.10356-866642020-03-07T11:48:58Z A study on real-time low-quality content detection on Twitter from the users’ perspective Chen, Weiling Yeo, Chai Kiat Lau, Chiew Tong Lee, Bu Sung Suleman, Hussein School of Computer Science and Engineering Adult Adolescent Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from the users’ perspective in real time. To define low-quality content comprehensibly, Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users’ opinions on different categories of low-quality content. Both direct and indirect features including newly proposed features are identified to characterize all types of low-quality content. We then further combine word level analysis with the identified features and build a keyword blacklist dictionary to improve the detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the characterized significant features and word level analysis. The results of our research show that our method has a high accuracy of 0.9711 and a good F1 of 0.8379 based on a random forest classifier with real time performance in the detection of low-quality content in tweets. Our work therefore achieves a positive impact in improving user experience in browsing social media content. Published version 2017-12-21T04:55:59Z 2019-12-06T16:26:54Z 2017-12-21T04:55:59Z 2019-12-06T16:26:54Z 2017 Journal Article Chen, W., Yeo, C. K., Lau, C. T., & Lee, B. S. (2017). A study on real-time low-quality content detection on Twitter from the users’ perspective. 
PLOS ONE, 12(8), e0182487-. https://hdl.handle.net/10356/86664 http://hdl.handle.net/10220/44184 10.1371/journal.pone.0182487 en PLoS ONE © 2017 Chen et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 22 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	Adult Adolescent
spellingShingle	Adult Adolescent Chen, Weiling Yeo, Chai Kiat Lau, Chiew Tong Lee, Bu Sung A study on real-time low-quality content detection on Twitter from the users’ perspective
description	Detection techniques for malicious content such as spam and phishing on Online Social Networks (OSN) are common, with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from the users’ perspective in real time. To define low-quality content comprehensibly, the Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users’ opinions on different categories of low-quality content. Both direct and indirect features, including newly proposed features, are identified to characterize all types of low-quality content. We then further combine word-level analysis with the identified features and build a keyword blacklist dictionary to improve the detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the characterized significant features and word-level analysis. The results of our research show that our method has a high accuracy of 0.9711 and a good F1 of 0.8379 based on a random forest classifier with real-time performance in the detection of low-quality content in tweets. Our work therefore achieves a positive impact in improving user experience in browsing social media content.
author2	Suleman, Hussein
author_facet	Suleman, Hussein Chen, Weiling Yeo, Chai Kiat Lau, Chiew Tong Lee, Bu Sung
format	Article
author	Chen, Weiling Yeo, Chai Kiat Lau, Chiew Tong Lee, Bu Sung
author_sort	Chen, Weiling
title	A study on real-time low-quality content detection on Twitter from the users’ perspective
title_short	A study on real-time low-quality content detection on Twitter from the users’ perspective
title_full	A study on real-time low-quality content detection on Twitter from the users’ perspective
title_fullStr	A study on real-time low-quality content detection on Twitter from the users’ perspective
title_full_unstemmed	A study on real-time low-quality content detection on Twitter from the users’ perspective
title_sort	study on real-time low-quality content detection on twitter from the users’ perspective
publishDate	2017
url	https://hdl.handle.net/10356/86664 http://hdl.handle.net/10220/44184
_version_	1681049505990967296