Twitter popularity prediction based on text mining

Bibliographic Details
Main Author:	Weng, Quanchi
Other Authors:	Mao Kezhi
Format:	Theses and Dissertations
Language:	English
Published:	2016
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/68958
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-68958
record_format	dspace
spelling	sg-ntu-dr.10356-68958 2023-07-04T15:41:46Z Twitter popularity prediction based on text mining Weng, Quanchi Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Master of Science (Computer Control and Automation) 2016-08-17T03:02:32Z 2016-08-17T03:02:32Z 2016 Thesis http://hdl.handle.net/10356/68958 en 72 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
description	Twitter, one of the most popular social media platforms on the internet, generates a great amount of text data every day. Because of the sheer volume of data on Twitter, it may be difficult or even impossible for users to find useful and meaningful information. Automatic detection of potentially popular tweets can therefore surface and recommend important tweets to users in a timely manner. This thesis attempts to predict the popularity of a tweet automatically from its content, that is, its text. Here, the popularity of a tweet is quantified by its number of favorites and retweets. Our system consists of two parts. The first part is text representation learning; a good representation of text data is vital for robust and optimal performance. We investigated two methods of learning a text representation: one based on the traditional bag-of-words (BoW) model, and the other based on a recently popular technique, word embeddings. The second part concerns classification algorithms: three classical classifiers are compared, namely support vector machine (SVM), Naive Bayes, and logistic regression. Extensive experiments were conducted on over 3000 tweets from the official account of the Cable News Network (CNN). The task is defined as a classification problem in which tweets with high numbers of favorites and retweets are labeled popular and tweets with low numbers of favorites and retweets are labeled unpopular. The experiments show that it is possible to detect the popularity of tweets from their content. In particular, the combination of BoW and SVM achieves the best performance.
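The abstract does not state the labeling thresholds or the exact feature pipeline. As a rough illustration only, the following stdlib-only Python sketch shows the general shape of one configuration the abstract names: bag-of-words counts fed to a multinomial Naive Bayes classifier. The scoring rule (favorites + retweets), the `high`/`low` thresholds, the tokenizer, and the names `label_popularity` and `BowNaiveBayes` are hypothetical and not taken from the thesis. The thesis reports BoW with SVM as its best setup, which this sketch does not implement.

```python
import math
import re
from collections import Counter


def label_popularity(favorites, retweets, high, low):
    """Hypothetical labeling rule (not from the thesis): score = favorites + retweets.

    Returns 1 (popular) if score >= high, 0 (unpopular) if score <= low,
    and None for tweets in between, which would be left out of training.
    """
    score = favorites + retweets
    if score >= high:
        return 1
    if score <= low:
        return 0
    return None


def tokenize(text):
    # Lowercase and keep alphanumeric runs; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9']+", text.lower())


class BowNaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.vocab = set()
        self.word_counts = {c: Counter() for c in set(labels)}
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, text, label):
        # Unnormalized log P(label | text) = log P(label) + sum of log P(word | label).
        n_docs = sum(self.doc_counts.values())
        logp = math.log(self.doc_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for token in tokenize(text):
            if token not in self.vocab:
                continue  # ignore words never seen in training
            logp += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return logp

    def predict(self, text):
        # Pick the class with the highest posterior score.
        return max(self.doc_counts, key=lambda c: self.log_posterior(text, c))
```

In use, tweets labeled `None` by `label_popularity` would be filtered out, and `BowNaiveBayes().fit(texts, labels).predict(new_text)` would return 1 or 0. Swapping in an SVM or logistic regression would change only the classifier, not the bag-of-words step.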
author2	Mao Kezhi
author_facet	Mao Kezhi; Weng, Quanchi
format	Theses and Dissertations
author	Weng, Quanchi
title	Twitter popularity prediction based on text mining
publishDate	2016
url	http://hdl.handle.net/10356/68958
