Twitter popularity prediction based on text mining

Bibliographic Details
Main Author: Weng, Quanchi
Other Authors: Mao Kezhi
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: http://hdl.handle.net/10356/68958
Institution: Nanyang Technological University
id sg-ntu-dr.10356-68958
record_format dspace
School: School of Electrical and Electronic Engineering
Degree: Master of Science (Computer Control and Automation)
Type: Thesis, 72 p., application/pdf
Repository date: 2016-08-17T03:02:32Z
Record timestamp: 2023-07-04T15:41:46Z
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Electrical and electronic engineering
description Twitter, one of the most popular social media platforms on the internet, generates a great amount of text data every day. Because of the sheer volume of data on Twitter, it can be difficult or even impossible for users to find useful and meaningful information. Automatic detection of potentially popular tweets can therefore surface and recommend important tweets to users in a timely manner. This thesis aims to predict the popularity of a tweet automatically from its content, that is, its text. Here, the popularity of a tweet is quantified by its number of favorites and retweets. Our system consists of two parts. The first part is text representation learning; a good representation of text data is vital for robust and optimal performance. We investigated two methods for learning text representations: one based on the traditional bag-of-words (BoW) model, the other based on a more recent and popular technique, word embeddings. The second part concerns classification algorithms: three classical classifiers, support vector machines (SVM), naive Bayes and logistic regression, are compared. Extensive experiments are conducted on over 3000 tweets from the official account of the Cable News Network (CNN). The task is defined as a classification problem in which tweets with high numbers of favorites and retweets are labeled as popular and tweets with low numbers of favorites and retweets are labeled as unpopular. The experiments show that it is possible to detect the popularity of tweets from their content; in particular, BoW combined with SVM achieves the best performance.
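The pipeline summarized in the description (label tweets by engagement, build a bag-of-words representation, train a classifier) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: engagement is assumed to be favorites + retweets, the two thresholds and the dropped middle band are hypothetical, the tokenizer is a simple regex, and multinomial naive Bayes (one of the three compared classifiers) is used only because it is short to write from scratch; the abstract reports BoW + SVM as the best combination. All function names and the toy tweets are invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def label_popularity(tweets, low, high):
    """Hypothetical labeling rule: engagement = favorites + retweets.

    tweets is a list of (text, favorites, retweets) tuples. Engagement >= high
    is 'popular', engagement <= low is 'unpopular'; the middle band is dropped.
    """
    labeled = []
    for text, favorites, retweets in tweets:
        engagement = favorites + retweets
        if engagement >= high:
            labeled.append((text, "popular"))
        elif engagement <= low:
            labeled.append((text, "unpopular"))
    return labeled


def tokenize(text):
    # Lowercase, then keep runs of word characters, optionally prefixed by # or @.
    return re.findall(r"[#@]?\w+", text.lower())


def train_naive_bayes(docs):
    """Fit multinomial naive Bayes on bag-of-words token counts."""
    class_docs = Counter(label for _, label in docs)  # documents per class
    word_counts = defaultdict(Counter)  # class -> token -> count
    vocab = set()
    for text, label in docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokenize(text):
            if token in vocab:  # tokens unseen in training are ignored
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Invented toy data: (text, favorites, retweets).
    toy_tweets = [
        ("Breaking: major earthquake hits the coast", 900, 1200),
        ("Breaking news: election results are in", 800, 1000),
        ("Watch our weekend lifestyle segment", 10, 5),
        ("Recipe of the week: summer salads", 8, 3),
        ("Weather update for tomorrow", 150, 100),  # middle band, dropped
    ]
    data = label_popularity(toy_tweets, low=50, high=500)
    model = train_naive_bayes(data)
    print(predict(model, "Breaking: storm hits the coast"))  # prints: popular
```

Swapping in an SVM or logistic regression, or replacing the count vectors with averaged word embeddings, changes only the representation and training steps; the labeling step stays the same.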
author2 Mao Kezhi
format Theses and Dissertations
author Weng, Quanchi
title Twitter popularity prediction based on text mining
publishDate 2016
url http://hdl.handle.net/10356/68958