Twitter popularity prediction based on text mining
Twitter, one of the most popular social media platforms on the internet, generates a large amount of text data every day. Because of the sheer volume of data on Twitter, it can be difficult or even impossible for users to find useful and meaningful information. Therefore, automatic detection of poss...
Main Author: | Weng, Quanchi |
---|---|
Other Authors: | Mao Kezhi |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | http://hdl.handle.net/10356/68958 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-68958 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-68958 2023-07-04T15:41:46Z Twitter popularity prediction based on text mining Weng, Quanchi Mao Kezhi School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Master of Science (Computer Control and Automation) 2016-08-17T03:02:32Z 2016-08-17T03:02:32Z 2016 Thesis http://hdl.handle.net/10356/68958 en 72 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering Weng, Quanchi Twitter popularity prediction based on text mining |
description |
Twitter, one of the most popular social media platforms on the internet, generates a large amount of text data every day. Because of the sheer volume of data on Twitter, it can be difficult or even impossible for users to find useful and meaningful information. Automatic detection of potentially popular tweets can therefore help surface and recommend important tweets to users in a timely manner. This thesis attempts to predict the popularity of a tweet automatically from its content, that is, its text. Here, the popularity of a tweet is quantified by its numbers of favorites and retweets. Our system consists of two parts. The first part is text representation learning. A good representation of text data is vital to achieving robust and optimal performance. We investigated two methods for learning text representations: one based on the traditional bag-of-words (BoW) model, the other based on a more recent and popular technique, word embeddings. The second part concerns classification algorithms. Three classical classifiers are compared: SVM, Naive Bayes, and logistic regression. Extensive experiments over 3000 tweets from the official account of the Cable News Network (CNN) are conducted. The task is defined as a classification problem in which tweets with high numbers of favorites and retweets are labeled popular and tweets with low numbers of favorites and retweets are labeled unpopular. The experiments show that it is possible to predict the popularity of tweets from their content. In particular, the combination of BoW and SVM achieves the best performance. |
author2 |
Mao Kezhi |
author_facet |
Mao Kezhi Weng, Quanchi |
format |
Theses and Dissertations |
author |
Weng, Quanchi |
author_sort |
Weng, Quanchi |
title |
Twitter popularity prediction based on text mining |
title_short |
Twitter popularity prediction based on text mining |
title_full |
Twitter popularity prediction based on text mining |
title_fullStr |
Twitter popularity prediction based on text mining |
title_full_unstemmed |
Twitter popularity prediction based on text mining |
title_sort |
twitter popularity prediction based on text mining |
publishDate |
2016 |
url |
http://hdl.handle.net/10356/68958 |
_version_ |
1772827608337088512 |
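
The description above outlines a two-part pipeline: a text representation (bag-of-words or word embeddings) followed by a classifier (SVM, Naive Bayes, or logistic regression). The sketch below is only an illustration of that general idea and is not taken from the thesis. It pairs a bag-of-words representation with a multinomial Naive Bayes classifier, chosen here because it can be written with the Python standard library alone; the record reports that BoW with SVM performed best. The example tweets, their labels, and the function names `tokenize`, `train_naive_bayes`, and `predict` are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Collect per-class document counts and bag-of-words term counts."""
    class_docs = Counter(labels)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        # Log prior: fraction of training tweets carrying this label.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing avoids zero probabilities.
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data; the thesis used tweets from the CNN official account,
# labeled by their numbers of favorites and retweets.
texts = [
    "breaking news major earthquake hits city",
    "breaking president announces new policy",
    "weather update light rain expected today",
    "traffic update minor delays downtown",
]
labels = ["popular", "popular", "unpopular", "unpopular"]

model = train_naive_bayes(texts, labels)
print(predict(model, "breaking news on new policy"))
print(predict(model, "weather update downtown"))
```

In practice, a library such as scikit-learn would typically replace the hand-written counting and scoring, and swapping in an SVM or logistic regression would allow the kind of classifier comparison the description mentions.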