Analysing the simultaneous effects of personalities, demographics, competition and collaboration : a case study with Indian politics
Main Author: | Lai, Alvin Weijian |
---|---|
Other Authors: | Anwitaman Datta |
Format: | Final Year Project |
Degree: | Bachelor of Engineering (Computer Engineering) |
School: | School of Computer Engineering; Parallel and Distributed Computing Centre |
Language: | English |
Published: | 2014 |
Date Available: | 2014-04-22T02:26:34Z |
Physical Description: | 44 p., application/pdf |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
Online Access: | http://hdl.handle.net/10356/59056 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-59056 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
description |
Twitter is widely regarded as a fast-growing social networking medium. The massive amount of data available on Twitter encourages researchers to crawl and analyse it.
The purpose of this project is to create a complete system comprising a Twitter crawler, a multidimensional database, a data digester and an analytics engine. The crawler collects tweets; the data digester formats the crawled data before it is stored; the multidimensional database stores the formatted data; and the analytics engine performs analysis on it.
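The four-stage pipeline described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the project's implementation: the raw record fields (`id`, `created_at`, `user`, `text`), the `Tweet` type, the `digest`, `store` and `hashtag_totals` functions, and the use of a (day, hashtag) count cube as the "multidimensional" store are all assumptions made for the example.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Hypothetical crawler output; the real crawler's field names and
# timestamp format are not given in the abstract.
RAW_TWEETS = [
    {"id": "1", "created_at": "2014-03-14T09:15:00", "user": "alice",
     "text": "Rally in #Delhi today  #Elections2014 "},
    {"id": "2", "created_at": "2014-03-14T11:40:00", "user": "bob",
     "text": "#elections2014 debate tonight"},
    {"id": "3", "created_at": "2014-03-15T08:05:00", "user": "alice",
     "text": "Turnout news from #Delhi"},
]


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    day: str          # date dimension, e.g. "2014-03-14"
    user: str         # user dimension
    text: str
    hashtags: tuple   # lower-cased hashtags (hashtag dimension)


def digest(raw: dict) -> Tweet:
    """Data digester: normalise one raw record before it is stored."""
    text = " ".join(raw["text"].split())              # collapse whitespace
    tags = tuple(t.lower() for t in re.findall(r"#(\w+)", text))
    day = datetime.fromisoformat(raw["created_at"]).date().isoformat()
    return Tweet(int(raw["id"]), day, raw["user"], text, tags)


def store(tweets) -> Counter:
    """Toy multidimensional store: tweet counts keyed by (day, hashtag)."""
    cube = Counter()
    for tw in tweets:
        for tag in tw.hashtags:
            cube[(tw.day, tag)] += 1
    return cube


def hashtag_totals(cube: Counter) -> Counter:
    """Analytics step: roll the cube up along the day dimension."""
    totals = Counter()
    for (_day, tag), n in cube.items():
        totals[tag] += n
    return totals


cube = store(digest(r) for r in RAW_TWEETS)
print(hashtag_totals(cube))
```

Keying counts by dimension tuples makes a roll-up (for example, totals per hashtag across all days) a single pass over the cube; a real deployment would hold these facts in database tables rather than an in-memory `Counter`.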
Because of the scale of this project, multiple students were assigned to it. The author of this report was involved in the schema design of the Twitter database and drew up the list of tweet content types to be collected by the Twitter crawler. The author also developed the data digester and the part-of-speech (POS) tagging component of the analytics engine.
The system was tested over a period of 4 days, from 14 to 17 March 2014, during which a total of 8,021,933 tweets were collected. The objective of the project was met: a large amount of data was collected successfully, and the collected tweets were formatted by the data digester and inserted into the database.
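For scale, assuming collection ran continuously across the whole four-day window (the abstract does not say whether it did), the average rate the data digester had to sustain works out as follows:

```latex
\[
\frac{8{,}021{,}933\ \text{tweets}}{4\ \text{days}} \approx 2.01 \times 10^{6}\ \text{tweets/day},
\qquad
\frac{8{,}021{,}933\ \text{tweets}}{4 \times 86{,}400\ \text{s}} \approx 23.2\ \text{tweets/s}
\]
```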
Testing of the analytics engine showed that the Stanford POS tagger identified the names of politicians, political parties and Indian states with a recall of 1 and a precision of 0.47.
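Precision and recall follow the usual definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). A recall of 1 with a precision of 0.47 therefore means every target name was found, but fewer than half of the flagged tokens were correct. The sketch below shows one plausible way such figures arise. It assumes (the abstract does not say) that candidates are tokens the tagger labels as proper nouns (Penn Treebank tags `NNP`/`NNPS`), and it uses a hand-written tagged sentence and gold list instead of calling the Stanford tagger. `candidate_names` and `precision_recall` are hypothetical helpers.

```python
# Hand-written (token, Penn Treebank tag) pairs standing in for tagger output;
# these were NOT produced by running the Stanford POS tagger.
TAGGED = [
    ("Modi", "NNP"), ("addresses", "VBZ"), ("BJP", "NNP"), ("rally", "NN"),
    ("in", "IN"), ("Gujarat", "NNP"), ("on", "IN"), ("Friday", "NNP"),
    ("with", "IN"), ("Twitter", "NNP"), ("fans", "NNS"),
]

# Hypothetical gold list of politicians, parties and states in this sentence.
GOLD = {"Modi", "BJP", "Gujarat"}


def candidate_names(tagged):
    """Flag every proper noun as a candidate politician/party/state name."""
    return {tok for tok, tag in tagged if tag in ("NNP", "NNPS")}


def precision_recall(predicted, gold):
    """Return (precision, recall) for a predicted set against a gold set."""
    tp = len(predicted & gold)                      # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


p, r = precision_recall(candidate_names(TAGGED), GOLD)
print(f"precision={p:.2f} recall={r:.2f}")  # 3 of 5 candidates are correct
```

Because any proper noun (a weekday, a platform name) passes this filter, recall stays high while precision drops, which matches the pattern the abstract reports.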
To crawl more tweets relevant to Indian politics, the way the Twitter crawler selects tweets should be improved: instead of crawling by keywords, it should crawl tweets from a list of Twitter users who are very active in Indian politics.
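One way to build the proposed list of active users is to mine tweets already collected by keyword and keep the accounts that post political content most often. The sketch below assumes a hypothetical keyword set `POLITICAL_TERMS`, toy `(user, text)` pairs, and arbitrary thresholds `k` and `min_tweets`; none of these come from the report. The resulting list would then drive a user-based crawl (Twitter's former v1.1 `statuses/filter` streaming endpoint, for instance, accepted user IDs through `follow` as an alternative to keywords through `track`).

```python
from collections import Counter

# Hypothetical keyword set and sample data; not taken from the report.
POLITICAL_TERMS = {"bjp", "congress", "aap", "election", "modi"}

SAMPLE = [
    ("user_a", "BJP manifesto released ahead of the election"),
    ("user_a", "Congress responds to BJP"),
    ("user_b", "Great cricket match today"),
    ("user_c", "AAP candidate list for Delhi"),
    ("user_a", "Modi rally in Varanasi"),
    ("user_c", "Lok Sabha election schedule"),
]


def is_political(text: str) -> bool:
    """True if any word (ignoring #, @ and punctuation) is a political term."""
    words = {w.strip("#@.,!?").lower() for w in text.split()}
    return not words.isdisjoint(POLITICAL_TERMS)


def seed_users(tweets, k=2, min_tweets=2):
    """Return up to k users with at least min_tweets political tweets, most active first."""
    counts = Counter(user for user, text in tweets if is_political(text))
    return [user for user, n in counts.most_common() if n >= min_tweets][:k]


print(seed_users(SAMPLE))
```

Seeding from users rather than keywords trades breadth for relevance: tweets from accounts that rarely use the chosen keywords are still captured, while off-topic keyword matches from unrelated accounts are not.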
Currently, the visualizer, operational database and Twitter crawler reside on the same server. To improve the performance of the overall system, two servers should be used: one to host the operational database and one for data warehousing. |
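The proposed split separates write-heavy ingestion from read-heavy analysis. The sketch below imitates it with two in-memory SQLite databases standing in for the two servers. The table names, columns and the `ingest`, `etl` and `dashboard` functions are hypothetical, and a real deployment would use separate database servers rather than two connections in one process.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the two proposed servers;
# all table and column names are hypothetical.
operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

operational.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, day TEXT, hashtag TEXT)")
warehouse.execute(
    "CREATE TABLE daily_hashtags (day TEXT, hashtag TEXT, n INTEGER, "
    "PRIMARY KEY (day, hashtag))")


def ingest(rows):
    """Crawler/digester path: frequent small writes to the operational store."""
    with operational:  # commits on success, rolls back on error
        operational.executemany(
            "INSERT INTO tweets (id, day, hashtag) VALUES (?, ?, ?)", rows)


def etl():
    """Batch job: aggregate operational data and load it into the warehouse."""
    summary = operational.execute(
        "SELECT day, hashtag, COUNT(*) FROM tweets GROUP BY day, hashtag").fetchall()
    with warehouse:
        warehouse.executemany(
            "INSERT OR REPLACE INTO daily_hashtags (day, hashtag, n) VALUES (?, ?, ?)",
            summary)


def dashboard(day):
    """Visualizer path: read-only queries that touch only the warehouse."""
    return warehouse.execute(
        "SELECT hashtag, n FROM daily_hashtags WHERE day = ? ORDER BY n DESC, hashtag",
        (day,)).fetchall()


ingest([(1, "2014-03-14", "delhi"), (2, "2014-03-14", "elections2014"),
        (3, "2014-03-14", "elections2014"), (4, "2014-03-15", "delhi")])
etl()
print(dashboard("2014-03-14"))
```

Running `etl()` on a schedule keeps the visualizer's queries off the operational database, so ingestion and reporting no longer compete for the same server.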
author2 |
Anwitaman Datta |
format |
Final Year Project |
author |
Lai, Alvin Weijian |
title |
Analysing the simultaneous effects of personalities, demographics, competition and collaboration : a case study with Indian politics |
publishDate |
2014 |
url |
http://hdl.handle.net/10356/59056 |