Aggregating and analyzing Indian political tweets

Bibliographic Details
Main Author: Chirag Ruhela
Other Authors: Anwitaman Datta
Format: Final Year Project
Language: English
Published: 2014
Online Access: http://hdl.handle.net/10356/59920
Institution: Nanyang Technological University
Description
Summary: The author’s final year project is part of the Twitter Data Analysis project, which aims to gain insight into Indian politics by applying NLP and data mining techniques to data from the Twitter stream. Developing such an analytical engine requires analysing both historical and current data about Indian politics, building mathematical models that uncover patterns and correlations and help explain political events and upheavals. The historical data can be very large if accurate models are to be built, and its sheer dimensionality makes it impossible to understand in raw form. Dimensionality reduction and insightful visualizations are therefore needed to make the data consumable for the general public. As part of this project, the author designed and implemented a Sentiment Analysis Engine that uses the Affective Norms for English Words (ANEW) framework for NLP-model-based sentiment detection on Twitter data. A topic identification module was also implemented using the tf-idf algorithm. Dimensionality reduction was done through a scatterplot visualization of tweet sentiments alongside topic clusters. Heat maps and word clouds simplify consumption of the data, and an affinity graph shows diffusion networks for various topics and people. Lastly, the raw tweets are presented in tabular form for those interested in the raw data.
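
The two text-processing steps named in the summary, lexicon-based sentiment scoring and tf-idf term weighting, can be illustrated with a minimal Python sketch. This is not the project's code: the function names, the tokenizer, and the tiny `TOY_VALENCE` lexicon are assumptions made for illustration, and the valence numbers are invented placeholders on ANEW's 1 to 9 scale, not actual ANEW ratings. The tf-idf variant shown (relative term frequency times the natural log of inverse document frequency) is one common formulation; the record does not say which variant the project used.

```python
import math
import re
from collections import Counter

# Hypothetical valence scores on a 1-9 scale (placeholders, NOT real ANEW ratings).
TOY_VALENCE = {"win": 8.0, "hope": 7.0, "corrupt": 2.0, "scam": 1.5}


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def mean_valence(text, lexicon=TOY_VALENCE):
    """Average the valence of the words found in the lexicon; None if no word matches."""
    scores = [lexicon[t] for t in tokenize(text) if t in lexicon]
    return sum(scores) / len(scores) if scores else None


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    weight = (term count / document length) * ln(number of docs / docs containing term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    tweets = ["scam scam vote", "vote hope"]
    print(mean_valence("Hope to win"))
    for tweet, w in zip(tweets, tf_idf(tweets)):
        # The highest-weighted term is a rough stand-in for the tweet's topic word.
        print(tweet, "->", max(w, key=w.get))
```

In this sketch a term that appears in every document, such as "vote" in the two sample tweets, receives a weight of zero, which is why tf-idf tends to surface terms that distinguish one group of tweets from another.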