Twitter data processing using Hadoop

Bibliographic Details
Main Author: Khuc, Anh Tuan.
Other Authors: Lee Bu Sung
Format: Final Year Project
Language: English
Published: 2011
Online Access: http://hdl.handle.net/10356/44678
Institution: Nanyang Technological University
Description
Summary: Twitter has grown very rapidly since it was founded. It is a useful tool for online users to share information, thoughts, interests and activities with their friends. As of 2010, Twitter had 200 million active users, and it has become a rich source of information that faithfully reflects what is happening, including people's subjective opinions about these events. Twitter data is valuable to researchers, market analysts and companies alike. However, processing this huge amount of data and filtering out the useful information is not easy. The objective of this project is to develop a fast and scalable application to collect, store and process data from Twitter. The application is written mainly in Java using Hadoop’s MapReduce software framework, which allows it to run on a cluster of hundreds of computers. By the end of the project, the application is able to find Singapore-based users on Twitter, collect their tweets, analyze them and produce statistics in the form of web pages. Later in this paper, some possible improvements are also discussed.
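To illustrate the MapReduce pattern the summary refers to, here is a minimal plain-Java sketch that counts tweets per Singapore-based user. It does not use Hadoop at all. It simulates the map, shuffle and reduce phases in memory on a single machine. The Tweet class, the class name SingaporeTweetCount, and the rule "a user is Singapore-based if the profile location contains 'singapore'" are hypothetical assumptions for illustration only. The record does not describe how the project actually identifies Singapore-based users.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class SingaporeTweetCount {

    // Hypothetical tweet record holding only the fields this sketch needs.
    static final class Tweet {
        final String user;
        final String location; // free-text profile location (assumed field)
        final String text;

        Tweet(String user, String location, String text) {
            this.user = user;
            this.location = location;
            this.text = text;
        }
    }

    // A key/value pair emitted by the map phase.
    static final class Pair {
        final String key;
        final int value;

        Pair(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    // Map phase: emit (user, 1) for each tweet whose location mentions Singapore
    // (assumed, simplistic filtering rule).
    static List<Pair> map(Tweet t) {
        List<Pair> out = new ArrayList<>();
        if (t.location != null
                && t.location.toLowerCase(Locale.ROOT).contains("singapore")) {
            out.add(new Pair(t.user, 1));
        }
        return out;
    }

    // Reduce phase: sum all counts emitted for one user.
    static int reduce(String user, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Runs map over every tweet, groups values by key (the "shuffle"),
    // then reduces each group. Results are sorted by user name.
    static Map<String, Integer> run(List<Tweet> tweets) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Tweet t : tweets) {
            for (Pair p : map(t)) {
                grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(
                new Tweet("alice", "Singapore", "Good morning!"),
                new Tweet("bob", "Kuala Lumpur", "Hello"),
                new Tweet("alice", "Singapore", "Lunch time"),
                new Tweet("carol", "Jurong, SINGAPORE", "Rainy today"));
        System.out.println(run(tweets)); // prints {alice=2, carol=1}
    }
}
```

In an actual Hadoop job, the map and reduce methods would live in subclasses of org.apache.hadoop.mapreduce.Mapper and Reducer, and the framework would perform the grouping step across the machines in the cluster. This is what allows the same logic to scale to large tweet collections.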