Twitter data processing using Hadoop
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2011 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/44678 |
Institution: | Nanyang Technological University |
Summary: | Twitter has grown very rapidly since it was established. It is a useful tool for online users to share their information, thoughts, interests and activities with their friends. As of 2010, there were 200 million active users on Twitter, and it has become a rich source of information that faithfully reflects what is happening, and even subjective opinions about these events. The data on Twitter is valuable for researchers, market analysts and companies. However, it is not easy to process and filter out useful information from the huge amount of data available on Twitter. The objective of this project is to develop a fast and scalable application to collect, store and process data on Twitter. The application is written mainly in Java, using Hadoop’s MapReduce software framework, which enables it to run on a cluster of hundreds of computers. By the end of the project, the application is able to find Singapore-based users on Twitter, collect their tweets, analyze them and produce some statistics in the form of web pages. Possible improvements are also discussed later in this paper. |
---|