Twitter data processing using hadoop

Twitter has grown very rapidly since it was established. It is a useful tool for online users to share their information, thoughts, interests and activities with their friends. As of 2010, there were 200 million active users on Twitter, and Twitter has become a rich source of infor...

Bibliographic Details
Main Author: Khuc, Anh Tuan.
Other Authors: Lee Bu Sung
Format: Final Year Project
Language: English
Published: 2011
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/44678
Institution: Nanyang Technological University
id sg-ntu-dr.10356-44678
record_format dspace
spelling sg-ntu-dr.10356-44678 2023-03-03T20:42:11Z
Twitter data processing using hadoop
Khuc, Anh Tuan.
Lee Bu Sung
School of Computer Engineering
DRNTU::Engineering::Computer science and engineering
Bachelor of Engineering (Computer Science)
2011-06-03T01:55:18Z
2011
Final Year Project (FYP)
http://hdl.handle.net/10356/44678
en
Nanyang Technological University
42 p.
application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering
description Twitter has grown very rapidly since it was established. It is a useful tool for online users to share their information, thoughts, interests and activities with their friends. As of 2010, there were 200 million active users on Twitter, and Twitter has become a rich source of information that truthfully reflects what is happening, as well as subjective opinions about these events. The data on Twitter is valuable for researchers, market analysts and companies. On the other hand, it is not easy to process and filter useful information out of the huge amount of data available on Twitter. The objective of this project is to develop a fast and scalable application to collect, store and process data on Twitter. The application is written mainly in Java, using Hadoop’s MapReduce software framework, which enables it to run on a cluster of hundreds of computers. By the end of the project, the application is able to find Singapore-based users on Twitter, collect their tweets, analyze them and produce some statistics in the form of web pages. Later in this paper, some possibilities for improvement are also discussed.
author2 Lee Bu Sung
author_facet Lee Bu Sung
Khuc, Anh Tuan.
format Final Year Project
author Khuc, Anh Tuan.
author_sort Khuc, Anh Tuan.
title Twitter data processing using hadoop
publishDate 2011
url http://hdl.handle.net/10356/44678
_version_ 1759854531429007360