Twitter archive system


Bibliographic Details
Main Author:	Ong, Ann Aik.
Other Authors:	Sun Aixin
Format:	Final Year Project
Language:	English
Published:	2011
Subjects:	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
Online Access:	http://hdl.handle.net/10356/46456
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-46456
record_format	dspace
spelling	sg-ntu-dr.10356-46456 2023-03-03T20:32:40Z Twitter archive system Ong, Ann Aik. Sun Aixin School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval Bachelor of Engineering (Computer Science) 2011-12-06T03:52:56Z 2011-12-06T03:52:56Z 2011 2011 Final Year Project (FYP) http://hdl.handle.net/10356/46456 en Nanyang Technological University 70 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
description	Twitter is a popular source of text data for mining and analysis because a large amount of free data is readily accessible on it. However, before Twitter data can be mined, it must first be collected. The purpose of this project is to design and develop a reliable data collector that periodically collects data from selected Twitter users, using a scheduler driven by each user's tweeting pattern, and analyzes the collected data. The entire data collection and analysis process is fully automated and is intended to run continuously (24/7/365). The Java desktop application was developed in NetBeans IDE 6.7.1, with MySQL Server 5.0.91 for data storage and Twitter4J as the Java library for communicating with the Twitter API. The data collector was tested over a period of 3 days, from 20 to 23 September 2011. During these 3 days, 56,961 users were captured: 20,779 Singapore users and 36,182 non-Singapore users. In addition, 244,192 tweets were downloaded and 144,042 follow relationships were found. The project's objective was met, as the data collector successfully collected a large amount of data from Twitter during the 3-day collection period. As future work, implementing a multithreaded scheduler is highly recommended to improve the data collector's performance.
author2	Sun Aixin
format	Final Year Project
author	Ong, Ann Aik.
title	Twitter archive system
publishDate	2011
url	http://hdl.handle.net/10356/46456
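The description above says the collector polls selected users on a schedule based on their tweeting pattern, and it recommends a multithreaded scheduler as future work. The Java sketch below shows one possible policy and does not reproduce the report's actual implementation. Everything in it is a hypothetical assumption: the class `AdaptiveScheduler`, its method names, the rule "poll every mean gap between a user's recent tweets", and the 5-minute/24-hour bounds. The `fetch` tasks are placeholders, not Twitter4J calls.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of an adaptive, multithreaded polling scheduler.
 * Assumed policy (not taken from the report): poll each user about as often
 * as they tweet, i.e. every mean gap between their recent tweets, clamped
 * to [MIN_INTERVAL, MAX_INTERVAL].
 */
public class AdaptiveScheduler {
    // Assumed bounds; a real collector would also have to respect API rate limits.
    static final Duration MIN_INTERVAL = Duration.ofMinutes(5);
    static final Duration MAX_INTERVAL = Duration.ofHours(24);

    /** Mean gap between timestamps sorted in ascending order, clamped to the bounds. */
    static Duration pollInterval(List<Instant> tweetTimes) {
        if (tweetTimes.size() < 2) {
            return MAX_INTERVAL; // not enough history: poll as rarely as allowed
        }
        Instant first = tweetTimes.get(0);
        Instant last = tweetTimes.get(tweetTimes.size() - 1);
        // For sorted timestamps, the mean of consecutive gaps is (last - first) / (n - 1).
        Duration mean = Duration.between(first, last).dividedBy(tweetTimes.size() - 1);
        if (mean.compareTo(MIN_INTERVAL) < 0) {
            return MIN_INTERVAL;
        }
        if (mean.compareTo(MAX_INTERVAL) > 0) {
            return MAX_INTERVAL;
        }
        return mean;
    }

    /** Runs fetch immediately, then again after each poll interval, measured from the end of the previous run. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService pool,
                                       List<Instant> tweetTimes, Runnable fetch) {
        long delaySeconds = pollInterval(tweetTimes).getSeconds();
        return pool.scheduleWithFixedDelay(fetch, 0, delaySeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Instant t0 = Instant.parse("2011-09-20T00:00:00Z");
        List<Instant> active = List.of(t0, t0.plusSeconds(1800), t0.plusSeconds(3600));
        List<Instant> quiet = List.of(t0);
        System.out.println(pollInterval(active)); // PT30M
        System.out.println(pollInterval(quiet));  // PT24H

        // A small thread pool lets polls for different users run concurrently.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        // Placeholder tasks: a real collector would query the Twitter API here.
        schedule(pool, active, () -> System.out.println("polling active user"));
        schedule(pool, quiet, () -> System.out.println("polling quiet user"));
        Thread.sleep(500); // give the immediate first polls time to run
        pool.shutdownNow();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }
}
```

The mean gap is only one simple heuristic for "pattern of tweeting". A fixed-delay schedule, rather than a fixed-rate one, is used so that a slow fetch for one user does not cause that user's polls to pile up.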

Similar Items