Parallel social network crawler system

Crawling social network data can uncover interesting phenomena for a variety of uses. However, it is also generally slow because it relies on third-party services accessed over a shared, contended network. These services are provided at the providers' discretion, and their usage quotas must be respected. The...

Bibliographic Details
Main Author:	Lim, Ivan Wei Jie.
Other Authors:	School of Computer Engineering
Format:	Final Year Project
Language:	English
Published:	2012
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer systems organization::Computer system implementation; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
Online Access:	http://hdl.handle.net/10356/48449
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-48449
record_format	dspace
Supervisor:	Cheng Sheung Chak James
Degree:	Bachelor of Engineering (Computer Science)
Deposited:	2012-04-24
Physical description:	46 p.
File format:	application/pdf
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
description	Crawling social network data can uncover interesting phenomena for a variety of uses. However, it is also generally slow because it relies on third-party services accessed over a shared, contended network. These services are provided at the providers' discretion, and their usage quotas must be respected. The objective of this project is therefore to accelerate the retrieval process by exploiting parallelism to speed up the crawling procedure. Because social network data is naturally structured as a graph, the Breadth-First Search (BFS) graph traversal technique is revisited to explore improvements to crawling operations. This project chose Google’s social networking platform, Google+, and experimented with a parallel crawling method based on BFS to increase throughput. The experimental system performed reasonably well compared with the naive crawling approach, given external limitations such as Google’s courtesy usage quota for its services. The system was able to fetch more data in the same or even less time, increasing efficiency several-fold. Although the project demonstrated a speed-up of the crawling process, there is still room for improvement to further scale up the entire job. Using this as a basis, further techniques can be applied to enhance the efficiency of the system.
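The record does not include the project's code. As a hedged illustration only, the Python sketch below shows one way a parallel BFS crawl could respect a usage quota: each BFS level (frontier) is fetched concurrently by a thread pool, while a client-side sliding-window limiter caps the request rate. `QuotaLimiter`, `parallel_bfs_crawl`, `mock_fetch` and the toy `GRAPH` are hypothetical names introduced here; no real Google+ API is called (the Google+ APIs were shut down in 2019), and the level-synchronous, thread-based design is an assumption, not the author's documented method.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


class QuotaLimiter:
    """Hypothetical client-side quota: at most `max_calls` requests per `period` seconds."""

    def __init__(self, max_calls: int, period: float) -> None:
        self.max_calls = max_calls
        self.period = period
        self._calls: List[float] = []  # timestamps of recent calls, oldest first
        self._lock = threading.Lock()

    def acquire(self) -> None:
        """Block until a request slot is free within the sliding window."""
        while True:
            with self._lock:
                now = time.monotonic()
                # Forget calls that have left the sliding window.
                self._calls = [t for t in self._calls if now - t < self.period]
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Time until the oldest call leaves the window.
                wait = self.period - (now - self._calls[0])
            time.sleep(max(wait, 0.0))  # sleep outside the lock so other threads can proceed


def parallel_bfs_crawl(
    seed: str,
    fetch_neighbours: Callable[[str], Iterable[str]],
    max_depth: int,
    workers: int,
    limiter: QuotaLimiter,
) -> Dict[str, int]:
    """Level-synchronous BFS: fetch each frontier in parallel, merge results serially."""
    depth: Dict[str, int] = {seed: 0}
    frontier: List[str] = [seed]

    def fetch(user: str) -> List[str]:
        limiter.acquire()  # respect the quota before every remote call
        return list(fetch_neighbours(user))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for level in range(max_depth):
            if not frontier:
                break
            next_frontier: List[str] = []
            # pool.map runs fetches concurrently but yields results in frontier order.
            for neighbours in pool.map(fetch, frontier):
                for user in neighbours:
                    if user not in depth:  # visited check happens on this thread only
                        depth[user] = level + 1
                        next_frontier.append(user)
            frontier = next_frontier
    return depth


# Toy follower graph standing in for a real social network API (hypothetical data).
GRAPH: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": ["alice"],
}


def mock_fetch(user: str) -> List[str]:
    time.sleep(0.01)  # simulate network latency
    return GRAPH.get(user, [])


if __name__ == "__main__":
    limiter = QuotaLimiter(max_calls=5, period=0.1)
    print(parallel_bfs_crawl("alice", mock_fetch, max_depth=3, workers=4, limiter=limiter))
    # → {'alice': 0, 'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2, 'frank': 3}
```

Only the network calls run in parallel; deduplication and frontier construction stay on the coordinating thread, so the shared `depth` map needs no lock. Raising `workers` increases throughput only until the limiter's quota becomes the bottleneck, which mirrors the external quota constraint described in the abstract.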
