Development of a distributed crawler to collect online game playing traces
Over the years, the form of computer games has been evolving. Where players once had to play alone or face to face, players from across the world can now easily join the same game over the internet. As the number of players involved in a game grows, an accompanying problem has emerged...
Main Author: | Zhang, Yuance |
---|---|
Other Authors: | Tang Xueyan |
Format: | Final Year Project |
Language: | English |
Published: | 2019 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems |
Online Access: | http://hdl.handle.net/10356/77003 |
Institution: | Nanyang Technological University |
Language: | English |
id |
sg-ntu-dr.10356-77003 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-77003 2023-03-03T20:39:06Z Development of a distributed crawler to collect online game playing traces Zhang, Yuance Tang Xueyan School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems Over the years, the form of computer games has been evolving. Where players once had to play alone or face to face, players from across the world can now easily join the same game over the internet. As the number of players involved in a game grows, an accompanying problem has emerged. By the nature of a multi-player game under active development, the balance of the game, tuned through its various game parameters, needs to be updated from time to time. That is where the demand for data collection comes in. To present a better platform for gamers, the data generated by these online games has become an intriguing source for analysis. One method of extracting such data is a web crawler. However, given the enormous amount of data stored in the cloud database, an ordinary web crawler falls short in one or more areas such as scalability, portability, performance, monitoring and fault tolerance. Therefore, this FYP focused on the development of a distributed crawler for collecting online game playing traces, so that the corresponding data research and analysis jobs can be carried out. A general-purpose, high-performance API crawler is a good solution. In this project, a distributed system with components including Python Scrapy, a MongoDB cluster, Redis and Docker was designed and implemented from scratch. The key innovation is extending the Scrapy framework from a single-server crawler into a distributed crawler by using a Redis server as a shared message queue. A master-slave architecture, data clustering and Docker Swarm are all part of the project's tech stack. Finally, system tests including an operation evaluation, fault-tolerance tests and a load test were carried out to verify the system. As further exploration, a general-purpose distributed API crawler framework is the goal, letting users define their own crawling logic while keeping all features of this system, including high portability, automatic failover, load balancing, high availability and scalability. Bachelor of Engineering (Computer Science) 2019-04-30T07:32:10Z 2019-04-30T07:32:10Z 2019 Final Year Project (FYP) http://hdl.handle.net/10356/77003 en Nanyang Technological University 57 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems |
spellingShingle |
DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; Zhang, Yuance; Development of a distributed crawler to collect online game playing traces |
description |
Over the years, the form of computer games has been evolving. Where players once had to play alone or face to face, players from across the world can now easily join the same game over the internet. As the number of players involved in a game grows, an accompanying problem has emerged. By the nature of a multi-player game under active development, the balance of the game, tuned through its various game parameters, needs to be updated from time to time. That is where the demand for data collection comes in. To present a better platform for gamers, the data generated by these online games has become an intriguing source for analysis. One method of extracting such data is a web crawler. However, given the enormous amount of data stored in the cloud database, an ordinary web crawler falls short in one or more areas such as scalability, portability, performance, monitoring and fault tolerance.
Therefore, this FYP focused on the development of a distributed crawler for collecting online game playing traces, so that the corresponding data research and analysis jobs can be carried out. A general-purpose, high-performance API crawler is a good solution.
In this project, a distributed system with components including Python Scrapy, a MongoDB cluster, Redis and Docker was designed and implemented from scratch. The key innovation is extending the Scrapy framework from a single-server crawler into a distributed crawler by using a Redis server as a shared message queue. A master-slave architecture, data clustering and Docker Swarm are all part of the project's tech stack. Finally, system tests including an operation evaluation, fault-tolerance tests and a load test were carried out to verify the system. As further exploration, a general-purpose distributed API crawler framework is the goal, letting users define their own crawling logic while keeping all features of this system, including high portability, automatic failover, load balancing, high availability and scalability. |
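The abstract's key idea, replacing Scrapy's single-server scheduling with a Redis server acting as a shared message queue so that several crawler instances can cooperate, can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the project's implementation: the host name redis-master, the queue key traces:start_urls and the spider class are invented placeholders.

```python
# Minimal sketch (not the report's actual code): worker spiders on several
# machines share one Redis list as the task queue.
import redis
import scrapy


class TraceWorkerSpider(scrapy.Spider):
    """Pulls its start URLs from a shared Redis queue instead of a local list."""

    name = "trace_worker"
    redis_host = "redis-master"        # assumed master node running Redis
    queue_key = "traces:start_urls"    # assumed shared queue key

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.queue = redis.Redis(host=self.redis_host, port=6379,
                                 decode_responses=True)

    def start_requests(self):
        # Keep claiming tasks until the shared queue stays empty for 30 s.
        # (A long-running crawler would react to the spider_idle signal
        # instead of blocking here; blocking keeps the sketch short.)
        while True:
            task = self.queue.brpop(self.queue_key, timeout=30)
            if task is None:
                break
            _, url = task
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Placeholder extraction; a real pipeline would write the collected
        # trace records to the MongoDB cluster mentioned in the abstract.
        yield {"url": response.url, "status": response.status}
```

In such a setup, a seed process on the master node would lpush the API URLs to be crawled onto traces:start_urls, and each slave node, for example one container per Docker Swarm node, would run a copy of this spider that keeps popping work until the queue is drained.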
author2 |
Tang Xueyan |
author_facet |
Tang Xueyan; Zhang, Yuance |
format |
Final Year Project |
author |
Zhang, Yuance |
author_sort |
Zhang, Yuance |
title |
Development of a distributed crawler to collect online game playing traces |
title_short |
Development of a distributed crawler to collect online game playing traces |
title_full |
Development of a distributed crawler to collect online game playing traces |
title_fullStr |
Development of a distributed crawler to collect online game playing traces |
title_full_unstemmed |
Development of a distributed crawler to collect online game playing traces |
title_sort |
development of a distributed crawler to collect online game playing traces |
publishDate |
2019 |
url |
http://hdl.handle.net/10356/77003 |
_version_ |
1759854611129171968 |