Development of a distributed crawler to collect online game playing traces

Over the years, the form of computer games has evolved. Where players once had to play alone or face to face, players from across the world can now easily join the same game over the internet. As the number of players in a game grows, an accompanying problem has emerged....

Bibliographic Details
Main Author: Zhang, Yuance
Other Authors: Tang Xueyan
Format: Final Year Project
Language: English
Published: 2019
Subjects:
Online Access: http://hdl.handle.net/10356/77003
Institution: Nanyang Technological University
id sg-ntu-dr.10356-77003
record_format dspace
spelling sg-ntu-dr.10356-77003 2023-03-03T20:39:06Z Development of a distributed crawler to collect online game playing traces Zhang, Yuance Tang Xueyan School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems Bachelor of Engineering (Computer Science) 2019-04-30T07:32:10Z 2019-04-30T07:32:10Z 2019 Final Year Project (FYP) http://hdl.handle.net/10356/77003 en Nanyang Technological University 57 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems::Information storage and retrieval
DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems
description Over the years, the form of computer games has evolved. Where players once had to play alone or face to face, players from across the world can now easily join the same game over the internet. As the number of players in a game grows, an accompanying problem has emerged. Because a multi-player game keeps developing, its balance, which is tuned through various game parameters, needs to be updated from time to time. That is where the demand for data collection comes in. In order to offer gamers a better platform, the data generated by these online games have become an intriguing source for analysis. One way to extract such data is with a web crawler. However, because of the enormous amount of data stored in the cloud database, an ordinary web crawler falls short in one or more areas such as scalability, portability, performance, monitoring and fault tolerance. This Final Year Project (FYP) therefore focused on developing a distributed crawler that collects online game playing traces, so that the corresponding data research and analysis can be carried out. A general-purpose, high-performance API crawler is a good solution. In this project, a distributed system with components including Python Scrapy, a MongoDB cluster, Redis and Docker was designed and implemented from scratch. The innovation lies in extending the Scrapy framework from a single-server crawler to a distributed crawler by using a Redis server as a shared message queue. The master-slave method, data clustering and Docker Swarm are all part of the project's tech stack. Finally, system tests including operation evaluation, fault tolerance and load tests were carried out to verify the system.
As a further exploration of this project, a general-purpose distributed API crawler framework is a goal: it would let users define their own crawling logic while keeping all the features of this system, including high portability, automatic failover, load balancing, high availability and scalability.
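The shared-queue idea in the description can be illustrated with a minimal Python sketch. It is not the project's code: the record does not show the implementation, so every name here (`SharedQueue`, `run_workers`, `FAKE_API`, `fake_fetch`) is hypothetical, and an in-memory object stands in for the Redis server so the example runs without any external service. In a real deployment the queue and the seen-set would presumably live in Redis (for example as a list and a set) so that crawler processes on different machines share them.

```python
from collections import deque


class SharedQueue:
    """In-memory stand-in for the queue that Redis would hold in a real deployment."""

    def __init__(self):
        self._pending = deque()  # plays the role of a shared list of URLs to crawl
        self._seen = set()       # plays the role of a shared set used for de-duplication

    def push(self, url):
        """Enqueue url unless it was already scheduled; return True if enqueued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def pop(self):
        """Return the next URL, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


def run_workers(queue, worker_names, fetch):
    """Let the workers take turns pulling from the one shared queue until it drains."""
    handled = {name: [] for name in worker_names}
    while True:
        progressed = False
        for name in worker_names:
            url = queue.pop()
            if url is None:
                continue
            progressed = True
            handled[name].append(url)
            # Newly discovered links go back into the shared queue, not a local one.
            for link in fetch(url):
                queue.push(link)
        if not progressed:
            return handled


# Hypothetical link graph standing in for a game API (assumption, not from the record).
FAKE_API = {
    "/matches/1": ["/players/a", "/players/b"],
    "/players/a": ["/matches/1", "/matches/2"],
    "/players/b": ["/matches/2"],
    "/matches/2": [],
}


def fake_fetch(url):
    """Return the links 'found' at url in the fake API."""
    return FAKE_API.get(url, [])


queue = SharedQueue()
queue.push("/matches/1")
result = run_workers(queue, ["worker-1", "worker-2"], fake_fetch)
for name, urls in result.items():
    print(name, urls)
```

Because both workers draw from the same queue and the same seen-set, each URL is crawled exactly once no matter which worker discovers it, which is the property a shared message queue is meant to give a multi-node crawler.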
author2 Tang Xueyan
author_facet Tang Xueyan
Zhang, Yuance
format Final Year Project
author Zhang, Yuance
author_sort Zhang, Yuance
title Development of a distributed crawler to collect online game playing traces
publishDate 2019
url http://hdl.handle.net/10356/77003
_version_ 1759854611129171968