Development of a distributed crawler to collect online game playing traces

Bibliographic Details
Main Author: Zhang, Yuance
Other Authors: Tang Xueyan
Format: Final Year Project
Language: English
Published: 2019
Subjects:
Online Access:http://hdl.handle.net/10356/77003
Institution: Nanyang Technological University
Description
Summary: Over the years, the form of computer games has been evolving. Where players once had to play alone or face to face, players from across the world can now easily join the same game through the internet. As the number of players involved in a game grows, an accompanying problem has emerged. Because a multi-player game keeps evolving, its balance, tuned through various game parameters, needs to be updated from time to time. That is where the demand for data collection comes in. In order to present a better platform for gamers, the data generated by these online games has become an intriguing source to analyse. One method of extracting such data is a web crawler. However, due to the enormous amount of data stored in cloud databases, an ordinary web crawler falls short in one or more areas such as scalability, portability, performance, monitoring and fault tolerance. This Final Year Project therefore focused on the development of a distributed crawler for collecting online game playing traces, so that the corresponding data research and analysis can be carried out. A high-performance, general-purpose API crawler is a good solution. In this project, a distributed system with components including Python Scrapy, a MongoDB cluster, a Redis database and Docker was designed and implemented from scratch. The key innovation is extending the Scrapy framework from a single-server crawler into a distributed crawler by using a Redis server as a shared message queue. The master-slave architecture, data clustering and Docker Swarm are also part of the project's technology stack. Finally, system tests covering operation evaluation, fault tolerance and load testing were carried out to verify the system. As further exploration, the goal is a general-purpose distributed API crawler framework that lets users define their own crawling logic while retaining all the features of this system, including high portability, automatic failover, load balancing, high availability and scalability.
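The summary describes extending Scrapy from a single-server crawler into a distributed one by letting a Redis server act as a shared message queue between crawler nodes. As a rough illustration only (not the project's actual code), the pattern can be expressed with the widely used scrapy-redis extension; the spider class name, Redis key, host name and item fields below are hypothetical.

    # Sketch, assuming the scrapy-redis extension: each crawler node pops
    # start URLs from a shared Redis list instead of a local start_urls
    # attribute, so many nodes can consume one work queue.
    from scrapy_redis.spiders import RedisSpider


    class MatchTraceSpider(RedisSpider):
        name = "match_trace"
        # All worker nodes block on this Redis list and pop URLs to crawl;
        # this is what turns the single-server crawler into a distributed one.
        redis_key = "match_trace:start_urls"

        def parse(self, response):
            # Parse one API response of game-playing traces and yield an
            # item that a pipeline could insert into the MongoDB cluster.
            yield {
                "url": response.url,
                "payload": response.json(),  # assumes the game API returns JSON
            }


    # settings.py (sketch): route scheduling and request de-duplication
    # through Redis so all nodes share one queue and one duplicate filter.
    SCHEDULER = "scrapy_redis.scheduler.Scheduler"
    DUPEFILTER_CLASS = "scrapy_redis.dupefilter.RFPDupeFilter"
    REDIS_URL = "redis://redis-master:6379"  # hypothetical master host

With such a setup, the master node (or any producer) pushes seed URLs into the shared list, e.g. `LPUSH match_trace:start_urls <url>`, and every Docker Swarm replica running the spider pulls work from the same queue, which is consistent with the master-slave, load-balanced design the abstract outlines.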