Detecting player's position using in-game statistics : a machine learning approach

Bibliographic Details
Main Author: Muhammad Aqmar Naqib Masrani
Other Authors: -
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access: https://hdl.handle.net/10356/153162
Institution: Nanyang Technological University
Description
Summary:
Background: Technology is ever-changing in the realm of football data analytics. One domain that potentially requires a more data-driven focus is player recruitment. To date, there is little evidence of any classification model that can be used to identify and recruit players.

Aim: (1) To determine which technical performance statistics are most important in distinguishing between playing positions in football. (2) To develop and validate a machine learning model that can accurately classify players' positions from their technical performance statistics.

Method: Season-long performance statistics of players in the English Premier League (EPL) and German Bundesliga from 2014 to 2021 were collected. Discriminant analysis was performed on the EPL dataset to identify the performance statistics that best distinguish the playing positions. Five classification models were trained and tested on the EPL dataset, and their performance was evaluated. The model with the highest accuracy was then validated by testing it against the Bundesliga dataset.

Results: Discriminant analysis found thirty-four technical performance statistics to be significant in distinguishing between positions. The extreme gradient boosting (XGB) model achieved the highest classification accuracy (70.4%) among the models tested against the EPL dataset. When tested on the Bundesliga dataset, the XGB model showed moderately high classification ability (63.9%).

Conclusion: Technical performance statistics combined with the XGB model offer a practical and valid tool for coaches and scouts to use when identifying and recruiting players.
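
The evaluation protocol in the Method (train and test several classifiers on one league, keep the most accurate, then validate it on a second league) can be sketched roughly as follows. This is only an illustration under loud assumptions: the data are synthetic, the three features and their values are invented, and the two simple classifiers are stand-ins for the five models of the project, which are not reproduced here. All function and variable names are hypothetical.

```python
import random
from collections import Counter


def make_league(seed, shift=0.0, n_per_position=60):
    """Build SYNTHETIC (features, position) rows; not real EPL or Bundesliga data."""
    rng = random.Random(seed)
    # Invented per-match averages: (tackles, passes, shots).
    centres = {
        "Defender": (3.0, 45.0, 0.4),
        "Midfielder": (2.0, 55.0, 1.2),
        "Forward": (0.8, 25.0, 3.0),
    }
    spreads = (0.8, 8.0, 0.7)
    rows = []
    for position, centre in centres.items():
        for _ in range(n_per_position):
            # `shift` is a crude stand-in for between-league differences.
            feats = tuple(rng.gauss(c + shift, s) for c, s in zip(centre, spreads))
            rows.append((feats, position))
    rng.shuffle(rows)
    return rows


def fit_scaler(rows):
    """Learn per-feature mean and standard deviation from the training rows only."""
    columns = list(zip(*(feats for feats, _ in rows)))
    means = [sum(col) / len(col) for col in columns]
    stds = [(sum((x - m) ** 2 for x in col) / len(col)) ** 0.5 or 1.0
            for col, m in zip(columns, means)]

    def scale(rows_to_scale):
        return [(tuple((x - m) / s for x, m, s in zip(feats, means, stds)), label)
                for feats, label in rows_to_scale]
    return scale


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid(rows):
    """Stand-in model 1: assign the position whose mean feature vector is closest."""
    grouped = {}
    for feats, label in rows:
        grouped.setdefault(label, []).append(feats)
    centroids = {label: tuple(sum(col) / len(col) for col in zip(*members))
                 for label, members in grouped.items()}
    return lambda feats: min(centroids, key=lambda lab: sq_dist(feats, centroids[lab]))


def fit_knn(rows, k=5):
    """Stand-in model 2: majority vote among the k nearest training rows."""
    def predict(feats):
        nearest = sorted(rows, key=lambda row: sq_dist(feats, row[0]))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict


def accuracy(predict, rows):
    return sum(predict(feats) == label for feats, label in rows) / len(rows)


league_a = make_league(seed=1)             # plays the role of the training league
league_b = make_league(seed=2, shift=0.2)  # plays the role of the validation league

split = len(league_a) * 4 // 5             # 80/20 train/test split (an assumption)
train, test = league_a[:split], league_a[split:]
scale = fit_scaler(train)

candidates = {"nearest_centroid": fit_centroid, "knn": fit_knn}
scores = {name: accuracy(fit(scale(train)), scale(test))
          for name, fit in candidates.items()}

# Keep the model with the highest within-league accuracy, then validate it
# on the other league without refitting anything.
best_name = max(scores, key=scores.get)
best_model = candidates[best_name](scale(train))
external_accuracy = accuracy(best_model, scale(league_b))

print("within-league accuracy:", scores)
print("selected model:", best_name)
print("accuracy on the second league:", round(external_accuracy, 3))
```

The scaler and the selected model are fitted on the first league only, so the second league acts as a genuinely unseen dataset; a drop in accuracy between the two, like the one reported in the Results, is the kind of gap this setup is meant to expose.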