Forecasting sport-matches through data mining

Bibliographic Details
Main Author: Ng, Jun Xuan
Other Authors: Pan, Sinno Jialin
Format: Final Year Project
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Computer science and engineering::Data::Coding and information theory
Online Access: http://hdl.handle.net/10356/66752
Institution: Nanyang Technological University
School: School of Computer Engineering
Degree: Bachelor of Engineering (Computer Science)
Date Deposited: 2016-04-25
Physical Description: 40 p.
File Format: application/pdf
Collection: DR-NTU (NTU Library, Singapore)
Record ID: sg-ntu-dr.10356-66752

Description

The odds of forecasting a perfect bracket for sports matches are astronomical. Participating in sports forecasting competitions shows how far machine learning, statistical techniques, feature engineering and ensemble learning can improve such predictions, and implementing code to predict match outcomes may in the process help to develop new scientific discoveries and business models. The purpose of this project is to explore different machine learning algorithms and data mining techniques while achieving a favorable score on the competition’s leaderboard. This document discusses the National College Basketball prediction competition “March Machine Learning 2014”, hosted by Kaggle. Before the prediction code is implemented, the data are pre-processed (e.g. the provided data are analyzed so that unwanted data can be removed). RStudio with the R programming language is used for all tasks in this project. After the prediction script was implemented, the results showed that Logistic Regression and XGBoost were the best-performing standalone algorithms for the competition, and the submitted score ranked 2nd on the leaderboard. Combining both algorithms through ensemble learning improves the accuracy of the results further than either standalone algorithm.
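The abstract states that ensembling Logistic Regression and XGBoost improves on either standalone algorithm, but the record does not say how the two were combined. As a minimal sketch, assuming an equal-weight average of the two models' predicted win probabilities and binary log loss as the evaluation metric (both are assumptions, not details from this record), the Python code below shows how such a blend can be scored. The project itself was written in R; the function names `log_loss` and `blend`, the 0.5 weight and all probability values are hypothetical illustrations, not the author's code or results.

```python
import math

# Sketch only: equal-weight averaging and log loss are assumptions,
# not details taken from the project described in this record.


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to [eps, 1 - eps] to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def blend(p_a, p_b, w=0.5):
    """Weighted average of two models' win probabilities (weight w goes to model A)."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]


# Made-up toy data: 1 means the first-listed team won the game.
outcomes = [1, 0, 1, 1, 0]
p_logreg = [0.70, 0.40, 0.55, 0.80, 0.30]  # hypothetical logistic regression output
p_xgb = [0.60, 0.20, 0.75, 0.65, 0.45]     # hypothetical XGBoost output
p_blend = blend(p_logreg, p_xgb)           # hypothetical equal-weight ensemble

scores = {
    "logreg": log_loss(outcomes, p_logreg),
    "xgboost": log_loss(outcomes, p_xgb),
    "blend": log_loss(outcomes, p_blend),
}
for name, score in scores.items():
    print(f"{name:8s} log loss = {score:.4f}")
```

Log loss is convex in the predicted probability, so an equal-weight blend can never score worse than the average of the two models' individual losses; whether it also beats the better single model, as the abstract reports for this project, depends on how the two models' errors differ.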