Forecasting sport-matches through data mining

Bibliographic Details
Main Author: Ng, Jun Xuan
Other Authors: Pan, Sinno Jialin
Format: Final Year Project
Language: English
Published: 2016
Subjects:
Online Access: http://hdl.handle.net/10356/66752
Institution: Nanyang Technological University
Description
Summary: The odds of forecasting a perfect bracket for sports matches are astronomical. Participating in sports forecasting competitions shows how much machine learning, statistical techniques, feature engineering and ensemble learning can improve predictions, and implementing code to predict match outcomes may in the process help develop new scientific discoveries and business models. The purpose of this project is to explore different machine learning algorithms and data mining techniques while achieving a favorable score on the competition’s leaderboard. This document discusses the National College Basketball prediction competition “March Machine Learning 2014”, hosted by Kaggle. Before the prediction code is implemented, the data are pre-processed (e.g. the provided data are analyzed to remove unwanted records). All tasks in this project are performed in RStudio using the R programming language. The results show that, of the standalone algorithms, Logistic Regression and XGBoost perform best for the competition, and the submitted score placed 2nd on the leaderboard. Combining these two algorithms through ensemble learning further improves the accuracy of the results compared with either standalone algorithm.
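The ensemble step described in the summary can be sketched as follows. This is a minimal Python illustration, not the project's R code. It assumes that the ensemble is a simple (optionally weighted) average of each model's predicted win probabilities and that submissions are scored by binary log loss. The helper names `log_loss` and `average_ensemble`, and all outcomes and probabilities, are hypothetical.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


def average_ensemble(*prob_lists, weights=None):
    """Hypothetical helper: weighted average of per-match win probabilities."""
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # equal weights by default
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists))
            for i in range(len(prob_lists[0]))]


# Hypothetical outcomes (1 = first-listed team won) and model outputs.
outcomes = [1, 0, 1, 1]
logreg_probs = [0.70, 0.40, 0.60, 0.80]  # stand-in for Logistic Regression
xgb_probs = [0.80, 0.20, 0.55, 0.65]     # stand-in for XGBoost

blend = average_ensemble(logreg_probs, xgb_probs)
for name, probs in [("logreg", logreg_probs), ("xgboost", xgb_probs), ("blend", blend)]:
    print(f"{name}: {log_loss(outcomes, probs):.4f}")
```

Log loss is convex in the predicted probability, so an equal-weight average is never worse than the mean of the two models' losses. It can still be worse than the better single model, and with these made-up numbers the XGBoost stand-in alone scores slightly lower than the blend. In practice the weights would be tuned on held-out matches.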