Big data analytics

Bibliographic Details
Main Author: Chua, Zhen Hong
Other Authors: Bi Guoan
Format: Final Year Project
Language: English
Published: 2019
Online Access:http://hdl.handle.net/10356/78353
Institution: Nanyang Technological University
Description
Summary: With easy access to connected devices and online services, data of many kinds are constantly being collected by various service providers. These data can be used to identify trends and predict future values, outcomes that matter for optimizing a range of industrial, commercial and even consumer processes. With the widespread availability of highly capable computing systems and programming tools, resource-intensive tasks such as implementing predictive machine learning are now possible at low cost for a determined user. The objective of this project is to produce a machine learning-based process capable of predicting a numerical output from a set of mixed-type input data. This process is implemented with open-source programming tools. The project also seeks to estimate the relative importance of the different data features. We have developed a machine learning process that predicts a numerical output with an explained variance score of up to 0.77. The process covers the entire data analysis procedure: data import, data pre-processing, hyperparameter optimization and prediction. It shows that a functional and reasonably accurate data analysis model can be produced using open-source software. Using a variety of machine learning algorithms, the project also compares the different models in terms of accuracy and the time each takes to produce a predicted output.
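The summary describes a pipeline that encodes mixed-type inputs, tunes hyperparameters and reports explained variance. This record does not name the project's actual tools or models, so what follows is only a minimal, dependency-free Python sketch of that kind of pipeline under its own assumptions: one-hot encoding for categorical columns, z-score scaling for numeric columns, ridge regression as the model, a single hold-out split for the hyperparameter search, and a synthetic toy dataset. The function names (`encode`, `solve`, `fit_ridge`, `grid_search`, `explained_variance`) and the `price`/`size`/`region` columns are hypothetical.

```python
# Minimal, dependency-free sketch of a mixed-type regression pipeline.
# Assumptions (not taken from the project): one-hot encoding for categorical
# columns, z-score scaling for numeric columns, ridge regression solved via
# the normal equations, and a hold-out grid search over the ridge penalty.
import random
import statistics


def encode(rows, numeric_cols, categorical_cols, stats=None):
    """Turn dict rows into numeric feature vectors (intercept first)."""
    if stats is None:  # learn scaling and vocabularies from training rows only
        stats = {
            "num": {c: (statistics.fmean([r[c] for r in rows]),
                        statistics.pstdev([r[c] for r in rows]) or 1.0)
                    for c in numeric_cols},
            "cat": {c: sorted({r[c] for r in rows}) for c in categorical_cols},
        }
    encoded = []
    for r in rows:
        x = [1.0]  # intercept term
        for c in numeric_cols:
            mean, sd = stats["num"][c]
            x.append((r[c] - mean) / sd)
        for c in categorical_cols:
            # One-hot encoding; an unseen category becomes all zeros
            x.extend(1.0 if r[c] == v else 0.0 for v in stats["cat"][c])
        encoded.append(x)
    return encoded, stats


def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][k] * w[k] for k in range(i + 1, n))) / m[i][i]
    return w


def fit_ridge(x, y, alpha):
    """Ridge regression: (X^T X + alpha * I) w = X^T y, intercept unpenalised."""
    d = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(d)]
    return solve(xtx, xty)


def predict(w, x):
    return [sum(wi * xi for wi, xi in zip(w, row)) for row in x]


def explained_variance(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y)."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - statistics.pvariance(residuals) / statistics.pvariance(y_true)


def grid_search(train, valid, target, numeric_cols, categorical_cols, alphas):
    """Pick the ridge penalty with the best hold-out explained variance."""
    x_tr, stats = encode(train, numeric_cols, categorical_cols)
    x_va, _ = encode(valid, numeric_cols, categorical_cols, stats)
    y_tr = [r[target] for r in train]
    y_va = [r[target] for r in valid]
    best = None
    for alpha in alphas:
        w = fit_ridge(x_tr, y_tr, alpha)
        score = explained_variance(y_va, predict(w, x_va))
        if best is None or score > best[1]:
            best = (alpha, score)
    return best


if __name__ == "__main__":
    # Synthetic toy data: one numeric and one categorical feature
    rng = random.Random(0)
    rows = []
    for _ in range(200):
        size = rng.uniform(50, 150)
        region = rng.choice(["north", "south", "east"])
        bonus = {"north": 20.0, "south": 0.0, "east": -10.0}[region]
        rows.append({"size": size, "region": region,
                     "price": 3.0 * size + bonus + rng.gauss(0, 10)})
    alpha, score = grid_search(rows[:150], rows[150:], "price",
                               ["size"], ["region"], [0.01, 0.1, 1.0, 10.0])
    print(f"best alpha={alpha}, explained variance={score:.3f}")
```

Run as a script, it prints the chosen penalty and its explained variance on the synthetic hold-out split. `explained_variance` follows the usual definition, 1 - Var(y - ŷ) / Var(y); unlike R², it ignores a constant offset in the predictions. Feature importance and model timing, which the project also reports, are outside the scope of this sketch.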