Big data processing on educational data mining using pyspark with jupyter notebook

Rapid advances in information technology bring new challenges and place new demands on the education system. Teaching and learning have moved from the classroom to Computer Aided Learning (CAL) systems. Big data technology and machine learning play an important role in Computer...

Bibliographic Details
Main Author: Ravichandran, Vinitha
Format: Thesis
Language: English
Published: 2018
Online Access: http://eprints.utm.my/id/eprint/81375/1/VinithaRavichandranMFC2018.pdf
http://eprints.utm.my/id/eprint/81375/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:119718
Institution: Universiti Teknologi Malaysia
id my.utm.81375
record_format eprints
Ravichandran, Vinitha (2018) Big data processing on educational data mining using pyspark with jupyter notebook. Masters thesis, Universiti Teknologi Malaysia. Not peer reviewed.
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
description Rapid advances in information technology bring new challenges and place new demands on the education system. Teaching and learning have moved from the classroom to Computer Aided Learning (CAL) systems. Big data technology and machine learning play an important role in CAL systems because of the massive volume of data these systems generate. This has led to the rapid development of data mining in education, known as Educational Data Mining (EDM). The abundance of data collected by such systems can be used to analyse, predict and address many issues in education, such as improving the quality of education and predicting and monitoring educational outcomes. Effectively analysing and predicting students' future performance can make a CAL system a better platform for learning than traditional classroom learning. Machine learning techniques were used to obtain reliable and accurate predictions of students' performance. Apache Hadoop was the backbone of big data technology until the emergence of Apache Spark; however, only a few studies have applied Apache Spark to EDM. In this dissertation, PySpark was integrated with Jupyter Notebook to perform EDM on the Educational Process Mining (EPM) data set. Spark MLlib was used to compare four classification algorithms, namely Logistic Regression, Naïve Bayes, Decision Tree and Random Forest, on the EPM data set. In this study, the Random Forest classifier outperformed the other classifiers in Accuracy, Area Under the Precision-Recall (PR) curve and Area Under the Receiver Operating Characteristic (ROC) curve, although with a slightly slower Execution Time. Among the four classifiers compared, Random Forest was the best for EDM.
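The description compares classifiers on Accuracy, Area Under the PR curve and Area Under the ROC curve. As a rough illustration of what those three numbers measure, the sketch below computes them for a binary classifier from its predicted scores. It is not the thesis code and not Spark MLlib's evaluator: the function names, the 0/1 label encoding, the 0.5 decision threshold, the trapezoidal integration and the toy data are all assumptions made for this example, so the values need not match what MLlib would report.

```python
from typing import List, Sequence, Tuple


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


def _cumulative_counts(labels: Sequence[int],
                       scores: Sequence[float]) -> List[Tuple[int, int]]:
    """Cumulative (true positives, false positives) as the decision threshold
    is lowered through each distinct score, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    counts: List[Tuple[int, int]] = []
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        counts.append((tp, fp))
    return counts


def area_under_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Trapezoidal area under the (false positive rate, true positive rate) curve."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("ROC needs at least one positive and one negative label")
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for tp, fp in _cumulative_counts(labels, scores):
        fpr, tpr = fp / neg, tp / pos
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area


def area_under_pr(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Trapezoidal area under the (recall, precision) curve.

    Assumption: the curve is anchored at recall 0 with the precision of the
    highest-scoring threshold; other conventions give slightly different areas.
    """
    pos = sum(1 for y in labels if y == 1)
    if pos == 0:
        raise ValueError("PR curve needs at least one positive label")
    counts = _cumulative_counts(labels, scores)
    first_tp, first_fp = counts[0]
    prev_recall, prev_precision = 0.0, first_tp / (first_tp + first_fp)
    area = 0.0
    for tp, fp in counts:
        recall, precision = tp / pos, tp / (tp + fp)
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area


if __name__ == "__main__":
    # Invented toy data: 1 = positive class, scores = predicted probability.
    toy_labels = [1, 1, 0, 1, 0, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print("accuracy:", accuracy(toy_labels, toy_scores))
    print("area under ROC:", area_under_roc(toy_labels, toy_scores))
    print("area under PR:", area_under_pr(toy_labels, toy_scores))
```

Accuracy depends on a single threshold, while the two area metrics summarise ranking quality across all thresholds, which is one reason a comparison of classifiers may report all three side by side.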