Improved students' performance prediction for multi-class imbalanced problems using hybrid and ensemble approach in educational data mining
Among the problems raised in data mining, class imbalance is a well-known and recurring issue. Researchers have studied it in several fields using three common techniques: sampling, ensemble methods, and cost-sensitive learning. However, such studies are still new in educatio...
Main Authors: | Hassan, H., Ahmad, N. B., Anuar, S. |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2020 |
Subjects: | QC Physics |
Online Access: | http://eprints.utm.my/id/eprint/93715/1/HasnizaHassan2020_ImprovedStudentsPerformancePrediction.pdf http://eprints.utm.my/id/eprint/93715/ http://dx.doi.org/10.1088/1742-6596/1529/5/052041 |
Institution: | Universiti Teknologi Malaysia |
id |
my.utm.93715 |
---|---|
record_format |
eprints |
spelling |
my.utm.93715 2021-12-31T08:28:30Z http://eprints.utm.my/id/eprint/93715/ Hassan, H. and Ahmad, N. B. and Anuar, S. (2020) Improved students' performance prediction for multi-class imbalanced problems using hybrid and ensemble approach in educational data mining. In: 2nd Joint International Conference on Emerging Computing Technology and Sports, JICETS 2019, 25-27 Nov 2019, Bandung, Indonesia. Conference or Workshop Item, PeerReviewed, application/pdf, en. http://eprints.utm.my/id/eprint/93715/1/HasnizaHassan2020_ImprovedStudentsPerformancePrediction.pdf http://dx.doi.org/10.1088/1742-6596/1529/5/052041 |
institution |
Universiti Teknologi Malaysia |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
language |
English |
topic |
QC Physics |
description |
Among the problems raised in data mining, class imbalance is a well-known and recurring issue. Researchers have studied it in several fields using three common techniques: sampling, ensemble methods, and cost-sensitive learning. However, such studies are still new in the education domain. The problem is closely tied to data quality, which has the greatest impact on the accuracy of prediction results. Many previous studies focus on binary imbalanced classification rather than the multi-class imbalance problem in education data. This study used 4413 student instances from two datasets, a students' information system and an e-learning system, from the Faculty of Engineering at a Malaysian university for the First Semester of 2017/2018. Three sampling categories are used in this study: oversampling, undersampling, and hybrid techniques. The research empirically analyzes five types of ensemble classifiers and seven sampling techniques. The experimental results show that a hybrid technique, ROS with AdaBoost, produces the best performance compared with the other benchmark techniques. The SMOTEENN technique combined with ensemble classifiers consistently produces high results. This technique has great potential for improving students' performance prediction models. |
format |
Conference or Workshop Item |
author |
Hassan, H. Ahmad, N. B. Anuar, S. |
author_sort |
Hassan, H. |
title |
Improved students' performance prediction for multi-class imbalanced problems using hybrid and ensemble approach in educational data mining |
title_sort |
improved students' performance prediction for multi-class imbalanced problems using hybrid and ensemble approach in educational data mining |
publishDate |
2020 |
url |
http://eprints.utm.my/id/eprint/93715/1/HasnizaHassan2020_ImprovedStudentsPerformancePrediction.pdf http://eprints.utm.my/id/eprint/93715/ http://dx.doi.org/10.1088/1742-6596/1529/5/052041 |
_version_ |
1720980114513068032 |
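
The description field above names random oversampling (ROS) among the sampling techniques the study evaluates. As a rough illustration of the general idea only, and not the authors' implementation, the following standard-library Python sketch balances a multi-class label set by duplicating randomly chosen rows of each smaller class until every class matches the largest one. The function name `random_oversample`, the grade-band labels, and the toy data are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    target = max(Counter(labels).values())

    # Group row indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Start from the original data and append the sampled duplicates.
    out_rows, out_labels = list(rows), list(labels)
    for label, indices in by_class.items():
        # Draw with replacement to close the gap to the majority count.
        for _ in range(target - len(indices)):
            pick = rng.choice(indices)
            out_rows.append(rows[pick])
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: three classes with a 6:3:1 imbalance.
labels = ["pass"] * 6 + ["borderline"] * 3 + ["fail"] * 1
rows = [[i] for i in range(len(labels))]

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

In practice, resampling of this kind is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data. The balanced training set would then be passed to an ensemble classifier such as AdaBoost.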