Improved students' performance prediction for multi-class imbalanced problems using hybrid and ensemble approach in educational data mining

Among the problems raised in the data mining area, the class imbalance is a well-known issue that always occurs. Many researchers studied this issue in several fields using three commonly used techniques: sampling, ensemble, or cost-sensitive learning. However, such studies are still new in educatio...

Full description

Saved in:
Bibliographic Details
Main Authors: Hassan, H., Ahmad, N. B., Anuar, S.
Format: Conference or Workshop Item
Language:English
Published: 2020
Subjects:
Online Access:http://eprints.utm.my/id/eprint/93715/1/HasnizaHassan2020_ImprovedStudentsPerformancePrediction.pdf
http://eprints.utm.my/id/eprint/93715/
http://dx.doi.org/10.1088/1742-6596/1529/5/052041
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Universiti Teknologi Malaysia
Language: English
Description
Summary:Among the problems raised in the data mining area, the class imbalance is a well-known issue that always occurs. Many researchers studied this issue in several fields using three commonly used techniques: sampling, ensemble, or cost-sensitive learning. However, such studies are still new in education domains. This problem always related to the quality of data that gives the most impact to form an accurate prediction result. Many previous studies focus on binary imbalance classification problems instead of the multi-class imbalance problem in education data. This study used 4413 student instances of two datasets; students' information system and e-learning from the Faculty of Engineering in a Malaysia university for First Semester 2017/2018. Three sampling categories utilized in this study are oversampling techniques, undersampling techniques, and hybrid techniques. The research empirically analyzes five types of ensemble classifiers and seven sampling techniques. The experimental results show a hybrid technique ROS with AdaBoost produces the most excellent performance compared to the other benchmark techniques. SMOTEENN technique with ensembles classifiers consistently produces high results. This technique has great potential in improving the students' performance prediction model.