Genetic algorithm based feature selection with ensemble methods for student academic performance prediction

Bibliographic Details
Main Authors: Al Farissi, Al Farissi; Mohamed Dahlan, Halina; Samsuryadi, Samsuryadi
Format: Conference or Workshop Item
Language: English
Published: 2020
Online Access: http://eprints.utm.my/id/eprint/92480/1/HalinaMohamedDahlan2020_GeneticAlgorithmBasedFeatureSelection.pdf
http://eprints.utm.my/id/eprint/92480/
http://dx.doi.org/10.1088/1742-6596/1500/1/012110
Institution: Universiti Teknologi Malaysia
Description
Summary: Student academic performance is an important factor that affects the achievement of an educational institution. Educational Data Mining (EDM) is a data mining process applied to educational data to produce information related to student academic performance. The knowledge produced by the data mining process is used by educational institutions to improve their teaching processes, with the aim of improving student academic performance. In this paper, a method that combines a Genetic Algorithm (GA) feature selection technique with a classification method is proposed to predict student academic performance. Almost all previous feature selection techniques apply a local search throughout the process, so the optimal solution is difficult to reach. Therefore, GA is applied as the feature selection technique together with an ensemble classification method in order to improve the classification accuracy of student academic performance prediction; the approach can also be used for high-dimensional datasets with imbalanced classes. The experimental data come from a Kaggle repository dataset whose features fall into three main categories: behaviour, academic, and demographic. The proposed method is evaluated using the Area Under the Curve (AUC). The experimental results indicate that the proposed method predicts student academic performance well.
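The general idea described in the summary (a GA searching over feature subsets, with each subset scored by the AUC of an ensemble classifier) can be illustrated with a minimal, standard-library-only sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the synthetic data (features 0 and 1 informative, the rest noise), the bagged nearest-centroid ensemble, tournament selection, one-point crossover, and all function names and parameter values.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_centroids(X, y, cols):
    """Hypothetical base learner: nearest-centroid scorer on the selected columns."""
    def centroid(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in cols]
    c0, c1 = centroid(0), centroid(1)
    def score(x):
        d0 = sum((x[j] - a) ** 2 for j, a in zip(cols, c0))
        d1 = sum((x[j] - b) ** 2 for j, b in zip(cols, c1))
        return d0 - d1  # larger value = closer to the class-1 centroid
    return score

def fitness(mask, train, valid, n_models=5):
    """Validation AUC of a bagged ensemble using only the features where mask == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty feature subset cannot be scored
    (Xtr, ytr), (Xva, yva) = train, valid
    rng = random.Random(0)  # fixed seed keeps the fitness deterministic per mask
    by_class = {c: [i for i, t in enumerate(ytr) if t == c] for c in (0, 1)}
    models = []
    for _ in range(n_models):
        # Stratified bootstrap so every resample contains both classes.
        idx = [rng.choice(by_class[c]) for c in (0, 1) for _ in by_class[c]]
        models.append(fit_centroids([Xtr[i] for i in idx], [ytr[i] for i in idx], cols))
    scores = [sum(m(x) for m in models) / n_models for x in Xva]  # average the base scores
    return auc(scores, yva)

def ga_select(train, valid, n_features, pop_size=12, generations=10, p_mut=0.1, seed=1):
    """GA over bit masks: elitism, size-2 tournaments, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(m, train, valid) for m in pop]
        def tournament():
            a, b = rng.sample(range(pop_size), 2)
            return pop[a] if fits[a] >= fits[b] else pop[b]
        best = pop[max(range(pop_size), key=fits.__getitem__)]
        nxt = [best[:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            nxt.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = nxt
    fits = [fitness(m, train, valid) for m in pop]
    i = max(range(pop_size), key=fits.__getitem__)
    return pop[i], fits[i]

def make_data(n, n_features, rng):
    """Synthetic stand-in data: only features 0 and 1 depend on the label."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 2.0 * label
        row[1] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y

data_rng = random.Random(42)
train = make_data(80, 8, data_rng)
valid = make_data(40, 8, data_rng)
best_mask, best_auc = ga_select(train, valid, n_features=8)
print(best_mask, round(best_auc, 3))
```

Because the GA optimises the validation AUC directly, the reported fitness is optimistic; a fair estimate of a selected subset would require a separate held-out test split, which this sketch omits.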