Predicting students’ STEM academic performance in Malaysian secondary schools using educational data mining

Bibliographic Details
Main Author: Termedi @ Termiji, Mohammad Izzuan
Format: Thesis
Language: English
Published: 2023
Online Access: http://psasir.upm.edu.my/id/eprint/111826/1/FPP%202023%204%20IR.pdf
http://psasir.upm.edu.my/id/eprint/111826/
Institution: Universiti Putra Malaysia
Description
Summary: Data mining has recently been widely applied in education, where it is commonly called Educational Data Mining (EDM). By analysing data on student achievement, schools can identify students whose needs are not being met and improve their learning plans. In the era of big data, schools need to treat data as an opportunity to become data-driven organizations. However, how to predict secondary students’ performance in the Malaysian setting, with its diverse student backgrounds and very large datasets, remains unclear. The study aimed primarily to identify the input and environment variables that best predict students’ progress, and to establish a methodological approach based on student data that academics, schools and the Ministry of Education can use to assess students’ performance. Data mining in this research follows the CRISP-DM conceptual model. CRISP-DM (Cross-Industry Standard Process for Data Mining) is a standard data mining process whose project lifecycle consists of six stages. A design and development research (DDR) approach was adopted, proceeding through three phases: Need Analysis, Development of the Model, and Evaluation of the Model. Four data mining classification algorithms, Random Forest, PART, J48 and Naive Bayes, were applied to the dataset to investigate how classification can be used to predict students’ performance. The target population was all upper secondary students in Malaysia in the Science stream. Ten-fold cross-validation was used, with stratified random sampling partitioning the entire dataset into 10 mutually exclusive subsets. The data, comprising twenty-one attributes, came from two sources: the first was the Ministry of Education’s Education Repository, mainly APDM (Aplikasi Pangkalan Data Murid), and the second was SAPS (Sistem Aplikasi Peperiksaan Sekolah). WEKA was used as the data mining software for all analyses. The findings indicated that the most influential attributes in predicting students’ academic performance were district, type of school, dual language program (DLP) status, religion, dormitory, nationality, guardian’s job, guardian’s salary group, and mid-year exam results. Each of the four classification algorithms achieved an average prediction accuracy of more than 70%, and J48 outperformed the other classifiers on accuracy and classifier errors. The study provides a methodological approach that can inform the Ministry of Education on how to predict students’ performance using data available in the Education Data Repository. It is significant because its results can give the Ministry of Education, teachers, and parents valuable insight into key factors that can be used to improve students’ academic performance and success. Finally, the approach can help identify academically at-risk students and support the development of early interventions.
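The ten-fold cross-validation mentioned in the summary can be illustrated with a minimal sketch. It is not the thesis's own procedure, which was carried out in WEKA; it is a plain-Python illustration that assumes stratified splitting, so that each fold keeps roughly the same class proportions. The function names and the "pass"/"fail" labels are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Split record indices into k mutually exclusive folds.

    Indices are grouped by class label, shuffled within each class,
    and dealt round-robin so every fold gets a similar class mix.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[(offset + j) % k].append(idx)
        offset += len(indices)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_k_folds(labels, k, seed)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Hypothetical outcome labels: 70 "pass" and 30 "fail" records.
labels = ["pass"] * 70 + ["fail"] * 30
splits = list(cross_validation_splits(labels, k=10, seed=42))
```

Each record appears in exactly one test fold, so a classifier trained on the other nine folds is always evaluated on data it has not seen; averaging the ten accuracy scores gives the kind of average prediction accuracy reported in the summary.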