Predicting students’ STEM academic performance in Malaysian secondary schools using educational data mining
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2023 |
Subjects: | |
Online Access: | http://psasir.upm.edu.my/id/eprint/111826/1/FPP%202023%204%20IR.pdf http://psasir.upm.edu.my/id/eprint/111826/ |
Institution: | Universiti Putra Malaysia |
Summary: | Data mining has recently been widely applied in education, where it is
commonly called Educational Data Mining (EDM). Through analysis of data and
student success, schools can identify students whose needs are not being met
and improve their learning plans. In the era of big data, schools need to treat data
as an opportunity to become data-driven organizations. However, how to predict
secondary students’ performance in the Malaysian setting, with its diverse student
backgrounds and very large datasets, remains unclear.
The primary aim of the study is to identify the input and environment variables
that best predict students’ progress and to establish a methodological approach
based on students’ data that academics, schools and the Ministry of Education
can use to assess students’ performance.
In this research, the data mining process follows the CRISP-DM conceptual model.
CRISP-DM (Cross-Industry Standard Process for Data Mining) is a standard data
mining process whose project lifecycle consists of six stages.
Design and development research (DDR) was chosen as the approach for this
study. It proceeds through three phases: Needs Analysis, Development of the
Model and Evaluation of the Model. Four data mining classification algorithms,
namely Random Forest, PART, J48 and Naive Bayes, were applied to the dataset.
The study investigates how classification can be used to help predict students’
performance.
The target population for this study was all upper secondary students in Malaysia
in the Science stream. Ten-fold cross-validation uses a streamlined random
sampling technique to split the entire dataset into 10 mutually exclusive subsets.
The data collected in this study comprise twenty-one attributes drawn from two
sources in the Education Repository of the Ministry of Education: the first is
APDM (Aplikasi Pangkalan Data Murid) and the second is SAPS (Sistem Aplikasi
Peperiksaan Sekolah). WEKA was used as the data mining software for all
analyses.
The findings indicated that the highly influential attributes in predicting students’
academic performance are district, type of school, dual language program (DLP)
status, religion, dormitory, nationality, guardian’s job, guardian’s salary group,
and mid-year exam results. The results showed that each of the four
classification algorithms had an average prediction accuracy of more than 70%,
and that J48 outperformed the other classifiers on accuracy and classifier errors.
This study provides a methodological approach that shows the Ministry of
Education how to predict students’ performance using data available in the
Education Data Repository. The study is also significant because its results can
give the Ministry of Education, teachers and parents valuable insight into key
factors that can be used to help improve academic performance and students’
success. Finally, the approach can be used to identify academically at-risk
students and to develop early interventions. |
---|
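The ten-fold cross-validation mentioned in the summary can be illustrated with a minimal sketch. This is not the thesis's WEKA workflow: the function names (`k_fold_indices`, `cross_validate`, `train_majority`, `predict_majority`) are hypothetical, the split is a plain random shuffle rather than any particular sampling scheme, and the majority-class predictor is only a stand-in for classifiers such as J48 or Random Forest. It assumes the dataset has at least as many rows as folds.

```python
import random


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle row indices and deal them into k mutually exclusive folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    folds = k_fold_indices(len(rows), k, seed)
    accuracies = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train_fn([rows[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k


# Hypothetical stand-in classifier: always predicts the most common training label.
def train_majority(train_rows, train_labels):
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, row):
    return model
```

Called as `cross_validate(rows, labels, train_majority, predict_majority)`, the function returns the mean accuracy over the ten held-out folds. Any classifier can be substituted by passing a different pair of train and predict functions.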