Prediction of learning outcomes via clickstream data using machine learning
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2021 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/147631 |
Institution: | Nanyang Technological University |
Summary: | This thesis addresses the identification of learning behaviors and the prediction of learning outcomes via interaction clickstream data. Analyzing such complex clickstream data raises two challenges: balancing automaticity against interpretability, and building prediction models that model learning outcomes effectively. Both are necessary to inform instructional and policy changes.
For the identification of learning behaviors, the thesis focuses on clustering frameworks that identify learner groups with different study habits. To identify lesson preparation preferences in a blended course, a clustering framework that employs a Gaussian mixture model and a heuristic rule-based labeler is proposed. As shown in this thesis, the proposed algorithm identifies and differentiates normal and alternative preparation preferences. The experiment also shows that learners with less common lesson preparation preferences were associated with a higher cognitive ability to transfer their knowledge to unseen domains. The focus then shifts to the identification of clickstream problem-solving approaches. A sequential clustering framework is proposed that employs string metrics to differentiate pairs of sequences and a sequential pattern mining algorithm to summarize cluster members.
The proposed sequential clustering framework identifies action sequence archetypes that can predict student drop-out with an average F1 of 0.712 and AUC of 0.715. While the string metric used to measure similarity between pairs of sequences allows accurate identification of problem-solving approaches, nuances between sequences that belong to the same cluster are summarized by the sequential pattern mining algorithm, which retains the interpretability of the outcomes. Since these interactions were logged in an online elementary Mathematics course that taught students to solve arithmetic word problems via a series of steps, script-like and schematic variants of problem-solving approaches were analyzed. Learners who deviated from the taught steps they were expected to follow were also associated with higher persistence in their learning. Together, the two experiments suggest that greater autonomy in one's learning can be associated with better learning.
For the prediction of learning outcomes via interaction clickstream data, the use of deep learning techniques is proposed. To balance transient learning behaviors against persistent learner characteristics, a joint-space grade prediction model is proposed. As shown in the thesis, the use of persistent learner characteristics regularizes grade prediction. The proposed grade prediction model is evaluated on a digital signal processing course, where it improves the RMSE score by up to 12% over baseline methods. Furthermore, the notion that interaction clickstream data is multi-valued (MV) is raised. Since clickstream sequences are ordered lists of discrete actions, conventional data mining algorithms are unable to identify the multi-valued inputs. An MV identification module, which employs a proposed Bayesian-regularized layer within a deep neural architecture, is used to determine the multi-valued characteristic of interaction clickstream data. The identification, and subsequent removal, of multi-valued inputs improves grade prediction performance. The performance of the proposed MV identification module is verified on two interaction clickstream datasets. The module identifies 32.3% more MV instances than brute-force methods and, consequently, improves grade prediction performance by up to 38.95%. |
---|
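
The summary describes clustering action sequences by a string metric without naming the metric or the clustering procedure. As a rough illustration only, the sketch below assumes Levenshtein (edit) distance normalized by sequence length and a simple single-linkage grouping under a distance threshold. The action codes, the `threshold` value, and all function names are hypothetical and are not taken from the thesis. The sequential pattern mining step that summarizes each cluster is not shown.

```python
from itertools import combinations


def levenshtein(a, b):
    """Edit distance between two action sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_sequences(sequences, threshold=0.4):
    """Single-linkage grouping: any pair within `threshold` shares a cluster.

    The threshold is an assumed value chosen for this toy example.
    Returns a sorted list of clusters, each a list of sequence indices.
    """
    parent = list(range(len(sequences)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sequences)), 2):
        if normalized_distance(sequences[i], sequences[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(sequences)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical action codes: R = read, P = plan, S = solve, C = check, H = hint
sessions = ["RPSC", "RPSSC", "RPSC", "HSHS", "HSHSH"]
clusters = cluster_sequences(sessions)
print(clusters)
```

In this toy data the step-by-step sessions and the hint-heavy sessions fall into separate groups. A real analysis would need a metric and a clustering method chosen and validated for the dataset at hand.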