AUTOMATED ESSAY SCORING FOR ENGLISH IN CEFR LEVELS USING LSTM AND DISTILBERT EMBEDDINGS
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/76534 |
Institution: | Institut Teknologi Bandung |
Summary: | The Common European Framework of Reference for Languages, often
abbreviated as CEFR, is an international standard used globally to
measure language fluency. One of the abilities that make up language
fluency is writing skill, which is measured through a written exam. This research
aims to build an automated essay scoring system that assigns CEFR levels using
machine learning models.
The research developed two machine learning models: an LSTM-based model
and a model in which LSTM and DistilBERT are combined in a single pipeline.
The models were trained on practice-text data from EFCAMDAT, an
open-source corpus by EF English First and the University of Cambridge. Before
training, hyperparameter tuning was carried out with the Optuna framework to
obtain the best hyperparameters for each model. Training was then
carried out using the obtained hyperparameters, and model performance was
measured using accuracy and F1-measure at each epoch.
After training was finished, the models were tested, and the final
performance, comprising accuracy, F1-measure, and a confusion matrix, was
obtained for each model. Additionally, a classification report consisting of
precision and recall for each class predicted by each model was obtained after
testing. The final test results showed that the combined LSTM and DistilBERT
model produced the best results in predicting English essay scores in CEFR levels. |
---|
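
The evaluation measures named in the summary (accuracy, F1-measure, confusion matrix, and per-class precision and recall) can be illustrated with a minimal pure-Python sketch. This is not the author's code: the six-level label set, the macro averaging of F1, and the function names `confusion_matrix` and `classification_report` are assumptions made for illustration, since the record does not state how the metrics were computed.

```python
# Assumed label set: the six standard CEFR levels (not stated in the record).
CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def confusion_matrix(y_true, y_pred, labels=CEFR_LEVELS):
    """Count predictions; rows are true levels, columns are predicted levels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def classification_report(y_true, y_pred, labels=CEFR_LEVELS):
    """Per-class precision, recall, and F1, plus accuracy and macro-averaged F1."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    per_class = {}
    for i, label in enumerate(labels):
        true_pos = matrix[i][i]
        predicted = sum(matrix[row][i] for row in range(n))  # column total
        actual = sum(matrix[i])                              # row total
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1": f1}

    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    return {
        "accuracy": correct / total if total else 0.0,
        # Macro averaging is an assumption; the record only says "F1-measure".
        "macro_f1": sum(c["f1"] for c in per_class.values()) / n,
        "per_class": per_class,
        "confusion_matrix": matrix,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the research.
    gold = ["A1", "A1", "A2", "B1"]
    predicted = ["A1", "A2", "A2", "B1"]
    print(classification_report(gold, predicted, labels=("A1", "A2", "B1")))
```

Macro averaging weights every CEFR level equally, which may matter if the essay data is unevenly distributed across levels; a weighted average would be an equally plausible reading of "F1-measure".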