English essay scoring
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2014 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/59133 |
Institution: | Nanyang Technological University |
Summary: | Essays are structured, constructed responses that give comprehensive insight into a person’s grasp of a particular topic. For this reason, educational institutions and corporate organizations alike favor essays as a mode of assessment. The underlying issue is the need to hire human graders, which is costly and time-consuming. Although several automated scoring techniques exist, most rely on human judgement for verification. Automated scores are not always accurate, because some traits and qualities of an essay require human understanding to assess.
This paper explores the process of automatically scoring essays. It discusses the features that differentiate high- and low-quality essays by analyzing scored writing from the ICAME, KAGGLE, and WECCL20 corpora. Good and bad essays were found to differ in word choice, style, grammar, word count, sentence structure, punctuation, and spelling. Considering feasibility and resource constraints, a subset of these characteristics was chosen and used as features for machine learning.
Three approaches, Naïve Bayes Multinomial, Naïve Bayes Bernoulli, and Extreme Learning Machines (ELM), were used to train a classifier for binary categorization, with essays initially separated into good or bad. ELM was found to be the most accurate and was therefore used to combine all of the individual components, giving an overall accuracy of 75.82%. The scores obtained from the classifier’s decision function were then normalized to the desired scoring range, and the differences between the actual and expected scores were analyzed. The results showed that the approach worked better for narrow scoring bands such as 0-4 than for wider ones such as 0-30.
The report concludes with a workable scoring model that takes a range of factors into account, although additional features are still required. Limitations are stated, and suggestions for further improvement are provided. |
---|
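
The summary says that the classifier's decision-function scores were normalized to the desired scoring range, but it does not say how. As a rough illustration only, the sketch below assumes a simple min-max rescaling followed by rounding to whole marks; the function name `normalize_scores` and the sample raw values are hypothetical and are not taken from the report.

```python
def normalize_scores(raw_scores, low, high):
    """Min-max rescale raw classifier scores onto the integer band [low, high].

    Assumption: the report may use a different normalization; this is
    only one plausible way to map decision values onto a score band.
    """
    smallest, largest = min(raw_scores), max(raw_scores)
    if largest == smallest:
        # All raw scores are identical: place every essay mid-band.
        return [round((low + high) / 2) for _ in raw_scores]
    span = largest - smallest
    return [round(low + (s - smallest) / span * (high - low)) for s in raw_scores]


# Hypothetical decision-function outputs for five essays.
raw = [-1.0, -0.4, 0.0, 0.6, 1.0]
band_0_4 = normalize_scores(raw, 0, 4)
band_0_30 = normalize_scores(raw, 0, 30)
print(band_0_4)
print(band_0_30)
```

The same raw values are spread over 5 possible marks in the 0-4 band and 31 in the 0-30 band, so a given error in the raw score shifts the final mark by more points on the wider band.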