A crowdsourcing-based incremental learning framework for automated essays scoring

Bibliographic Details
Main Authors: Bai, Huanyu, Hui, Siu Cheung
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2024
Subjects:
Online Access: https://hdl.handle.net/10356/173026
Institution: Nanyang Technological University
Description
Summary: Automated Essay Scoring (AES) is a challenging topic in Natural Language Processing. Recently, deep learning models have achieved remarkable performance on the AES task. However, applying deep learning models to an AES system in practice is expensive when both data collection and model training are taken into consideration. This paper aims to tackle this problem by proposing the Crowdsourcing-based Automated Essay Scoring (CAES) framework. The proposed framework gradually collects data through crowdsourcing and incrementally trains the AES models. In particular, we propose the Incremental Learning with Dynamic Exemplar Herding (ILDEH) approach to simultaneously tackle catastrophic forgetting and concept drift. The proposed approach dynamically updates the exemplar set with the Dynamic Exemplar Herding algorithm to obtain the best approximation of the overall data distribution, and selectively applies knowledge distillation to the model outputs through the Linear Outlier Suppression loss to retain the learned knowledge. Moreover, we use a lightweight AES model for effective and efficient essay scoring. The experimental results show that our proposed ILDEH approach outperforms other strong baseline approaches on the AES task. Moreover, the CAES framework is able to steadily improve AES performance in the crowdsourcing environment with only 10.6% of the training time of the conventional approach. Further analysis shows that a single CPU server can support daily updates of more than 300 AES models, which is sufficient for most practical AES systems.
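Illustrative note: the summary says the exemplar set is chosen to approximate the overall data distribution. This record does not describe the Dynamic Exemplar Herding algorithm itself, so the sketch below shows only the classic greedy herding selection that exemplar-based incremental learning commonly builds on. It is not the authors' method. The function name `herding_select` and the toy one-dimensional features are hypothetical, and a real system would presumably use essay representations produced by the AES model.

```python
from typing import List, Sequence


def herding_select(features: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick k sample indices whose mean feature tracks the overall mean.

    Classic herding selection (an assumption for illustration), NOT the
    paper's Dynamic Exemplar Herding.
    """
    n = len(features)
    dim = len(features[0])
    k = min(k, n)

    # Mean feature vector of the full data set: the target to approximate.
    mean = [sum(f[d] for f in features) / n for d in range(dim)]

    chosen: List[int] = []
    running_sum = [0.0] * dim
    for step in range(1, k + 1):
        best_idx, best_dist = -1, float("inf")
        for i, f in enumerate(features):
            if i in chosen:
                continue
            # Squared distance between the overall mean and the mean of the
            # exemplar set if sample i were added to it.
            dist = sum(
                (mean[d] - (running_sum[d] + f[d]) / step) ** 2
                for d in range(dim)
            )
            if dist < best_dist:
                best_idx, best_dist = i, dist
        chosen.append(best_idx)
        running_sum = [running_sum[d] + features[best_idx][d] for d in range(dim)]
    return chosen


# Toy usage: five one-dimensional "essay features" with one outlier.
toy_features = [[0.0], [1.0], [2.0], [3.0], [10.0]]
exemplars = herding_select(toy_features, 2)
```

With these toy values the overall mean is 3.2, so the first pick is the sample at 3.0 and the second is the sample at 2.0, which gives the exemplar mean closest to 3.2 among the remaining candidates.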