Neural machine translation in grammar error correction

Bibliographic Details
Main Author: Pham, Vu Tuan
Other Authors: Hui Siu Cheung
Format: Final Year Project
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/74074
Institution: Nanyang Technological University
Description
Summary: Grammar Error Correction (GEC) is the task of detecting and correcting grammatical errors in text written by non-native English writers. Traditional approaches that train a separate classifier for each error type can achieve high precision, but they cannot correct errors using the context of the whole sentence, nor can they handle errors such as non-idiomatic phrasing or redundant words. This project studies the use of neural machine translation (NMT) for GEC and reproduces two existing NMT models: word-based machine translation and character-based machine translation. The core component of both is an encoder-decoder recurrent neural network with an attention mechanism. Word-based NMT is the more popular approach and is applied to many tasks that NMT can solve, such as translation and summarization, but it suffers from out-of-vocabulary (OOV) words: words outside its fixed vocabulary are typically collapsed into a single unknown token. Character-based NMT instead works at the character level, and because its vocabulary is small, it can handle OOV words. The models are evaluated on the Lang-8 development set, the JFLEG corpus, and a set of common English grammar errors. A web prototype is also developed to demonstrate how the model works.
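The OOV argument above can be illustrated with a minimal Python sketch. It is not taken from the project: the toy corpus, the vocabulary size limits, and the helper names `build_vocab` and `encode` are hypothetical, and a real NMT pipeline would further map each token to an integer id before feeding it to the encoder. The sketch only shows why a misspelled learner word that never appeared in training becomes an unknown token at the word level, while every one of its characters is still covered by a character-level vocabulary.

```python
from collections import Counter

UNK = "<unk>"  # placeholder for tokens outside the vocabulary


def build_vocab(tokens, max_size):
    """Keep the max_size most frequent tokens (hypothetical helper)."""
    counts = Counter(tokens)
    return {tok for tok, _ in counts.most_common(max_size)}


def encode(tokens, vocab):
    """Replace every token that is not in the vocabulary with UNK."""
    return [tok if tok in vocab else UNK for tok in tokens]


# Hypothetical toy training corpus (not the project's Lang-8 data).
train = "she go to school every day . he go to work every day ."

word_vocab = build_vocab(train.split(), max_size=10_000)  # word-level units
char_vocab = build_vocab(list(train), max_size=100)       # character-level units

# A learner sentence with a misspelling ("schol") never seen in training.
test = "he go to schol every day ."

word_ids = encode(test.split(), word_vocab)
char_ids = encode(list(test), char_vocab)

print(len(word_vocab), len(char_vocab))  # 9 17
print(word_ids)        # ['he', 'go', 'to', '<unk>', 'every', 'day', '.']
print(UNK in char_ids)  # False: every character was seen in training
```

The trade-off is that character sequences are much longer than word sequences, so the character-level encoder and decoder must process more time steps per sentence in exchange for the smaller, OOV-free vocabulary.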