Using rich models of language in grammatical error detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/155214 |
Institution: | Nanyang Technological University |
Summary: | In this thesis, I show the advantages of using symbolic parsers for Grammatical Error Detection and Correction. In particular, I work with computational grammars for English and Mandarin Chinese to demonstrate how linguistically motivated research using symbolic parsers is still an extremely viable approach to building educational applications.
Over the chapters of this thesis, I guide the reader through the entire process of creating a successful educational application that has benefited thousands of NTU students. I start by describing the creation of two new learner corpora, one for English and one for Mandarin Chinese, through which I collected first-hand data about the errors NTU students commonly make in these two languages. I then discuss my contributions to ZHONG, an open-source computational grammar of Mandarin Chinese built on the theoretical framework known as Head-Driven Phrase Structure Grammar, with particular emphasis on the design of special rules capable of transforming a computational grammar into an error detection system. Next, I discuss the creation of a new treebank used to train parse-ranking models, which help symbolic parsers decide the most likely correction for a given error. I conclude by describing the development of two web-based applications that exploit a mature symbolic parser to provide immediate corrective feedback for a large number of common errors.
This thesis presents multiple sets of positive results. I have not only substantially increased ZHONG's coverage, but I have also successfully implemented dozens of checks to detect common grammatical mistakes made by learners of Mandarin Chinese. Using the new parse-ranking models, I was also able to improve the precision of error detection in both English and Mandarin Chinese by between 15% and 20%. Finally, a blended learning experiment involving more than 1,800 NTU students has shown the success of an application developed specifically to help improve students' writing.
All developed systems, as well as most of the data collected and tagged during this thesis, are released under open-source licenses. |