Using rich models of language in grammatical error detection

In this thesis, I show the advantages of using symbolic parsers for Grammatical Error Detection and Correction. In particular, I work with computational grammars for English and Mandarin Chinese to demonstrate how linguistically motivated research using symbolic parsers is still an extremely viable...

Bibliographic Details
Main Author: Da Costa, Luis Morgado
Other Authors: Annabel Chen Shen-Hsing
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
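Subjects: Engineering::Computer science and engineering::Computer applications; Humanities::Linguistics::Syntax
Online Access: https://hdl.handle.net/10356/155214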
Institution: Nanyang Technological University
id sg-ntu-dr.10356-155214
record_format dspace
spelling sg-ntu-dr.10356-155214 2023-03-05T16:33:49Z Using rich models of language in grammatical error detection Da Costa, Luis Morgado Annabel Chen Shen-Hsing Francis Bond Interdisciplinary Graduate School (IGS) Global Asia fcbond@ntu.edu.sg, AnnabelChen@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Humanities::Linguistics::Syntax Doctor of Philosophy 2022-02-11T03:01:30Z 2022-02-11T03:01:30Z 2021 Thesis-Doctor of Philosophy Da Costa, L. M. (2021). Using rich models of language in grammatical error detection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155214 https://hdl.handle.net/10356/155214 10.32657/10356/155214 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computer applications
Humanities::Linguistics::Syntax
spellingShingle Engineering::Computer science and engineering::Computer applications
Humanities::Linguistics::Syntax
Da Costa, Luis Morgado
Using rich models of language in grammatical error detection
description In this thesis, I show the advantages of using symbolic parsers for Grammatical Error Detection and Correction. In particular, I work with computational grammars for English and Mandarin Chinese to demonstrate how linguistically motivated research using symbolic parsers is still an extremely viable approach to building educational applications. Over the chapters of this thesis, I will guide the reader through the entire process of creating a successful educational application that has benefited thousands of NTU students. To this end, I will start by describing the creation of two new learner corpora, one for English and one for Mandarin Chinese, through which I collected first-hand data about common errors NTU students make in these two languages. I will follow with a discussion of my contributions to ZHONG, an open-source computational grammar of Mandarin Chinese built on a theoretical framework known as Head-Driven Phrase Structure Grammar, with particular emphasis on the design of special rules capable of transforming a computational grammar into an error detection system. I will then discuss the creation of a new treebank used to train parse-ranking models that help symbolic parsers decide the most likely correction for a given error. I will conclude by describing the development of two web-based applications that exploit a mature symbolic parser to provide immediate corrective feedback for a large number of common errors. This thesis presents multiple sets of positive results. I have not only substantially increased ZHONG's coverage, but have also successfully implemented dozens of checks to detect common grammatical mistakes made by learners of Mandarin Chinese. Using the new parse-ranking models, I was also able to improve the precision of error detection in both English and Mandarin Chinese by between 15% and 20%. Finally, a blended learning experiment involving more than 1,800 NTU students has shown the success of an application developed specifically to help improve students' writing. All developed systems, as well as most of the data collected and tagged during this thesis, are released under open-source licenses.
author2 Annabel Chen Shen-Hsing
author_facet Annabel Chen Shen-Hsing
Da Costa, Luis Morgado
format Thesis-Doctor of Philosophy
author Da Costa, Luis Morgado
author_sort Da Costa, Luis Morgado
title Using rich models of language in grammatical error detection
title_short Using rich models of language in grammatical error detection
title_full Using rich models of language in grammatical error detection
title_fullStr Using rich models of language in grammatical error detection
title_full_unstemmed Using rich models of language in grammatical error detection
title_sort using rich models of language in grammatical error detection
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/155214
_version_ 1759854634695917568