Using rich models of language in grammatical error detection

In this thesis, I show the advantages of using symbolic parsers for Grammatical Error Detection and Correction. In particular, I work with computational grammars for English and Mandarin Chinese to demonstrate how linguistically motivated research using symbolic parsers is still an extremely viable...

Bibliographic Details
Main Author:	Da Costa, Luis Morgado
Other Authors:	Annabel Chen Shen-Hsing
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computer applications Humanities::Linguistics::Syntax
Online Access:	https://hdl.handle.net/10356/155214
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-155214
record_format	dspace
spelling	sg-ntu-dr.10356-155214 2023-03-05T16:33:49Z Using rich models of language in grammatical error detection Da Costa, Luis Morgado Annabel Chen Shen-Hsing Francis Bond Interdisciplinary Graduate School (IGS) Global Asia fcbond@ntu.edu.sg, AnnabelChen@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Humanities::Linguistics::Syntax In this thesis, I show the advantages of using symbolic parsers for Grammatical Error Detection and Correction. In particular, I work with computational grammars for English and Mandarin Chinese to demonstrate how linguistically motivated research using symbolic parsers is still an extremely viable approach to build educational applications. During the various chapters of this thesis, I will guide the reader through the entire process of creating a successful educational application that has benefited thousands of NTU students. To this end, I will start by describing the creation of two new learner corpora, one for English and one for Mandarin Chinese, through which I collected first-hand data about common errors NTU students make in these two languages. I will follow with a discussion of my contributions to ZHONG, an open-source computational grammar of Mandarin Chinese using a theoretical framework known as Head-Driven Phrase Structure Grammar, with special emphasis on the design of special rules capable of transforming a computational grammar into an error detection system. I will then discuss the creation of a new treebank used to train parse-ranking models to help symbolic parsers decide the most likely correction for a given error. And I will conclude by describing the development of two web-based applications exploiting a mature symbolic parser to provide immediate corrective feedback for a large number of common errors. This thesis presents multiple sets of positive results. I have not only substantially increased ZHONG's coverage, but I have also successfully implemented dozens of checks to detect common grammatical mistakes made by learners of Mandarin Chinese. Using the new parse-ranking models, I was also able to improve the precision of error detection in both English and Mandarin Chinese by between 15% and 20%. Finally, a blended learning experiment involving more than 1,800 NTU students has shown the success of an application developed specifically to help improve students' writing. All developed systems, as well as most of the data collected and tagged during this thesis, are released under open-source licenses. Doctor of Philosophy 2022-02-11T03:01:30Z 2022-02-11T03:01:30Z 2021 Thesis-Doctor of Philosophy Da Costa, L. M. (2021). Using rich models of language in grammatical error detection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155214 https://hdl.handle.net/10356/155214 10.32657/10356/155214 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computer applications Humanities::Linguistics::Syntax
spellingShingle	Engineering::Computer science and engineering::Computer applications Humanities::Linguistics::Syntax Da Costa, Luis Morgado Using rich models of language in grammatical error detection
description	In this thesis, I show the advantages of using symbolic parsers for Grammatical Error Detection and Correction. In particular, I work with computational grammars for English and Mandarin Chinese to demonstrate how linguistically motivated research using symbolic parsers is still an extremely viable approach to build educational applications. During the various chapters of this thesis, I will guide the reader through the entire process of creating a successful educational application that has benefited thousands of NTU students. To this end, I will start by describing the creation of two new learner corpora, one for English and one for Mandarin Chinese, through which I collected first-hand data about common errors NTU students make in these two languages. I will follow with a discussion of my contributions to ZHONG, an open-source computational grammar of Mandarin Chinese using a theoretical framework known as Head-Driven Phrase Structure Grammar, with special emphasis on the design of special rules capable of transforming a computational grammar into an error detection system. I will then discuss the creation of a new treebank used to train parse-ranking models to help symbolic parsers decide the most likely correction for a given error. And I will conclude by describing the development of two web-based applications exploiting a mature symbolic parser to provide immediate corrective feedback for a large number of common errors. This thesis presents multiple sets of positive results. I have not only substantially increased ZHONG's coverage, but I have also successfully implemented dozens of checks to detect common grammatical mistakes made by learners of Mandarin Chinese. Using the new parse-ranking models, I was also able to improve the precision of error detection in both English and Mandarin Chinese by between 15% and 20%. Finally, a blended learning experiment involving more than 1,800 NTU students has shown the success of an application developed specifically to help improve students' writing. All developed systems, as well as most of the data collected and tagged during this thesis, are released under open-source licenses.
author2	Annabel Chen Shen-Hsing
author_facet	Annabel Chen Shen-Hsing Da Costa, Luis Morgado
format	Thesis-Doctor of Philosophy
author	Da Costa, Luis Morgado
author_sort	Da Costa, Luis Morgado
title	Using rich models of language in grammatical error detection
title_short	Using rich models of language in grammatical error detection
title_full	Using rich models of language in grammatical error detection
title_fullStr	Using rich models of language in grammatical error detection
title_full_unstemmed	Using rich models of language in grammatical error detection
title_sort	using rich models of language in grammatical error detection
publisher	Nanyang Technological University
publishDate	2022
url	https://hdl.handle.net/10356/155214
_version_	1759854634695917568

