Building an HPSG Chinese grammar (Zhong)
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2019 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/87331 http://hdl.handle.net/10220/48021 |
Institution: | Nanyang Technological University |
Summary: | This thesis describes the development of Zhong, a computational resource grammar for Chinese, in the framework of Head-driven Phrase Structure Grammar (HPSG: Pollard & Sag, 1994) using Minimal Recursion Semantics (Copestake et al., 2005). To increase the grammar’s coverage for practical applications, a corpus-driven approach was adopted to systematically expand its lexical and syntactic coverage. The lexicon was expanded by semi-automatically learning lexical entries from an annotated Chinese corpus. Various linguistic phenomena commonly observed in corpora have been analyzed and modeled in the grammar, especially those involving the particle 的 (DE). The entire grammar and associated tools are available under an open-source license.
A treebank of 798 sentences has been built from the parse trees in the grammar’s output. With appropriate trees manually selected from the parses, the treebank was used as a gold standard to train a statistical model that can rank the grammar’s output parse trees, both to improve its performance in applications and to help grammar engineers during development and debugging.
To evaluate the grammar’s suitability for supporting applications such as grammar feedback systems for second language learners, a small extension of the grammar was also built with MAL-rules and MAL-types, enabling it to parse sentences containing grammatical errors and to detect the specific errors. The information provided by the grammar would thus allow the feedback system to identify the errors and give appropriate suggestions to the learner. |
---|