FlaCGEC: A Chinese grammatical error correction dataset with fine-grained linguistic annotation

Chinese Grammatical Error Correction (CGEC) has recently attracted growing attention from researchers. Although multiple CGEC datasets have been developed to support this research, they lack the ability to provide a deep linguistic topology of grammar errors, which is...

Bibliographic Details
Main Authors:	DU, Hanyue, ZHAO, Yike, TIAN, Qingyuan, WANG, Jiani, WANG, Lei, LAN, Yunshi, LU, Xuesong
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Chinese Grammatical Error Correction Deep Learning Fine-grained Linguistic Annotation Asian Studies Databases and Information Systems East Asian Languages and Societies
Online Access:	https://ink.library.smu.edu.sg/sis_research/8463 https://ink.library.smu.edu.sg/context/sis_research/article/9466/viewcontent/FlaCGEC_av.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9466
record_format	dspace
spelling	sg-smu-ink.sis_research-9466 2024-01-04T09:42:43Z 2023-10-01T07:00:00Z text application/pdf info:doi/10.1145/3583780.3615119 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Chinese Grammatical Error Correction Deep Learning Fine-grained Linguistic Annotation Asian Studies Databases and Information Systems East Asian Languages and Societies
description	Chinese Grammatical Error Correction (CGEC) has recently attracted growing attention from researchers. Although multiple CGEC datasets have been developed to support this research, they lack the ability to provide a deep linguistic topology of grammar errors, which is critical for interpreting and diagnosing CGEC approaches. To address this limitation, we introduce FlaCGEC, a new CGEC dataset featuring fine-grained linguistic annotation. Specifically, we collect a raw corpus from the linguistic schema defined by Chinese language experts, apply rule-based edits to sentences, and manually refine the generated samples, which results in 10k sentences with 78 instantiated grammar points and 3 types of edits. We evaluate various cutting-edge CGEC methods on FlaCGEC, and their unremarkable results indicate that the dataset is challenging because it covers a wide range of grammatical errors. In addition, we treat FlaCGEC as a diagnostic dataset for testing generalization skills and conduct a thorough evaluation of existing CGEC models.
format	text
author	DU, Hanyue ZHAO, Yike TIAN, Qingyuan WANG, Jiani WANG, Lei LAN, Yunshi LU, Xuesong
title	FlaCGEC: A Chinese grammatical error correction dataset with fine-grained linguistic annotation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2023
url	https://ink.library.smu.edu.sg/sis_research/8463 https://ink.library.smu.edu.sg/context/sis_research/article/9466/viewcontent/FlaCGEC_av.pdf
_version_	1787590774260498432


Similar Items