Low-resource name tagging learned with weakly labeled data

Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging solely...

Bibliographic Details
Main Authors:	CAO, Yixin; HU, Zikun; CHUA, Tat-Seng; LIU, Zhiyuan; JI, Heng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2019
Subjects:	Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access:	https://ink.library.smu.edu.sg/sis_research/7457 https://ink.library.smu.edu.sg/context/sis_research/article/8460/viewcontent/D19_1025.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8460
record_format	dspace
spelling	sg-smu-ink.sis_research-8460 2022-10-20T07:18:26Z Low-resource name tagging learned with weakly labeled data CAO, Yixin; HU, Zikun; CHUA, Tat-Seng; LIU, Zhiyuan; JI, Heng Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging solely based on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve a global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and a fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency. 2019-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7457 info:doi/10.18653/v1/D19-1025 https://ink.library.smu.edu.sg/context/sis_research/article/8460/viewcontent/D19_1025.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems; Graphics and Human Computer Interfaces
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Databases and Information Systems; Graphics and Human Computer Interfaces
description	Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging solely based on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve a global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and a fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency.
format	text
author	CAO, Yixin; HU, Zikun; CHUA, Tat-Seng; LIU, Zhiyuan; JI, Heng
author_facet	CAO, Yixin; HU, Zikun; CHUA, Tat-Seng; LIU, Zhiyuan; JI, Heng
author_sort	CAO, Yixin
title	Low-resource name tagging learned with weakly labeled data
title_sort	low-resource name tagging learned with weakly labeled data
publisher	Institutional Knowledge at Singapore Management University
publishDate	2019
url	https://ink.library.smu.edu.sg/sis_research/7457 https://ink.library.smu.edu.sg/context/sis_research/article/8460/viewcontent/D19_1025.pdf
_version_	1770576341958131712

Similar Items