Low-resource name tagging learned with weakly labeled data

Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging solely...

Bibliographic Details
Main Authors: CAO, Yixin, HU, Zikun, CHUA, Tat-Seng, LIU, Zhiyuan, JI, Heng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Subjects: Databases and Information Systems; Graphics and Human Computer Interfaces
Online Access: https://ink.library.smu.edu.sg/sis_research/7457
https://ink.library.smu.edu.sg/context/sis_research/article/8460/viewcontent/D19_1025.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-8460
record_format dspace
spelling sg-smu-ink.sis_research-8460 2022-10-20T07:18:26Z Low-resource name tagging learned with weakly labeled data CAO, Yixin HU, Zikun CHUA, Tat-Seng LIU, Zhiyuan JI, Heng Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging based solely on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve a global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and a fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency. 2019-11-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7457 info:doi/10.18653/v1/D19-1025 https://ink.library.smu.edu.sg/context/sis_research/article/8460/viewcontent/D19_1025.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Databases and Information Systems Graphics and Human Computer Interfaces
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Databases and Information Systems
Graphics and Human Computer Interfaces
description Name tagging in low-resource languages or domains suffers from inadequate training data. Existing work relies heavily on additional information, while leaving unexplored the noisy annotations that exist extensively on the web. In this paper, we propose a novel neural model for name tagging based solely on weakly labeled (WL) data, so that it can be applied in any low-resource setting. To take the best advantage of all WL sentences, we split them into high-quality and noisy portions for two modules, respectively: (1) a classification module focusing on the large portion of noisy data can efficiently and robustly pretrain the tag classifier by capturing textual context semantics; and (2) a costly sequence labeling module focusing on high-quality data utilizes Partial-CRFs with non-entity sampling to achieve a global optimum. The two modules are combined via shared parameters. Extensive experiments involving five low-resource languages and a fine-grained food domain demonstrate our superior performance (6% and 7.8% F1 gains on average) as well as efficiency.
format text
author CAO, Yixin
HU, Zikun
CHUA, Tat-Seng
LIU, Zhiyuan
JI, Heng
author_sort CAO, Yixin
title Low-resource name tagging learned with weakly labeled data
title_sort low-resource name tagging learned with weakly labeled data
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/7457
https://ink.library.smu.edu.sg/context/sis_research/article/8460/viewcontent/D19_1025.pdf
_version_ 1770576341958131712