Deep learning-based text augmentation for named entity recognition
This thesis is focused on the development of an effective text augmentation method for Named Entity Recognition (NER) in the low-resource setting. NER, an important sequence labeling task in Natural Language Processing, is used to identify predefined entities in text. NER datasets tend to be smal...
Main Author: | Surana, Tanmay |
---|---|
Other Authors: | Chng Eng Siong |
Format: | Thesis-Master by Research |
Language: | English |
Published: | Nanyang Technological University, 2023 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/171105 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-171105 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-171105
2023-11-02T02:20:48Z
Deep learning-based text augmentation for named entity recognition
Surana, Tanmay
Chng Eng Siong
School of Computer Science and Engineering
ASESChng@ntu.edu.sg
Master of Engineering
2023-10-16T02:23:27Z
2023
Thesis-Master by Research
Surana, T. (2023). Deep learning-based text augmentation for named entity recognition. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/171105
10.32657/10356/171105
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Document and text processing
Engineering::Computer science and engineering
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Information systems::Information storage and retrieval |
description |
This thesis develops an effective text augmentation method for Named Entity Recognition (NER) in the low-resource setting.
NER, an important sequence labeling task in Natural Language Processing, identifies predefined entities in text. NER datasets tend to be small, which makes generating additional training text through augmentation a plausible solution.
Existing text augmentation methods for NER suffer from label corruption and a lack of context diversity. To address these limitations, this thesis proposes Contextual and Semantic Structure-based Interpolation (CASSI), a structure-based text augmentation scheme that produces a combination of two semantically similar sentences. Candidate augmentations are produced by replacing sub-trees of the sentences' dependency parse trees that contain subjects, objects, or complements. The final augmentation is selected by filtering the candidates through Language Model scoring and a metric that uses Jaccard Similarity between the original pair and the candidates to improve specificity.
Experiments show that CASSI consistently outperforms existing methods across multiple resource levels and multiple languages. Compared to the best-performing baseline, it achieves an average relative improvement in Micro-F1 of 4.28% to 25.97% on subsets of CoNLL 2002/03, and of 1.56% across three noisy text datasets. |
author2 |
Chng Eng Siong |
author_facet |
Chng Eng Siong; Surana, Tanmay |
format |
Thesis-Master by Research |
author |
Surana, Tanmay |
title |
Deep learning-based text augmentation for named entity recognition |
publisher |
Nanyang Technological University |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/171105 |
_version_ |
1781793875131629568 |
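
The description above mentions a candidate-filtering metric based on Jaccard Similarity between the original sentence pair and each candidate, but this record does not give the metric's exact form or how it is combined with Language Model scoring. The following is only a minimal sketch of the underlying idea, under stated assumptions: sentences are compared as lowercase whitespace-token sets, and candidates are ranked by their mean similarity to the two source sentences. The function names `jaccard` and `rank_candidates` and the averaging rule are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase token sets of two sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sentences are identical
    return len(set_a & set_b) / len(set_a | set_b)


def rank_candidates(
    source_pair: Tuple[str, str], candidates: List[str]
) -> List[Tuple[float, str]]:
    """Rank candidates by mean Jaccard similarity to both source sentences.

    ASSUMPTION: averaging over the pair is an illustrative choice only;
    the thesis's actual metric is not specified in this record.
    """
    first, second = source_pair
    scored = [
        ((jaccard(c, first) + jaccard(c, second)) / 2, c) for c in candidates
    ]
    # Highest mean similarity first; ties keep their input order.
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    pair = ("John lives in Paris", "Mary works in London")
    for score, text in rank_candidates(pair, ["John lives in London", "Pizza"]):
        print(f"{score:.3f}  {text}")
```

Whether a high or a low similarity should be preferred, and where the Language Model score enters, depends on the thesis's definition of specificity, so the sketch only produces a ranking and leaves the selection threshold open.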