BERT named entity recognition on emergency response system

Bibliographic Details
Main Author: Chua, Clarita Wyn Kay
Other Authors: Chng Eng Siong
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Online Access: https://hdl.handle.net/10356/156612
Institution: Nanyang Technological University
Description
Summary: Named Entity Recognition (NER) is a natural language processing task that identifies pre-defined categories, called entities, in a given sequence. An existing Emergency Response System is an NER-based application developed to aid call operators by extracting key information from the caller, removing the need for call operators to enter it manually into the command control system. This paper proposes improving the NER model in the Emergency Response System by adding medical and COVID-related entities, through fine-tuning and training different BERT variant models on a COVID dataset and a General Emergency Response dataset. To aid the development and deployment of a variety of BERT-based models, the paper also introduces an automated NER pipeline with modules to prepare data and run the NER model. The paper further explores the benefits of data augmentation with PEGASUS, comparing experiments on augmented and non-augmented datasets to obtain the best baseline NER model for each dataset. From the experiments, we found that RoBERTa is the best baseline model on the original datasets but performs around 10% lower on the augmented datasets. Additionally, the DistilBERT and BERT NER models improved significantly after data augmentation, by 3% to 7%.
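In a token-classification setup such as a fine-tuned BERT model, the NER model assigns a tag to each token. The key information passed on to a system like the command control system is then recovered by grouping consecutive tagged tokens into entity spans. The sketch below is illustrative only. It assumes the common BIO tagging scheme with word-level tags, meaning subword predictions have already been merged back into words. The entity labels (SYMPTOM, LOCATION) and the function names (bio_to_spans, spans_to_fields) are hypothetical. The record does not state which tagging scheme or entity set the project actually uses.

```python
from typing import Dict, List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans.

    Assumes word-level BIO tags such as "B-LOCATION", "I-LOCATION", "O".
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built (if any) and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag continues the current entity of the same type.
            current_tokens.append(token)
        elif tag.startswith("I-"):
            # Ill-formed I- tag with no matching B-: treat it as a new entity.
            flush()
            current_type, current_tokens = tag[2:], [token]
        else:
            # "O" means the token is outside any entity.
            flush()
    flush()
    return spans


def spans_to_fields(spans: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect spans by entity type, e.g. to pre-fill hypothetical form fields."""
    fields: Dict[str, List[str]] = {}
    for entity_type, text in spans:
        fields.setdefault(entity_type, []).append(text)
    return fields


if __name__ == "__main__":
    tokens = ["My", "father", "has", "chest", "pain", "at",
              "Jurong", "West", "Street", "42"]
    tags = ["O", "O", "O", "B-SYMPTOM", "I-SYMPTOM", "O",
            "B-LOCATION", "I-LOCATION", "I-LOCATION", "I-LOCATION"]
    spans = bio_to_spans(tokens, tags)
    print(spans)  # [('SYMPTOM', 'chest pain'), ('LOCATION', 'Jurong West Street 42')]
    print(spans_to_fields(spans))
```

Treating a stray I- tag as the start of a new entity is one lenient design choice among several. A stricter decoder could discard such tokens instead, and that choice can change entity-level evaluation scores.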