Named entity recognition for information extraction


Bibliographic Details
Main Author:	Tan, Samantha Swee Yun
Other Authors:	Hui Siu Cheung
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Online Access:	https://hdl.handle.net/10356/163161
Institution:	Nanyang Technological University
School:	School of Computer Science and Engineering
Degree:	Bachelor of Engineering (Computer Science)
Project code:	SCSE21-0906
Citation:	Tan, S. S. Y. (2022). Named entity recognition for information extraction. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/163161

Full description

Named Entity Recognition (NER) for Information Extraction (IE) has grown in importance because it can streamline processes such as administrative tasks by providing a real-time overview of feedback. This is achieved by mining each piece of feedback to extract useful information. Such an overview helps users and organisations see quickly how others perceive a particular product or service, so that they can take further action to improve their businesses. Additionally, Singapore is a multicultural country whose food, street and location names are distinctive and not always in English, so it is important to investigate NER on Singapore-based datasets. However, as NER quality is known to be affected by factors such as noise and data diversity, we propose using an NEM dictionary instead to improve the performance of the IE process. Hence, the aim of this project is to study and evaluate different NER models for building an NEM dictionary, such as a Singapore Food Location NEM Dictionary. In this project, three NER models, FLERT XLM-R, CL-KL and XLNet, were evaluated on a benchmark dataset. The top-performing models were then applied to two Singapore-based datasets to evaluate their effectiveness in extracting Singapore location names and addresses. Empirical results showed that LUKE with CL-KL, without external context retrieval, was the best-performing model and met the project objective.

For future work, we recommend building a labelled Singapore dataset annotated with the BIO tagging scheme to improve NER performance on Singapore-based datasets. We also propose generating a more domain-specific NEM dictionary, such as a Food NEM Dictionary, and evaluating the use of the NEM dictionary in real applications such as the NTU Food Hunter System.
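The abstract mentions the BIO tagging scheme and building an NEM dictionary from NER output, but gives no details of either. The following is an illustrative sketch only, not code from the project: it assumes token-level BIO predictions are already available from some NER model, decodes them into entity spans, and counts how often each location name is predicted, which is one simple way such a dictionary could be populated. The function names `bio_to_spans` and `build_dictionary`, the `LOC` label and the example sentences are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical helpers for illustration; not taken from the project.


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode BIO tags into (entity_text, label) pairs.

    B-X starts an entity of type X, I-X continues it and O is outside
    any entity. An I-X that does not continue an entity of the same
    type is treated as the start of a new entity (a lenient convention).
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:  # close the previous entity first
                spans.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:  # "O" closes any open entity
            if current:
                spans.append((" ".join(current), label))
            current, label = [], None
    if current:  # entity running to the end of the sentence
        spans.append((" ".join(current), label))
    return spans


def build_dictionary(
    sentences: Iterable[Tuple[List[str], List[str]]], wanted: str = "LOC"
) -> Counter:
    """Count how often each entity of type `wanted` is predicted."""
    counts: Counter = Counter()
    for tokens, tags in sentences:
        for text, label in bio_to_spans(tokens, tags):
            if label == wanted:
                counts[text] += 1
    return counts


if __name__ == "__main__":
    # Made-up example sentences with BIO tags (not project data).
    data = [
        (["Chicken", "rice", "at", "Maxwell", "Food", "Centre", "is", "great"],
         ["O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "O", "O"]),
        (["Queue", "at", "Maxwell", "Food", "Centre", "near", "Tanjong", "Pagar"],
         ["O", "O", "B-LOC", "I-LOC", "I-LOC", "O", "B-LOC", "I-LOC"]),
    ]
    print(build_dictionary(data).most_common())
```

Counting raw occurrences is only one possible design. A practical dictionary would probably also need to normalise variant spellings and filter low-frequency noise, which this sketch does not attempt.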

