Named entity recognition in the medical domain

This report presents a project that aims to develop Named Entity Recognition (NER) models for two datasets in the medical domain: emergency hotline data and N2C2 clinical notes. The project objectives include reviewing existing models and architectures, training on general datasets to find the best...

Bibliographic Details
Main Author:	Kusalavan, Kirubhaharini
Other Authors:	Chng Eng Siong
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Computer science and engineering::Data
Online Access:	https://hdl.handle.net/10356/165225
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-165225
record_format	dspace
spelling	sg-ntu-dr.10356-165225 2023-03-24T15:40:57Z Named entity recognition in the medical domain Kusalavan, Kirubhaharini Chng Eng Siong School of Computer Science and Engineering ASESChng@ntu.edu.sg Engineering::Computer science and engineering::Data This report presents a project that aims to develop Named Entity Recognition (NER) models for two datasets in the medical domain: emergency hotline data and N2C2 clinical notes. The project objectives include reviewing existing models and architectures, training on general datasets to find the best model and architecture, and creating a pipeline to train and deploy NER models for different domains. The NER models will be used to auto-fill forms during emergencies by detecting entities in call transcripts and to identify essential entities from clinical notes. The report also describes the sampling method used to derive subsets from the datasets and the backend and frontend of the NER Flask application, and it presents and discusses the results. For the GMB dataset, RoBERTa outperformed BERT by 0.19% and DistilBERT by 1.58%. RoBERTa and BERT showed similar results for the CoNLL-2003 dataset, with BERT scoring 0.02% higher than RoBERTa and 0.85% higher than DistilBERT. MedBERT was the best model for the N2C2 dataset, performing 1.71% better than BERT. However, the augmentation techniques applied to the GMB and N2C2 datasets did not yield significant improvements in the results of the NER models. Lastly, the models for the emergency hotline dataset showed similar results, with BioClinical BERT scoring the highest. These models can be deployed using the Flask application introduced in this report to obtain useful outputs. Bachelor of Science in Data Science and Artificial Intelligence 2023-03-20T23:53:48Z 2023-03-20T23:53:48Z 2023 Final Year Project (FYP) Kusalavan, K. (2023). Named entity recognition in the medical domain. Final Year Project (FYP), Nanyang Technological University, Singapore.
https://hdl.handle.net/10356/165225 https://hdl.handle.net/10356/165225 en SCSE22-0086 application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Data
spellingShingle	Engineering::Computer science and engineering::Data Kusalavan, Kirubhaharini Named entity recognition in the medical domain
description	This report presents a project that aims to develop Named Entity Recognition (NER) models for two datasets in the medical domain: emergency hotline data and N2C2 clinical notes. The project objectives include reviewing existing models and architectures, training on general datasets to find the best model and architecture, and creating a pipeline to train and deploy NER models for different domains. The NER models will be used to auto-fill forms during emergencies by detecting entities in call transcripts and to identify essential entities from clinical notes. The report also describes the sampling method used to derive subsets from the datasets and the backend and frontend of the NER Flask application, and it presents and discusses the results. For the GMB dataset, RoBERTa outperformed BERT by 0.19% and DistilBERT by 1.58%. RoBERTa and BERT showed similar results for the CoNLL-2003 dataset, with BERT scoring 0.02% higher than RoBERTa and 0.85% higher than DistilBERT. MedBERT was the best model for the N2C2 dataset, performing 1.71% better than BERT. However, the augmentation techniques applied to the GMB and N2C2 datasets did not yield significant improvements in the results of the NER models. Lastly, the models for the emergency hotline dataset showed similar results, with BioClinical BERT scoring the highest. These models can be deployed using the Flask application introduced in this report to obtain useful outputs.
author2	Chng Eng Siong
author_facet	Chng Eng Siong Kusalavan, Kirubhaharini
format	Final Year Project
author	Kusalavan, Kirubhaharini
author_sort	Kusalavan, Kirubhaharini
title	Named entity recognition in the medical domain
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/165225
_version_	1761781182056366080