Natural language processing in urology: automated extraction of clinical information from histopathology reports of uro-oncology procedures

Bibliographic Details
Main Authors:	Huang, Honghong; Lim, Fiona Xin Yi; Gu, Gary Tianyu; Han, Matthew Jiangchou; Fang, Andrew Hao Sen; Chia, Elian Hui San; Bei, Eileen Yen Tze; Tham, Sarah Zhuling; Ho, Henry Sun Sien; Yuen, John Shyi Peng; Sun, Aixin; Lim, Jay Kheng Sit
Other Authors:	School of Computer Science and Engineering
Format:	Article
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering; Natural Language Processing; Uro-Oncology; Histology Reports
Online Access:	https://hdl.handle.net/10356/169860
Institution:	Nanyang Technological University

Citation:	Huang, H., Lim, F. X. Y., Gu, G. T., Han, M. J., Fang, A. H. S., Chia, E. H. S., Bei, E. Y. T., Tham, S. Z., Ho, H. S. S., Yuen, J. S. P., Sun, A. & Lim, J. K. S. (2023). Natural language processing in urology: automated extraction of clinical information from histopathology reports of uro-oncology procedures. Heliyon, 9(4), e14793. https://dx.doi.org/10.1016/j.heliyon.2023.e14793
ISSN:	2405-8440
PubMed ID:	37025805
Scopus EID:	2-s2.0-85151252120
Version:	Published version (application/pdf)
License:	© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Abstract

Objectives: We aimed to automate the routine extraction of clinically relevant information from unstructured uro-oncological histopathology reports by developing an oncology-focused natural language processing (NLP) algorithm that applies rule-based and machine learning (ML)/deep learning (DL) methods.

Methods: Our algorithm combines a rule-based approach with support vector machines and neural networks (BioBERT/ClinicalBERT), and is optimised for accuracy. We randomly extracted 5772 uro-oncological histology reports dated 2008 to 2018 from electronic health records (EHRs) and split them into training and validation datasets in an 80:20 ratio. The training dataset was annotated by medical professionals and reviewed by cancer registrars. The validation dataset was annotated by cancer registrars and served as the gold standard: the accuracy of the NLP-parsed data was measured against these human annotations. Following our cancer registry's definition for professional human extraction, we defined an accuracy rate of >95% as “acceptable”.

Results: There were 11 extraction variables across 268 free-text reports. Accuracy ranged from 61.2% to 99.0% across fields. Of the 11 data fields, 8 met the acceptable accuracy standard, while the remaining 3 had accuracy rates between 61.2% and 89.7%. Notably, the rule-based approach was more effective and robust in extracting the variables of interest. The ML/DL models performed worse, owing to highly imbalanced data distributions and to differences in writing style between reports, and between our reports and the data used to pre-train the domain-specific models.

Conclusion: We designed an NLP algorithm that accurately automates clinical information extraction from histopathology reports, with an overall micro-averaged accuracy of 93.3%.
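The abstract describes rule-based extraction scored field by field against a gold standard, a >95% acceptability threshold, and a micro-averaged overall accuracy. The Python sketch below illustrates those ideas under stated assumptions; it is not the authors' implementation. The field names (`gleason_score`, `laterality`), the regular expressions, the sample reports and annotations, and the convention that a field is scored only when it appears in a report's gold record are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical rules: one regular expression per target field.
# Field names and patterns are illustrative, not the study's actual rules.
RULES: Dict[str, re.Pattern] = {
    "gleason_score": re.compile(r"gleason\s+score[:\s]*(\d\s*\+\s*\d\s*=\s*\d+)", re.IGNORECASE),
    "laterality": re.compile(r"\b(left|right|bilateral)\b", re.IGNORECASE),
}

Record = Dict[str, Optional[str]]


def extract(report: str) -> Record:
    """Apply every rule to one free-text report; None means no match."""
    values: Record = {}
    for field, pattern in RULES.items():
        match = pattern.search(report)  # first match only
        # Strip whitespace and lowercase so "3 + 4 = 7" and "3+4=7" compare equal.
        values[field] = re.sub(r"\s+", "", match.group(1)).lower() if match else None
    return values


def evaluate(preds: List[Record], gold: List[Record], threshold: float = 0.95
             ) -> Tuple[Dict[str, float], float, List[str]]:
    """Return per-field accuracy, micro-averaged accuracy, and fields above threshold."""
    if not gold or len(preds) != len(gold):
        raise ValueError("need one prediction per gold record")
    correct: Dict[str, int] = {}
    total: Dict[str, int] = {}
    for pred, ref in zip(preds, gold):
        # Assumption: only fields present in the gold record are scored for that
        # report, so fields that apply to few reports contribute fewer decisions.
        for field, expected in ref.items():
            total[field] = total.get(field, 0) + 1
            correct[field] = correct.get(field, 0) + int(pred.get(field) == expected)
    per_field = {field: correct[field] / total[field] for field in total}
    # Micro average: pool every scored (report, field) decision before dividing.
    micro = sum(correct.values()) / sum(total.values())
    acceptable = [field for field, acc in per_field.items() if acc > threshold]
    return per_field, micro, acceptable


# Invented sample reports and gold annotations, for illustration only.
SAMPLE_REPORTS = [
    "Prostate, radical prostatectomy: acinar adenocarcinoma, Gleason score 3 + 4 = 7.",
    "Prostate, needle biopsy: adenocarcinoma, Gleason score 4+4=8, left lobe.",
    "Kidney, left radical nephrectomy: clear cell renal cell carcinoma.",
    "Kidney, partial nephrectomy; the left adrenal gland is unremarkable. Tumour in right kidney.",
]
SAMPLE_GOLD: List[Record] = [
    {"gleason_score": "3+4=7"},
    {"gleason_score": "4+4=8", "laterality": "left"},
    {"laterality": "left"},
    {"laterality": "right"},
]

if __name__ == "__main__":
    predictions = [extract(r) for r in SAMPLE_REPORTS]
    per_field, micro, acceptable = evaluate(predictions, SAMPLE_GOLD)
    print(per_field, round(micro, 3), acceptable)
```

In the last sample report the naive first-match laterality rule picks up "left" from the adrenal gland rather than the tumour side, which is the kind of phrasing variability that hand-written rules must anticipate. Because fields apply to different numbers of reports, the micro average here is 4/5 = 0.8, whereas the unweighted mean of the two per-field accuracies would be about 0.83.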
