Abbreviation Detection in Vietnamese Clinical Texts

Abbreviations are widely used in clinical notes because the notes are often written under high pressure, with little time for writing and a drive to simplify medical records. These abbreviations reduce the clarity of the records and greatly affect all the computer-based...

Bibliographic Details
Main Authors:	Vo, Chau; Cao, Tru; Ho, Bao
Format:	Article
Language:	English
Published:	Hanoi: Vietnam National University, Hanoi (ĐHQGHN), 2019
Subjects:	Electronic medical record; Clinical note; Abbreviation identification; Semi-supervised learning; Self-training; Random forest
Online Access:	http://repository.vnu.edu.vn/handle/VNU_123/64775 https://doi.org/10.25073/2588-1086/vnucsce.211
Institution:	Vietnam National University, Hanoi

id	oai:112.137.131.14:VNU_123-64775
record_format	dspace
citation	Vo, C., et al. (2018). Abbreviation Detection in Vietnamese Clinical Texts. Journal of Science: Comp. Science & Com. Eng., Vol. 34, No. 2 (2018) 44-60. ISSN 2588-1086. Record dated 2019-07-01T08:11:48Z; file format application/pdf.
institution	Vietnam National University, Hanoi
building	VNU Library & Information Center
country	Vietnam
collection	VNU Digital Repository
language	English
topic	Electronic medical record; Clinical note; Abbreviation identification; Semi-supervised learning; Self-training; Random forest
description	Abbreviations are widely used in clinical notes because the notes are often written under high pressure, with little time for writing and a drive to simplify medical records. These abbreviations reduce the clarity of the records and greatly affect all computer-based data processing tasks. In this paper, we propose a solution to the abbreviation identification task on clinical notes in a practical context where only a few clinical notes have been labeled while many more still need to be labeled. Our solution is a semi-supervised learning approach that uses level-wise feature engineering to construct an abbreviation identifier from a small set of labeled clinical texts and a larger set of unlabeled clinical texts. A semi-supervised learning algorithm, Semi-RF, and its advanced adaptive version, Weighted Semi-RF, are proposed in the self-training framework using random forest models and Tri-training. Weighted Semi-RF differs from Semi-RF in that it is equipped with a new weighting scheme that adapts to the current labeled data set. The proposed semi-supervised learning algorithms are practical, with parameter-free settings, for building an effective abbreviation identifier that identifies abbreviations automatically in clinical texts. Their effectiveness is confirmed by better Precision and F-measure values in various experiments on real Vietnamese clinical notes. Compared to existing solutions, our solution is novel for automatic abbreviation identification in clinical notes. Its results can lay the basis for determining the full form of each correctly identified abbreviation and thereby enhance the readability of the records.
format	Article
author	Vo, Chau; Cao, Tru; Ho, Bao
title	Abbreviation Detection in Vietnamese Clinical Texts
publisher	Hanoi: Vietnam National University, Hanoi (ĐHQGHN)
publishDate	2019
url	http://repository.vnu.edu.vn/handle/VNU_123/64775 https://doi.org/10.25073/2588-1086/vnucsce.211
_version_	1680967175439908864
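
The description field above places the proposed algorithms in the self-training framework: a model trained on a few labeled notes labels the unlabeled ones, and confident predictions are added to the training set. The following is a minimal sketch of that general loop only, not the authors' Semi-RF or Weighted Semi-RF. Several things here are assumptions for illustration: the base learner is a tiny nearest-centroid stand-in rather than a random forest, the two token features (uppercase ratio, scaled token length) are hypothetical, and the fixed `threshold` is an invented parameter, whereas the record describes the published algorithms as parameter-free.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, List[float]]:
    """Stand-in base learner (assumption): one mean vector per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_with_confidence(model: Dict[int, List[float]], x: Vector) -> Tuple[int, float]:
    """Return the nearest class and a distance-based confidence score."""
    dists = {c: sum((a - b) ** 2 for a, b in zip(x, m)) ** 0.5 for c, m in model.items()}
    best = min(dists, key=dists.get)
    total = sum(dists.values())
    conf = 1.0 if total == 0 else 1.0 - dists[best] / total
    return best, conf


def self_train(X_lab, y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Generic self-training: repeatedly add confident pseudo-labels."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            (confident if conf >= threshold else rest).append((x, label))
        if not confident:  # nothing new to learn from, so stop early
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = [x for x, _ in rest]
    return fit_centroids(X_lab, y_lab), X_lab, y_lab


# Hypothetical features per token: [uppercase ratio, token length / 10].
# Label 1 = abbreviation, 0 = ordinary word.
X_labeled = [[1.0, 0.2], [0.9, 0.3], [0.0, 0.6], [0.1, 0.7]]
y_labeled = [1, 1, 0, 0]
X_unlabeled = [[0.95, 0.25], [0.05, 0.65], [0.5, 0.45]]

model, X_final, y_final = self_train(X_labeled, y_labeled, X_unlabeled)
print(len(y_final), predict_with_confidence(model, [0.9, 0.2])[0])
```

In this toy run the two clear-cut unlabeled tokens receive pseudo-labels, while the ambiguous one stays unlabeled because its confidence is below the threshold. Swapping the stand-in learner for a random forest would change only `fit_centroids` and `predict_with_confidence`.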
