Term recognition from electronic medical records of Singaporean hospital (1/2)

Doctors at hospitals write daily reports on patients’ statuses and keep them for future use. However, with the increasing use of computers to store information and analyse data, it has been difficult to let doctors express their diagnoses freely and still have the system analyse the content of...

Bibliographic Details
Main Author:	Muhammad Hafiz Mohamed Hassan
Other Authors:	School of Computer Engineering
Format:	Final Year Project
Language:	English
Published:	2014
Subjects:	DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability
Online Access:	http://hdl.handle.net/10356/59051
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-59051
record_format	dspace
spelling	sg-ntu-dr.10356-59051 2023-03-03T20:39:16Z Term recognition from electronic medical records of Singaporean hospital (1/2) Muhammad Hafiz Mohamed Hassan School of Computer Engineering Kim Jung Jae DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability Bachelor of Engineering (Computer Engineering) 2014-04-22T01:59:07Z 2014-04-22T01:59:07Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/59051 en Nanyang Technological University 87 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability
description	Doctors at hospitals write daily reports on patients’ statuses and keep them for future use. However, with the increasing use of computers to store information and analyse data, it has been difficult to let doctors express their diagnoses freely and still have the system analyse the content of what is written. A further challenge lies in tying the medical terms used by experienced doctors, especially in Singapore, to the standard terms used by doctors in other parts of the world. A need was thus identified to bridge the records written manually by local doctors and readily available medical systems that conform to other countries’ standards; this project aims to develop a system that serves as this bridge. The main objective of the project is to develop a term recognition system that can identify the content of what the doctor has written. The author’s overall scope was to develop a text categorization system that classifies the words used in the electronic medical records (EMR) using a machine learning approach. The author used a hierarchical trie structure to analyse the records, and segmented the records manually using a sentence disambiguation technique. Feature extraction by the n-gram approach was chosen, and feature generation was developed through thorough analysis of the records and rigorous testing across different designs, each of which tested a different feature or set of features. A program was created, and algorithms were developed, to classify the words according to the design specification in ARFF, LIBSVM and SVM light formats. The classifier results were compiled and feature weighting was carried out. Three main machine learning tools were used: WEKA, LIBSVM and SVM light. Three different classifiers were used in WEKA: SMO, Naïve Bayes and the J48 decision tree.
From the feature performance results, it was found that moving from a tri-gram output to a hex-gram output and introducing other features, especially shape features, helps to improve performance. Reducing the number of shape feature attributes, either by combining attributes or by omitting one from the design, does not have a big impact on the result. Adding shape feature attributes for the previous and next words improves performance for the tri-gram, but performance does not increase when the same is done for the hex-gram. It was therefore concluded that adding more attributes as the n-value increases does not improve performance. Version 5b was identified as the best feature design. This design was used to re-evaluate the test data, and automatic segmentation was performed on the original data using the outputs of the different classifiers. The segmentation results of the different classifiers were compared with a segmentation done by the author. The comparison shows that the classifier with the best performance does not translate into the most accurate segmentation result: Naïve Bayes gives the result most similar to the author’s segmentation, yet it has the lowest performance value. The results also show that there is considerable room for improvement in feature design; a rule-based method could be considered in the future for comparison with the machine learning approach used here. More machine learning classifiers could be used, and an in-depth analysis of the segmentation could be carried out.
author2	School of Computer Engineering
format	Final Year Project
author	Muhammad Hafiz Mohamed Hassan
title	Term recognition from electronic medical records of Singaporean hospital (1/2)
publishDate	2014
url	http://hdl.handle.net/10356/59051
_version_	1759856214058991616
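
The description mentions n-gram feature extraction with shape features and output in LIBSVM format, but this record does not spell out the feature layout. What follows is only a minimal Python sketch of one common way such features can be built, not the author's implementation. The function names (`word_shape`, `window_features`, `to_libsvm_line`), the `<pad>` placeholder, the sample tokens and the label value are all hypothetical and chosen for illustration.

```python
import re


def word_shape(token):
    """Map a token to a coarse shape: uppercase -> X, lowercase -> x, digit -> d.

    Runs of the same shape character are collapsed, so "SOB" becomes "X".
    """
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def window_features(tokens, i, before=1, after=1):
    """Binary word and shape features for token i and its neighbours.

    before=1, after=1 gives a three-token (tri-gram style) window.
    Positions outside the record are filled with a hypothetical "<pad>" value.
    """
    feats = {}
    for offset in range(-before, after + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word, shape = tokens[j].lower(), word_shape(tokens[j])
        else:
            word = shape = "<pad>"
        feats[f"w[{offset}]={word}"] = 1
        feats[f"shape[{offset}]={shape}"] = 1
    return feats


def to_libsvm_line(label, feats, index):
    """Serialise one example as a sparse 'label id:value ...' line.

    `index` maps feature names to 1-based integer ids and grows as new
    feature names are seen; ids are written in ascending order.
    """
    ids = sorted(index.setdefault(name, len(index) + 1) for name in feats)
    return f"{label} " + " ".join(f"{i}:1" for i in ids)


# Made-up tokens standing in for a clinical note; the label 1 is arbitrary.
tokens = ["Pt", "c/o", "SOB", "x", "2/7"]
feature_index = {}
print(to_libsvm_line(1, window_features(tokens, 2), feature_index))
```

Widening the window (for example `before=3, after=2` for six tokens) adds a word and a shape attribute per extra position, which is the kind of attribute growth the description compares across designs. The exact window layouts used in the project are not stated in this record.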

Similar Items