Term recognition from electronic medical records of Singaporean hospital (1/2)

Doctors at hospitals write daily reports on patients’ statuses and keep them for future use. However, with the increasing use of computers to store information and analyse data, it has been difficult to let doctors express their diagnoses freely while still having the system analyse the content of...

Bibliographic Details
Main Author: Muhammad Hafiz Mohamed Hassan
Other Authors: School of Computer Engineering
Format: Final Year Project
Language: English
Published: 2014
Subjects: DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability
Online Access: http://hdl.handle.net/10356/59051
Institution: Nanyang Technological University
id sg-ntu-dr.10356-59051
record_format dspace
spelling sg-ntu-dr.10356-59051 2023-03-03T20:39:16Z Term recognition from electronic medical records of Singaporean hospital (1/2) Muhammad Hafiz Mohamed Hassan School of Computer Engineering Kim Jung Jae DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability Bachelor of Engineering (Computer Engineering) 2014-04-22T01:59:07Z 2014-04-22T01:59:07Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/59051 en Nanyang Technological University 87 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability
description Doctors at hospitals write daily reports on patients’ statuses and keep them for future use. However, with the increasing use of computers to store information and analyse data, it has been difficult to let doctors express their diagnoses freely while still having the system analyse the content of what is written. A further challenge is mapping the medical terms used by experienced doctors, in Singapore especially, to the standard terms used by doctors in other parts of the world. A need was therefore identified to bridge records written manually by local doctors and readily available medical systems that conform to other countries’ standards; this project aims to develop a system that serves as this bridge. The main objective of the project is to develop a term recognition system that can identify the content of what the doctor has written. The author’s overall scope was to develop a text categorization system that classifies the words used in the electronic medical records (EMR) by a machine learning approach. The author used a hierarchical trie structure to analyse the records, and segmented the records manually using a sentence disambiguation technique. The n-gram approach was chosen for feature extraction, and feature generation was developed through thorough analysis of the records and rigorous testing across different designs, each of which tested one or more features. A program was created and algorithms were written to classify the words according to the design specification, using the ARFF, LIBSVM and SVM light formats. The classifier results were compiled and feature weighting was carried out. Three main machine learning tools were used: WEKA, LIBSVM and SVM light. Three classifiers were used in WEKA: SMO, Naïve Bayes and the J48 decision tree.
From the feature performance results, it was found that moving from a tri-gram output to a hex-gram output and introducing other features, especially shape features, helps to improve performance. Reducing the shape feature attributes, either by combining attributes or by omitting one from the design, does not have a big impact on the result. Adding shape feature attributes for the previous and next words improves performance for the tri-gram, but not for the hex-gram. It was therefore concluded that adding more attributes as the n-value increases does not improve performance. Version 5b was identified as the best feature design. This design was used to re-evaluate the test data, and automatic segmentation was performed on the original data using the outputs of the different classifiers. The segmentation results of the different classifiers were compared with a segmentation done by the author. The comparison shows that the classifier with the best performance does not necessarily produce the most accurate segmentation: Naïve Bayes gives the result most similar to the author’s segmentation but has the lowest performance value. The results also show that there is much room for improvement in feature design; a rule-based method could be considered in future for comparison with the machine learning approach used here. More machine learning classifiers could be used, and an in-depth analysis of the segmentation could be carried out.
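As an illustration of the kind of feature generation described above, the following is a minimal sketch, not the author's program. It assumes a tri-gram window (previous, current and next word), a simple word-shape encoding, and binary features written in the LIBSVM sparse line format (`label index:value ...`, indices ascending). The function names, the shape scheme and the `+1`/`-1` labels are all hypothetical choices made for this example.

```python
import re


def word_shape(token):
    """Coarse shape of a token: uppercase -> X, lowercase -> x, digit -> d."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    # Collapse runs of the same symbol so "Xxxxx" and "Xxx" share one shape.
    return re.sub(r"(.)\1+", r"\1", shape)


def window_features(tokens, i):
    """Feature strings for token i from a tri-gram window (assumed design)."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        "w=" + tokens[i].lower(),
        "prev=" + prev_tok.lower(),
        "next=" + next_tok.lower(),
        "shape=" + word_shape(tokens[i]),
        "prev_shape=" + word_shape(prev_tok),
        "next_shape=" + word_shape(next_tok),
    ]


def to_libsvm(label, features, index):
    """Render one example as a LIBSVM line.

    `index` maps feature strings to 1-based ids and grows as new
    features are seen; every feature present gets the value 1.
    """
    ids = sorted({index.setdefault(f, len(index) + 1) for f in features})
    return f"{label:+d} " + " ".join(f"{j}:1" for j in ids)
```

Sharing one `index` dictionary across all tokens keeps feature ids consistent between training and test files. Widening the window to more neighbouring words would add further `prev`/`next` entries in the same way.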
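The abstract does not say how the automatic and manual segmentations were compared. One common option, shown here purely as an assumed sketch, is to score the segment boundaries predicted by a classifier against the reference boundaries using precision, recall and F1. The function names and the representation of a segmentation as a list of token lists are hypothetical.

```python
def boundaries(segments):
    """Token offsets at which a segment ends (the final end is excluded)."""
    ends, pos = set(), 0
    for seg in segments[:-1]:
        pos += len(seg)
        ends.add(pos)
    return ends


def boundary_scores(predicted, reference):
    """Precision, recall and F1 of predicted boundaries against a reference."""
    p, r = boundaries(predicted), boundaries(reference)
    hits = len(p & r)
    precision = hits / len(p) if p else 0.0
    recall = hits / len(r) if r else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

A score of this kind is separate from per-word classification accuracy, which is one way a classifier with lower accuracy could still produce a segmentation closer to the manual one.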
author2 School of Computer Engineering
format Final Year Project
author Muhammad Hafiz Mohamed Hassan
title Term recognition from electronic medical records of Singaporean hospital (1/2)
publishDate 2014
url http://hdl.handle.net/10356/59051