Term recognition from electronic medical records of Singaporean hospital (1/2)
Main Author: | Muhammad Hafiz Mohamed Hassan |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Final Year Project |
Language: | English |
Published: | 2014 |
Subjects: | DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability |
Online Access: | http://hdl.handle.net/10356/59051 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-59051 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-59051 2023-03-03T20:39:16Z Term recognition from electronic medical records of Singaporean hospital (1/2) Muhammad Hafiz Mohamed Hassan School of Computer Engineering Kim Jung Jae DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability Bachelor of Engineering (Computer Engineering) 2014-04-22T01:59:07Z 2014-04-22T01:59:07Z 2014 2014 Final Year Project (FYP) http://hdl.handle.net/10356/59051 en Nanyang Technological University 87 p. application/pdf |
building | NTU Library |
continent | Asia |
country | Singapore |
content_provider | NTU Library |
collection | DR-NTU |
topic | DRNTU::Engineering::Computer science and engineering::Hardware::Performance and reliability |
description |
Hospital doctors write daily reports on their patients’ statuses and keep them for future use. However, as computers are increasingly used to store information and analyse data, it has become difficult to let doctors express their diagnoses freely while still allowing the system to analyse the content of what is written. A further challenge lies in mapping the medical terms used by experienced doctors, especially in Singapore, to the standard terms used by doctors in other parts of the world.
There is therefore a need to bridge the gap between records written manually by local doctors and readily available medical systems that conform to other countries’ standards; this project aims to develop a system that serves as this bridge.
The main objective of the project is to develop a term recognition system that can identify the content of what is written by the doctor.
The overall scope of the author’s work was to develop a text categorization system that uses a machine learning approach to classify the words used in the electronic medical records (EMRs). The author used a hierarchical TRIE structure to analyse the records.
Manual segmentation was performed on the records using a sentence disambiguation technique. An n-gram approach was chosen for feature extraction, and feature generation was developed through thorough analysis of the records and rigorous testing of different designs, each design testing a different feature or set of features. A program and supporting algorithms were created to classify the words according to each design specification and to write the output in ARFF, LIBSVM and SVM light formats.
The classifier results were compiled and feature weighting was performed. Three main machine learning tools were used: WEKA, LIBSVM and SVM light. Within WEKA, three classifiers were used: SMO, Naïve Bayes and the J48 decision tree.
The feature performance results show that moving from a tri-gram output to a hex-gram (6-gram) output and introducing additional features, especially shape features, improves performance. Reducing the number of shape feature attributes, either by combining attributes or by omitting one from the design, has little impact on the results. Adding shape feature attributes for the previous and next words of the tri-gram improves performance, but doing the same for the hex-gram does not. The author therefore concluded that adding more attributes as n increases does not improve performance.
Version 5b was identified as the best feature design. It was used to re-evaluate the test data, and automatic segmentation was then performed on the original data using the outputs of the different classifiers. The segmentation produced by each classifier was compared with the author’s own segmentation.
The comparison shows that the best-performing classifier does not necessarily produce the most accurate segmentation: Naïve Bayes gives the result most similar to the author’s, yet it has the lowest performance value.
The results also show considerable room for improvement in the feature design. In future work, a rule-based method could be compared with the machine learning approach used here, more machine learning classifiers could be tried, and the segmentation could be analysed in greater depth. |
author2 | School of Computer Engineering |
author | Muhammad Hafiz Mohamed Hassan |
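The abstract states that a hierarchical TRIE structure was used to analyse the records but does not describe how it was organised. As a minimal sketch, assuming a token-level trie of known terms with greedy longest-match lookup, such a structure might look like the following. The class names, labels and example terms are hypothetical and are not taken from the project.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """One node per token; children are keyed by the next token of a term."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.label: Optional[str] = None  # set when a complete term ends here


class TermTrie:
    """Hypothetical token-level trie for longest-match lookup of multi-word terms."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, term: str, label: str) -> None:
        node = self.root
        for token in term.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.label = label

    def longest_match(self, tokens: List[str], start: int) -> Optional[Tuple[int, str]]:
        """Return (end_index, label) for the longest known term starting at tokens[start]."""
        node = self.root
        best: Optional[Tuple[int, str]] = None
        for i in range(start, len(tokens)):
            child = node.children.get(tokens[i].lower())
            if child is None:
                break
            node = child
            if node.label is not None:
                best = (i + 1, node.label)
        return best

    def tag(self, sentence: str) -> List[Tuple[str, str]]:
        """Greedy left-to-right scan; tokens that start no known term get 'O'."""
        tokens = sentence.split()
        result: List[Tuple[str, str]] = []
        i = 0
        while i < len(tokens):
            match = self.longest_match(tokens, i)
            if match is None:
                result.append((tokens[i], "O"))
                i += 1
            else:
                end, label = match
                result.append((" ".join(tokens[i:end]), label))
                i = end
        return result


if __name__ == "__main__":
    trie = TermTrie()
    trie.add("chest pain", "SYMPTOM")
    trie.add("aspirin", "DRUG")
    print(trie.tag("c/o chest pain since morning given aspirin"))
    # → [('c/o', 'O'), ('chest pain', 'SYMPTOM'), ('since', 'O'), ('morning', 'O'), ('given', 'O'), ('aspirin', 'DRUG')]
```

Because each node stores only the next token, shared prefixes such as "chest" and "chest pain" occupy a single path. The longest match wins when several terms start at the same position.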
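The abstract describes n-gram feature extraction with additional shape features, with the generated data written in LIBSVM and SVM light format among others. The sketch below assumes three things: a window of neighbouring words around each target word, a simple word-shape encoding, and binary features. The function names, the shape alphabet and the feature naming scheme are hypothetical, and the abstract does not say how the hex-gram window was split around the target word. The output line format itself is standard and shared by LIBSVM and SVM light: a label followed by sparse `index:value` pairs, with indices in ascending order.

```python
from typing import Dict, List


def word_shape(token: str) -> str:
    """Hypothetical shape feature: A = upper case, a = lower case, 9 = digit,
    any other character kept as-is; consecutive repeats are collapsed."""
    shape: List[str] = []
    for ch in token:
        if ch.isupper():
            cls = "A"
        elif ch.islower():
            cls = "a"
        elif ch.isdigit():
            cls = "9"
        else:
            cls = ch
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


def window_features(tokens: List[str], i: int, left: int = 1, right: int = 1) -> List[str]:
    """Word and shape features for tokens[i] and its neighbours.
    left=1, right=1 gives a tri-gram window (previous, current, next word)."""
    feats: List[str] = []
    for offset in range(-left, right + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            word, shape = tokens[j].lower(), word_shape(tokens[j])
        else:
            word, shape = "<PAD>", "<PAD>"  # window runs past the sentence edge
        feats.append(f"w[{offset}]={word}")
        feats.append(f"shape[{offset}]={shape}")
    return feats


def to_libsvm(label: int, feats: List[str], vocab: Dict[str, int], grow: bool = True) -> str:
    """Encode binary features as one LIBSVM / SVM light line: '<label> idx:1 ...'.
    Feature indices start at 1 and are written in ascending order."""
    indices = set()
    for f in feats:
        if f not in vocab:
            if not grow:
                continue  # feature unseen in training: drop it
            vocab[f] = len(vocab) + 1
        indices.add(vocab[f])
    return " ".join([str(label)] + [f"{idx}:1" for idx in sorted(indices)])


if __name__ == "__main__":
    vocab: Dict[str, int] = {}
    tokens = "BP 120/80 Aspirin given".split()
    print(window_features(tokens, 2))
    print(to_libsvm(1, window_features(tokens, 2), vocab))  # → 1 1:1 2:1 3:1 4:1 5:1 6:1
```

At test time, passing `grow=False` keeps the index space fixed, so features never seen during training are dropped rather than given new indices.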
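The abstract reports that the automatic segmentations produced from each classifier's output were compared with the author's segmentation, with Naïve Bayes giving the most similar result. It does not name the similarity measure. One common choice, used here purely as an assumed stand-in, is boundary precision, recall and F1. Each segmentation is reduced to the set of token positions where a segment ends, and the two sets are then compared. The function names and example tokens are hypothetical.

```python
from typing import List, Set, Tuple


def boundaries(segments: List[List[str]]) -> Set[int]:
    """Token offsets where each segment ends, excluding the end of the text,
    which every segmentation of the same tokens shares."""
    ends: Set[int] = set()
    pos = 0
    for seg in segments:
        pos += len(seg)
        ends.add(pos)
    ends.discard(pos)
    return ends


def boundary_prf(predicted: List[List[str]], reference: List[List[str]]) -> Tuple[float, float, float]:
    """Boundary precision, recall and F1 of a predicted segmentation against a reference."""
    flat_pred = [t for seg in predicted for t in seg]
    flat_ref = [t for seg in reference for t in seg]
    if flat_pred != flat_ref:
        raise ValueError("both segmentations must cover the same token sequence")
    pred, ref = boundaries(predicted), boundaries(reference)
    tp = len(pred & ref)
    # With no boundaries on one side, score 1.0 only if the other side has none either.
    precision = tp / len(pred) if pred else float(not ref)
    recall = tp / len(ref) if ref else float(not pred)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    manual = [["BP", "120/80"], ["Aspirin", "given"]]
    automatic = [["BP"], ["120/80"], ["Aspirin", "given"]]
    print(boundary_prf(automatic, manual))  # roughly (0.5, 1.0, 0.667)
```

Scoring boundaries rather than whole segments gives partial credit when a classifier splits one segment too finely, which is the kind of near-miss a whole-segment match would count as a complete error.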