A comprehensive exploration to the machine learning techniques for diabetes identification

Bibliographic Details
Main Authors: Wei, Sidong; Zhao, Xuejiao; Miao, Chunyan
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2019
Subjects: Deep Neural Network; DRNTU::Engineering::Computer science and engineering; Machine Learning
Online Access: https://hdl.handle.net/10356/89478
http://hdl.handle.net/10220/47703
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-89478
Record Format: DSpace
Record Timestamp: 2020-03-07T11:48:46Z
Conference: 2018 IEEE 4th World Forum on Internet of Things (WF-IoT)
Research Centre: NTU-UBC Research Centre of Excellence in Active Living for the Elderly
Version: Accepted version
Type: Conference Paper (2018)
Repository Dates: 2019-02-19T06:34:52Z, 2019-12-06T17:26:36Z
Citation: Wei, S., Zhao, X., & Miao, C. (2018). A comprehensive exploration to the machine learning techniques for diabetes identification. 2018 IEEE 4th World Forum on Internet of Things (WF-IoT). doi:10.1109/WF-IoT.2018.8355130
DOI: 10.1109/WF-IoT.2018.8355130
Rights: © 2018 Institute of Electrical and Electronics Engineers (IEEE). All rights reserved. This paper was published in 2018 IEEE 4th World Forum on Internet of Things (WF-IoT) and is made available with permission of Institute of Electrical and Electronics Engineers (IEEE).
Extent: 5 p.
File Format: application/pdf
Building: NTU Library
Country: Singapore
Collection: DR-NTU
Topic: Deep Neural Network; DRNTU::Engineering::Computer science and engineering; Machine Learning
Description: Diabetes mellitus, commonly known as diabetes, is a group of metabolic disorders that affects hundreds of millions of people. Detecting diabetes is of great importance because of its severe complications. Numerous studies have addressed diabetes identification, many of them based on the Pima Indian diabetes data set, which comes from a study of women in the Pima Indian population that began in 1965, a population with a comparatively high onset rate of diabetes. Most previous studies focused on one or two particular complex techniques, while a comprehensive study of many common techniques is missing. In this paper, we make a comprehensive exploration of the most popular techniques used to identify diabetes (e.g. DNN (Deep Neural Network) and SVM (Support Vector Machine)) and of data preprocessing methods. We evaluate these techniques by their cross-validation accuracy on the Pima Indian data set, compare the accuracy of each classifier under several data preprocessing methods, and tune the classifiers' parameters to improve their accuracy. The best technique we find achieves 77.86% accuracy under 10-fold cross-validation. We also analyze the relevance of each feature to the classification result.
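The evaluation summarized above reports classifier accuracy under 10-fold cross-validation, with data preprocessing compared across classifiers. The Python sketch below illustrates that kind of evaluation in general terms; it is not the authors' code. All names in it (`k_fold_indices`, `cross_val_accuracy`, and so on) are hypothetical, a simple nearest-centroid classifier stands in for the DNN and SVM models, z-score standardization stands in for the preprocessing methods, and the example data is synthetic rather than the Pima Indian data set, so it does not reproduce the 77.86% figure.

```python
# Sketch of k-fold cross-validation accuracy with per-fold preprocessing.
# ASSUMPTIONS (not from the paper): a nearest-centroid classifier stands in for
# the DNN/SVM models, z-score standardization stands in for the preprocessing
# methods, and the data below is synthetic, not the Pima Indian data set.
import random
from statistics import mean, pstdev


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_standardizer(rows):
    """Per-feature mean and population std, computed on training rows only."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero variance
    return means, stds


def standardize(rows, means, stds):
    """Apply a previously fitted z-score transform to each row."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def fit_centroids(rows, labels):
    """Hypothetical stand-in classifier: one mean vector per class label."""
    centroids = {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [mean(col) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sq_dist)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Mean test accuracy over k folds; the scaler is refit inside every fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        means, stds = fit_standardizer([X[i] for i in train_idx])
        X_train = standardize([X[i] for i in train_idx], means, stds)
        X_test = standardize([X[i] for i in test_idx], means, stds)
        model = fit_centroids(X_train, [y[i] for i in train_idx])
        correct = sum(predict(model, r) == y[i] for r, i in zip(X_test, test_idx))
        scores.append(correct / len(test_idx))
    return mean(scores)


if __name__ == "__main__":
    # Synthetic two-class data (illustration only; no result from the paper).
    rng = random.Random(42)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
    X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
    y = [0] * 50 + [1] * 50
    print(f"10-fold CV accuracy: {cross_val_accuracy(X, y, k=10):.4f}")
```

Fitting the standardizer on the training folds only, then applying it to the held-out fold, keeps test-fold statistics out of the preprocessing step; fitting it once on the full data set before splitting would leak information into the accuracy estimate.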