A Model Driven Approach to Imbalanced Data Sampling in Medical Decision Making

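Classification is an important medical decision support function that can be seriously affected by disproportionate class distribution in the training data. In medical decision making, the rate of misclassification and the cost of misclassifying a minority (positive) class as a majority (negative) class are especially high. In this paper, we propose a new model-driven sampling approach to balancing data samples. Most existing data sampling methods produce new data points based on local, deterministic information. Our approach extends the idea of generative sampling to produce new data points based on an induced probabilistic graphical model. We present the motivation and the design of the proposed algorithm, and compare it with two representative imbalanced data sampling approaches on four medical data sets varying in size, imbalance ratio, and dimension. The empirical study helped identify the challenges in imbalanced data problems in medicine, and highlighted the strengths and limitations of the relevant sampling approaches. Performance of the model driven approach is shown to be comparable with existing approaches; potential improvements could be achieved by incorporating domain knowledge. © 2010 IMIA and SAHIA. All rights reserved.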

Bibliographic Details
Main Authors: Yin H., Tze-Yun LEONG
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2010
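Subjects: Imbalanced data learning; Model driven sampling; Random sampling; Synthetic Minority Over Sampling (SMOTE); Databases and Information Systems; Health Information Technology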
Online Access: https://ink.library.smu.edu.sg/sis_research/2988
http://dx.doi.org/10.3233/978-1-60750-588-4-856
Institution: Singapore Management University
id sg-smu-ink.sis_research-3988
record_format dspace
spelling sg-smu-ink.sis_research-3988 2016-02-25T07:29:03Z 2010-12-01T08:00:00Z text info:doi/10.3233/978-1-60750-588-4-856 Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Imbalanced data learning
Model driven sampling
Random sampling
Synthetic Minority Over Sampling (SMOTE)
Databases and Information Systems
Health Information Technology
description Classification is an important medical decision support function that can be seriously affected by disproportionate class distribution in the training data. In medical decision making, the rate of misclassification and the cost of misclassifying a minority (positive) class as a majority (negative) class are especially high. In this paper, we propose a new model-driven sampling approach to balancing data samples. Most existing data sampling methods produce new data points based on local, deterministic information. Our approach extends the idea of generative sampling to produce new data points based on an induced probabilistic graphical model. We present the motivation and the design of the proposed algorithm, and compare it with two representative imbalanced data sampling approaches on four medical data sets varying in size, imbalance ratio, and dimension. The empirical study helped identify the challenges in imbalanced data problems in medicine, and highlighted the strengths and limitations of the relevant sampling approaches. Performance of the model driven approach is shown to be comparable with existing approaches; potential improvements could be achieved by incorporating domain knowledge. © 2010 IMIA and SAHIA. All rights reserved.
format text
author Yin H.
Tze-Yun LEONG
title A Model Driven Approach to Imbalanced Data Sampling in Medical Decision Making
publisher Institutional Knowledge at Singapore Management University
publishDate 2010