Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling
In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect classification performance of machine learning algorithms. Solutions for handling imbalanced dataset have been proposed, but their application for ADME modeling tasks is...
Main Author: | Le, Thi Thu Huong |
---|---|
Format: | Article |
Language: | English |
Published: | Springer 2016 |
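Subjects: | ADME modeling; Caco-2 cell permeability; Biopharmaceutics classification system; Support vector machine; Cost-sensitive learning; Resampling technique |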
Online Access: | http://repository.vnu.edu.vn/handle/VNU_123/11505 |
Institution: | Vietnam National University, Hanoi |
id |
oai:112.137.131.14:VNU_123-11505 |
---|---|
record_format |
dspace |
spelling |
oai:112.137.131.14:VNU_123-11505 2017-04-05T14:27:40Z Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling Le, Thi Thu Huong ADME modeling Caco-2 cell permeability Biopharmaceutics classification system Support vector machine Cost-sensitive learning Resampling technique In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect classification performance of machine learning algorithms. Solutions for handling imbalanced dataset have been proposed, but their application for ADME modeling tasks is underexplored. In this paper, various strategies including cost-sensitive learning and resampling methods were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests. Results showed that the models developed on the basis of resampling strategies displayed better performance than the cost-sensitive classification models, especially in the case of oversampling data where misclassification rates for minority class have values of 0.11 and 0.14 for training and test set, respectively. A consensus model with enhanced applicability domain was subsequently constructed and showed improved performance. This model was used to predict a set of randomly selected high-permeability reference drugs according to the biopharmaceutics classification system. Overall, this study provides a comparison of numerous rebalancing strategies and displays the effectiveness of oversampling methods to deal with imbalanced permeability data problems. 2016-05-30T17:45:51Z 2016-05-30T17:45:51Z 2015 Article 1381-1991 http://repository.vnu.edu.vn/handle/VNU_123/11505 en application/pdf Springer |
institution |
Vietnam National University, Hanoi |
building |
VNU Library & Information Center |
country |
Vietnam |
collection |
VNU Digital Repository |
language |
English |
topic |
ADME modeling; Caco-2 cell permeability; Biopharmaceutics classification system; Support vector machine; Cost-sensitive learning; Resampling technique |
spellingShingle |
ADME modeling Caco-2 cell permeability Biopharmaceutics classification system Support vector machine Cost-sensitive learning Resampling technique Le, Thi Thu Huong Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling |
description |
In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data could negatively affect classification performance of machine learning algorithms. Solutions for handling imbalanced dataset have been proposed, but their application for ADME modeling tasks is underexplored. In this paper, various strategies including cost-sensitive learning and resampling methods were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were utilized for data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests. Results showed that the models developed on the basis of resampling strategies displayed better performance than the cost-sensitive classification models, especially in the case of oversampling data where misclassification rates for minority class have values of 0.11 and 0.14 for training and test set, respectively. A consensus model with enhanced applicability domain was subsequently constructed and showed improved performance. This model was used to predict a set of randomly selected high-permeability reference drugs according to the biopharmaceutics classification system. Overall, this study provides a comparison of numerous rebalancing strategies and displays the effectiveness of oversampling methods to deal with imbalanced permeability data problems. |
format |
Article |
author |
Le, Thi Thu Huong |
author_facet |
Le, Thi Thu Huong |
author_sort |
Le, Thi Thu Huong |
title |
Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling |
title_short |
Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling |
title_full |
Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling |
title_fullStr |
Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling |
title_full_unstemmed |
Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling |
title_sort |
exploring different strategies for imbalanced adme data problem: case study on caco-2 permeability modeling |
publisher |
Springer |
publishDate |
2016 |
url |
http://repository.vnu.edu.vn/handle/VNU_123/11505 |
_version_ |
1680964283811233792 |
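
The abstract above contrasts two families of rebalancing strategies, resampling and cost-sensitive learning, and reports the misclassification rate for the minority class. The record does not say which resampling algorithms, descriptors, or classifier settings the paper used, so the following is only a minimal sketch of the general ideas, assuming plain random oversampling and inverse-frequency class weights. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of smaller classes until every
    class has as many rows as the largest one (random oversampling)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y):
    """Cost-sensitive alternative: weight each class by
    n_samples / (n_classes * class_count), so rarer classes cost more."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


def minority_error_rate(y_true, y_pred, minority):
    """Fraction of true minority-class samples predicted as another class."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == minority]
    if not pairs:
        raise ValueError("no samples of the minority class in y_true")
    return sum(t != p for t, p in pairs) / len(pairs)


# Hypothetical toy data: six majority-class rows (0), two minority rows (1).
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
weights = balanced_class_weights(y)
```

In this sketch, the oversampled data (`X_bal`, `y_bal`) would be passed to an unweighted classifier, while `weights` would be passed as per-class misclassification costs to a classifier trained on the original data. `minority_error_rate` corresponds to the kind of per-class figure quoted in the abstract.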