Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling

In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data can negatively affect the classification performance of machine learning algorithms. Solutions for handling imbalanced datasets have been proposed, but their application to ADME modeling tasks is...

Bibliographic Details
Main Author: Le, Thi Thu Huong
Format: Article
Language: English
Published: Springer 2016
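Subjects: ADME modeling; Caco-2 cell permeability; Biopharmaceutics classification system; Support vector machine; Cost-sensitive learning; Resampling technique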
Online Access:http://repository.vnu.edu.vn/handle/VNU_123/11505
Institution: Vietnam National University, Hanoi
id oai:112.137.131.14:VNU_123-11505
record_format dspace
Date issued: 2015
ISSN: 1381-1991
Date accessioned: 2016-05-30
File format: application/pdf
building VNU Library & Information Center
country Vietnam
collection VNU Digital Repository
topic ADME modeling
Caco-2 cell permeability
Biopharmaceutics classification system
Support vector machine
Cost-sensitive learning
Resampling technique
description In many absorption, distribution, metabolism, and excretion (ADME) modeling problems, imbalanced data can negatively affect the classification performance of machine learning algorithms. Solutions for handling imbalanced datasets have been proposed, but their application to ADME modeling tasks is underexplored. In this paper, various strategies, including cost-sensitive learning and resampling methods, were studied to tackle the moderate imbalance problem of a large Caco-2 cell permeability database. Simple physicochemical molecular descriptors were used for data modeling. Support vector machine classifiers were constructed and compared using multiple comparison tests. The results showed that models developed with resampling strategies performed better than the cost-sensitive classification models, especially on the oversampled data, where the misclassification rates for the minority class were 0.11 and 0.14 for the training and test sets, respectively. A consensus model with an enhanced applicability domain was subsequently constructed and showed improved performance. This model was used to predict a set of randomly selected high-permeability reference drugs according to the biopharmaceutics classification system. Overall, this study provides a comparison of numerous rebalancing strategies and demonstrates the effectiveness of oversampling methods for dealing with imbalanced permeability data.
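The abstract contrasts resampling with cost-sensitive learning but gives no implementation details. As a rough illustration only, the sketch below shows plain random oversampling (one common oversampling variant; the abstract does not say which variant or variants the study used), inverse-frequency class weights of the kind often passed to cost-sensitive SVMs, and a per-class misclassification rate, assuming that the reported minority-class rate is the fraction of minority-class compounds predicted as another class. The function names and toy data are hypothetical; the SVM training, the physicochemical descriptors, and the consensus model are not reproduced.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def random_oversample(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Random oversampling (hypothetical helper): duplicate randomly chosen
    rows of every smaller class until each class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out = [list(row) for row in X]
    y_out = list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(members)
            X_out.append(list(X[j]))
            y_out.append(label)
    return X_out, y_out


def balanced_class_weights(y: Sequence[int]) -> Dict[int, float]:
    """Cost-sensitive alternative (hypothetical helper): weight each class by
    n_samples / (n_classes * n_samples_in_class), so errors on the rare class
    cost more during training instead of adding duplicated rows."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def misclassification_rate(
    y_true: Sequence[int], y_pred: Sequence[int], cls: int
) -> float:
    """Fraction of true members of class `cls` predicted as another class."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    if not pairs:
        raise ValueError(f"no samples of class {cls!r}")
    return sum(1 for t, p in pairs if p != t) / len(pairs)


if __name__ == "__main__":
    # Toy data (not from the study): 4 majority-class rows, 1 minority row.
    X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
    y = [0, 0, 0, 0, 1]
    X_bal, y_bal = random_oversample(X, y)
    print(dict(Counter(y_bal)))  # {0: 4, 1: 4}
    print(balanced_class_weights(y))  # {0: 0.625, 1: 2.5}
    print(misclassification_rate([0, 0, 1, 1], [0, 1, 1, 0], cls=1))  # 0.5
```

Resampling of this kind should be applied to the training split only, so that duplicated minority rows do not leak into the test set. The weighting formula mirrors the `class_weight="balanced"` heuristic in scikit-learn; whether the study used this or another cost scheme is not stated in the abstract.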