Challenges and solutions in drug-target interaction prediction

When a drug is developed, it is designed so that it interacts with a specific target of interest in order to achieve the desired therapeutic effect. However, it is quite common to later find that the developed drug also interacts with multiple other targets that were not intended during its developm...

Bibliographic Details
Main Author: Ezzat, Ali
Other Authors: Kwoh Chee Keong
Format: Theses and Dissertations
Language: English
Published: 2018
Online Access: http://hdl.handle.net/10356/75771
Institution: Nanyang Technological University
id sg-ntu-dr.10356-75771
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences
description When a drug is developed, it is designed to interact with a specific target of interest in order to achieve the desired therapeutic effect. However, it is quite common to find later that the drug also interacts with multiple other targets that were not intended during its development. This is interesting because a drug that can interact with multiple targets may have more than one therapeutic effect, which provides a clear motivation for discovering new interactions for existing drugs. In drug discovery, an important task called drug-target interaction (DTI) prediction detects such interactions on a large scale by screening many drugs and targets simultaneously. While there are wet-lab techniques for discovering these interactions, the focus of this thesis is on computational DTI prediction. Specifically, we investigate machine learning methods, that is, methods that discover new interactions based on prior knowledge of existing drugs and their experimentally confirmed targets. In this thesis, we identified and addressed four outstanding problems in DTI prediction. Having addressed these problems, we were able to enhance prediction performance and outperform relevant state-of-the-art methods.
Firstly, DTI prediction methods have difficulty predicting interactions involving new drugs or targets for which there are no known interactions. To predict such interactions, we developed two matrix factorization methods that utilize graph regularization. In addition, considering that many of the non-occurring edges in the bipartite DTI network are actually unknown or missing cases, we developed a preprocessing step that enhances predictions in the “new drug” and “new target” cases by adding edges with intermediate interaction likelihood scores. In our experiments, our methods performed better than the state-of-the-art methods and were found to predict interactions reasonably well.
Secondly, class imbalance is an issue that is prevalent across all DTI datasets. Class imbalance can be divided into two sub-problems, namely between-class and within-class imbalance. Between-class imbalance refers to the imbalance ratio between interacting and non-interacting drug-target pairs; it degrades prediction performance by biasing the prediction results towards the majority class (i.e. the non-interacting pairs), leading to more prediction errors in the minority class (i.e. the interacting pairs). Within-class imbalance refers to the imbalance between the sizes of sub-groups (types) of interactions; it biases the predictions towards the bigger and better-represented sub-groups, leading to more errors in the smaller groups. Here, we developed an ensemble learning method that incorporates techniques to address both between-class and within-class imbalance. Experiments show that the proposed method improves results over four state-of-the-art methods.
Thirdly, there are DTI datasets where the feature sets representing the drugs and targets (and, by extension, the drug-target pairs) are of high dimensionality. High dimensionality may lead to much longer running times for the prediction models. Furthermore, there may be redundancy in the features, which may also degrade prediction performance. In this work, we used dimensionality reduction to deal with both of these issues, and we additionally used ensemble learning to improve the prediction performance further. As base learners for the ensemble, we selected two classifiers, namely Decision Tree and Kernel Ridge Regression, resulting in two variants of ensemble models, EnsemDT and EnsemKRR, respectively. Experimental results show that our proposed methods are indeed successful.
Lastly, there is a concept called differential representation bias that has an impact on the prediction performance of DTI prediction methods. Specifically, differential representation bias refers to how often a drug (or target) appears in the positive training data as opposed to the negative data. Bearing this concept in mind, we experimented with the way that the negative training data is sampled prior to training the prediction model. We found that our modified sampling procedure produced significant improvements in DTI prediction performance.
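The two quantities named in the description, the between-class imbalance ratio and the per-drug representation in positive versus negative training data, can be made concrete with a small sketch. This is only an illustration under stated assumptions: the drugs, targets and interactions are invented toy data, the helper names `representation_counts` and `sample_negatives` are hypothetical, and uniform random sampling of negatives is used as a plain baseline. It is not the modified sampling procedure developed in the thesis, which the record does not describe.

```python
import random
from collections import Counter


def representation_counts(pairs):
    """Count how often each drug and each target appears in (drug, target) pairs."""
    drug_counts = Counter(d for d, _ in pairs)
    target_counts = Counter(t for _, t in pairs)
    return drug_counts, target_counts


def sample_negatives(drugs, targets, positives, n, seed=0):
    """Uniformly sample n drug-target pairs that are not known interactions.

    Baseline assumption only: pairs without a known interaction are treated
    as negatives, although some may simply be undiscovered interactions.
    """
    known = set(positives)
    candidates = [(d, t) for d in drugs for t in targets if (d, t) not in known]
    rng = random.Random(seed)
    return rng.sample(candidates, n)


# Toy data (hypothetical): 3 drugs, 4 targets, 4 known interactions.
drugs = ["d1", "d2", "d3"]
targets = ["t1", "t2", "t3", "t4"]
positives = [("d1", "t1"), ("d1", "t2"), ("d1", "t3"), ("d2", "t1")]

# Between-class imbalance: non-interacting pairs per interacting pair.
imbalance_ratio = (len(drugs) * len(targets) - len(positives)) / len(positives)

# Draw as many negatives as positives, then compare per-drug representation.
negatives = sample_negatives(drugs, targets, positives, n=len(positives))
pos_drugs, _ = representation_counts(positives)
neg_drugs, _ = representation_counts(negatives)

print("imbalance ratio:", imbalance_ratio)
for d in drugs:
    print(d, "positive:", pos_drugs[d], "negative:", neg_drugs[d])
```

In this toy set, `d1` accounts for three of the four positive pairs but has only one candidate negative pair, so a model could learn to associate the drug itself with the positive label rather than anything about its targets. That mismatch is the kind of effect the description attributes to differential representation bias.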
spelling sg-ntu-dr.10356-75771 2023-03-04T00:52:10Z Challenges and solutions in drug-target interaction prediction Ezzat, Ali Kwoh Chee Keong School of Computer Science and Engineering Bioinformatics Research Centre Li Xiaoli DRNTU::Engineering::Computer science and engineering::Computer applications::Life and medical sciences Doctor of Philosophy (SCE) 2018-06-14T04:02:59Z 2018-06-14T04:02:59Z 2018 Thesis Ezzat, A. (2018). Challenges and solutions in drug-target interaction prediction. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/75771 10.32657/10356/75771 en 165 p. application/pdf