A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data
This work focuses on data sampling in cancer-gene association prediction. Currently, researchers are using machine learning methods to predict genes that are more likely to produce cancer-causing mutations. To improve the performance of machine learning models, methods have been proposed, one of whi...
Main Authors: | Xu, Mingzhe; Abdullah, Nor Aniza; Md Sabri, Aznul Qalid |
---|---|
Format: | Article |
Published: | Elsevier Ltd 2024 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.um.edu.my/44823/ |
Institution: | Universiti Malaya |
id |
my.um.eprints.44823 |
---|---|
record_format |
eprints |
spelling |
my.um.eprints.44823 2024-07-02T05:05:51Z http://eprints.um.edu.my/44823/ A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data Xu, Mingzhe Abdullah, Nor Aniza Md Sabri, Aznul Qalid QA75 Electronic computers. Computer science This work focuses on data sampling in cancer-gene association prediction. Researchers currently use machine learning methods to predict genes that are more likely to produce cancer-causing mutations. Several methods have been proposed to improve the performance of these models, one of which is to improve the quality of the training data. Existing methods focus mainly on screening positive data, i.e. cancer driver genes. This paper proposes a method, based on gene networks and graph theory algorithms, for screening genes with low cancer relevance in order to improve negative sample selection. Genetic data with low cancer correlation is used as negative training samples. Experimental verification shows that training the cancer gene classification model on negative samples screened by this method can improve prediction performance. The biggest advantage of this method is that it can be easily combined with other methods that focus on enhancing the quality of positive training samples. Combining this method with three state-of-the-art cancer gene prediction methods has been shown to yield significant improvement. © 2023 Elsevier Ltd Elsevier Ltd 2024 Article PeerReviewed Xu, Mingzhe and Abdullah, Nor Aniza and Md Sabri, Aznul Qalid (2024) A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data. Computational Biology and Chemistry, 108. ISSN 1476-9271, DOI https://doi.org/10.1016/j.compbiolchem.2023.107997. 10.1016/j.compbiolchem.2023.107997 |
institution |
Universiti Malaya |
building |
UM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Malaya |
content_source |
UM Research Repository |
url_provider |
http://eprints.um.edu.my/ |
topic |
QA75 Electronic computers. Computer science |
spellingShingle |
QA75 Electronic computers. Computer science Xu, Mingzhe Abdullah, Nor Aniza Md Sabri, Aznul Qalid A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
description |
This work focuses on data sampling in cancer-gene association prediction. Researchers currently use machine learning methods to predict genes that are more likely to produce cancer-causing mutations. Several methods have been proposed to improve the performance of these models, one of which is to improve the quality of the training data. Existing methods focus mainly on screening positive data, i.e. cancer driver genes. This paper proposes a method, based on gene networks and graph theory algorithms, for screening genes with low cancer relevance in order to improve negative sample selection. Genetic data with low cancer correlation is used as negative training samples. Experimental verification shows that training the cancer gene classification model on negative samples screened by this method can improve prediction performance. The biggest advantage of this method is that it can be easily combined with other methods that focus on enhancing the quality of positive training samples. Combining this method with three state-of-the-art cancer gene prediction methods has been shown to yield significant improvement. © 2023 Elsevier Ltd |
format |
Article |
author |
Xu, Mingzhe Abdullah, Nor Aniza Md Sabri, Aznul Qalid |
author_facet |
Xu, Mingzhe Abdullah, Nor Aniza Md Sabri, Aznul Qalid |
author_sort |
Xu, Mingzhe |
title |
A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
title_short |
A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
title_full |
A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
title_fullStr |
A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
title_full_unstemmed |
A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
title_sort |
method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data |
publisher |
Elsevier Ltd |
publishDate |
2024 |
url |
http://eprints.um.edu.my/44823/ |
_version_ |
1805881172035633152 |