A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data

This work focuses on data sampling in cancer-gene association prediction. Currently, researchers are using machine learning methods to predict genes that are more likely to produce cancer-causing mutations. To improve the performance of machine learning models, methods have been proposed, one of whi...

Bibliographic Details
Main Authors: Xu, Mingzhe, Abdullah, Nor Aniza, Md Sabri, Aznul Qalid
Format: Article
Published: Elsevier Ltd 2024
Subjects: QA75 Electronic computers. Computer science
Online Access: http://eprints.um.edu.my/44823/
Institution: Universiti Malaya
id my.um.eprints.44823
record_format eprints
spelling my.um.eprints.44823 2024-07-02T05:05:51Z http://eprints.um.edu.my/44823/ QA75 Electronic computers. Computer science. Elsevier Ltd 2024. Article, PeerReviewed. Xu, Mingzhe and Abdullah, Nor Aniza and Md Sabri, Aznul Qalid (2024) A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data. Computational Biology and Chemistry, 108. ISSN 1476-9271, DOI https://doi.org/10.1016/j.compbiolchem.2023.107997
institution Universiti Malaya
building UM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaya
content_source UM Research Repository
url_provider http://eprints.um.edu.my/
topic QA75 Electronic computers. Computer science
description This work focuses on data sampling in cancer-gene association prediction. Researchers currently use machine learning methods to predict genes that are more likely to produce cancer-causing mutations. Several approaches have been proposed to improve the performance of these models, one of which is to improve the quality of the training data. Existing methods focus mainly on screening the positive data, i.e. cancer driver genes. This paper proposes a screening method for low-cancer-related genes, based on gene network data and graph theory algorithms, to improve negative sample selection. Genetic data with low cancer correlation serve as negative training samples. Experiments show that training the cancer gene classification model with negative samples screened by this method improves prediction performance. The biggest advantage of the method is that it can be easily combined with other methods that focus on enhancing the quality of positive training samples. Combining it with three state-of-the-art cancer gene prediction methods yielded significant improvement. © 2023 Elsevier Ltd
format Article
author Xu, Mingzhe
Abdullah, Nor Aniza
Md Sabri, Aznul Qalid
title A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data
publisher Elsevier Ltd
publishDate 2024
url http://eprints.um.edu.my/44823/
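Illustrative note: the description above says negative training samples are screened using a gene network and graph theory algorithms, but this record does not say which algorithm the authors use. The sketch below is therefore only one plausible reading and not the paper's method: it assumes that a gene's hop distance to the nearest known driver gene in the network can stand in for "low cancer correlation". The function names, the `min_distance` threshold, and the toy gene identifiers (`D1`, `G1` to `G5`) are all hypothetical.

```python
from collections import deque


def distances_to_nearest_driver(network, driver_genes):
    """Multi-source BFS over an undirected gene network.

    network: dict mapping each gene to an iterable of neighboring genes.
    Returns the hop distance from every reachable gene to its nearest driver.
    """
    dist = {gene: 0 for gene in driver_genes if gene in network}
    queue = deque(dist)
    while queue:
        gene = queue.popleft()
        for neighbor in network.get(gene, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[gene] + 1
                queue.append(neighbor)
    return dist


def screen_negative_samples(network, driver_genes, min_distance=3):
    """Keep non-driver genes at least `min_distance` hops from every driver.

    Assumption: genes with no path to any driver count as infinitely far
    away and are therefore kept as negative candidates.
    """
    dist = distances_to_nearest_driver(network, driver_genes)
    drivers = set(driver_genes)
    return sorted(
        gene
        for gene in network
        if gene not in drivers and dist.get(gene, float("inf")) >= min_distance
    )


# Toy network with made-up identifiers: a chain D1-G1-G2-G3-G4 plus isolated G5.
toy_network = {
    "D1": ["G1"],
    "G1": ["D1", "G2"],
    "G2": ["G1", "G3"],
    "G3": ["G2", "G4"],
    "G4": ["G3"],
    "G5": [],
}
toy_drivers = ["D1"]

negatives = screen_negative_samples(toy_network, toy_drivers, min_distance=3)
print(negatives)
```

On the toy network, genes within two hops of the driver are excluded and the remaining genes become negative candidates. A real pipeline would replace the hop-distance rule with whatever network measure the paper actually defines.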