BREAST CANCER PREDICTION BASED ON AUTOENCODER UTILIZING MULTI-OMICS DATA
Cancer has been identified as a major challenge in the health sector worldwide. The types of cancer that are the leading causes of death in Indonesia are breast, cervical, lung, and colorectal cancers. The importance of early detection and intervention to reduce morbidity and mortality rates has...
Main Author: | Dewanto Aji Wibisono, Achmad |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76202 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:76202 |
---|---|
spelling |
id-itb.:76202; 2023-08-12T17:37:58Z; BREAST CANCER PREDICTION BASED ON AUTOENCODER UTILIZING MULTI-OMICS DATA; Dewanto Aji Wibisono, Achmad; Indonesia; Theses; Neural Network, Bioinformatics, Autoencoder, Oncology; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/76202; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesian |
description |
Cancer is a major health challenge worldwide. In Indonesia, the leading causes of
cancer death are breast, cervical, lung, and colorectal cancers. Early detection
and intervention are known to reduce morbidity and mortality, so screening
strategies that minimize the cost and burden of health care are needed.
This study proposes analyzing gene expression data with an autoencoder. The final
model combines a gene expression encoder and a proteomics encoder, and the
combined representation is then trained with supervised learning. The model is
proposed as a way to detect predisposition to cancer and to identify high-risk
individuals. The autoencoder was chosen because the method is deemed capable of
handling high-dimensional, noisy data. Selected multi-omics data are used to
enrich the feature information, in the hope that the model can better capture
the biological processes of cancer. The RNA-seq and proteome profile data were
obtained from a public database. The autoencoder was then compared with other
methods for processing the same multi-omics data: PCA (Principal Component
Analysis), Artificial Neural Networks, Logistic Regression, and Support Vector
Machine.
The results suggest that the autoencoder-based multi-omics model learns the
features of imbalanced data better than the other models. This shows not only in
accuracy but also in class-specific metrics: the F1-score for the cancer-positive
class is 86% for the autoencoder model, compared with 31% for the Neural Network
model. Using SHAP, with a Random Forest Classifier as a comparison method, the
gene expression and protein features that contribute most to cancer
classification can be ranked. Among the 10 most important features of each type,
9 of the gene expression features and 8 of the proteomics features are supported
by clinical literature. These findings can contribute to better cancer prevention
and research strategies. Further genetic studies are still urgently needed,
especially in the Indonesian population. |
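The abstract above reports class-specific F1-scores alongside accuracy because the data are imbalanced. The following is a minimal sketch of why the two metrics can diverge. The labels and predictions are hypothetical toy values, not data or results from the thesis, and the helper names `accuracy` and `f1_for_class` are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_for_class(y_true, y_pred, positive=1):
    """F1-score (harmonic mean of precision and recall) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical imbalanced toy set: 90 negative samples, 10 positive samples.
y_true = [0] * 90 + [1] * 10
# Hypothetical classifier that finds only 2 of the 10 positives
# and raises no false alarms.
y_pred = [0] * 90 + [1] * 2 + [0] * 8

acc = accuracy(y_true, y_pred)              # 92 of 100 correct
f1_pos = f1_for_class(y_true, y_pred, 1)    # precision 1.0, recall 0.2

print(f"accuracy = {acc:.2f}")
print(f"F1 (positive class) = {f1_pos:.2f}")
```

In this toy case the accuracy is 0.92 while the positive-class F1-score is about 0.33, so a model can look strong on accuracy while missing most positive cases. This is the kind of gap that a class-specific metric exposes.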
format |
Theses |
author |
Dewanto Aji Wibisono, Achmad |
title |
BREAST CANCER PREDICTION BASED ON AUTOENCODER UTILIZING MULTI-OMICS DATA |
url |
https://digilib.itb.ac.id/gdl/view/76202 |