BREAST CANCER PREDICTION BASED ON AUTOENCODER UTILIZING MULTI-OMICS DATA

Bibliographic Details
Main Author: Dewanto Aji Wibisono, Achmad
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/76202
Institution: Institut Teknologi Bandung
id id-itb.:76202
timestamp 2023-08-12T17:37:58Z
keywords Neural Network, Bioinformatics, Autoencoder, Oncology
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description Cancer is a major challenge for health systems worldwide. In Indonesia, the leading causes of cancer death are breast, cervical, lung, and colorectal cancers. Early detection and intervention are known to reduce morbidity and mortality, so screening strategies that minimize the cost and burden of health care are important to implement. This study proposes analyzing gene expression data with an autoencoder architecture. The final model combines a gene expression encoder and a proteomics encoder, and the combined model is then trained with supervised learning. The model is intended to detect predisposition to cancer and to identify high-risk individuals. The autoencoder was chosen because it is considered capable of handling high-dimensional, noisy data. Selected multi-omics data are used to enrich the feature information, in the hope that the model captures the biological processes of cancer more fully. The RNA-seq and proteome profile data were obtained from a public database. The autoencoder was compared with other methods for processing the same multi-omics data: PCA (Principal Component Analysis), Artificial Neural Networks, Logistic Regression, and Support Vector Machine. The results indicate that the autoencoder-based multi-omics model learns the features of imbalanced data better than the other models. This is visible not only in accuracy but also in class-specific metrics: the F1-score for the cancer-positive class is 86% for the autoencoder model, compared with 31% for the neural network model. Using SHAP and a Random Forest Classifier as a comparison method, the gene expression and protein features that contribute most to cancer classification can be ranked.
Of the 10 most important features of each type, 9 of 10 gene expression features and 8 of 10 proteomics features were found to be supported by clinical literature. These findings can contribute to improved cancer prevention and research strategies. Further genetic studies are still urgently needed, especially in the Indonesian population.
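The abstract reports a class-specific F1-score alongside accuracy because the data are imbalanced. The following is a minimal sketch of why the two metrics can diverge. The label counts, the two hypothetical models, and the helper names (`confusion_counts`, `accuracy`, `f1_score`) are all invented for illustration and are not the thesis data or code.

```python
# Illustrative only: invented labels showing how accuracy can hide poor
# positive-class performance on imbalanced data.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical cohort: 90 negative samples followed by 10 positive samples.
y_true = [0] * 90 + [1] * 10

# Hypothetical model A: 1 false positive, finds only 2 of the 10 positives.
y_pred_a = [1] * 1 + [0] * 89 + [1] * 2 + [0] * 8

# Hypothetical model B: 2 false positives, finds 9 of the 10 positives.
y_pred_b = [1] * 2 + [0] * 88 + [1] * 9 + [0] * 1

for name, y_pred in (("A", y_pred_a), ("B", y_pred_b)):
    print(name, accuracy(y_true, y_pred), round(f1_score(y_true, y_pred), 3))
# prints:
# A 0.91 0.308
# B 0.97 0.857
```

In this invented example the two models differ by only six points of accuracy, while the positive-class F1-score separates them clearly, which is the kind of gap a class-specific metric is meant to expose.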
_version_ 1822007910124748800