BREAST CANCER PREDICTION BASED ON AUTOENCODER UTILIZING MULTI-OMICS DATA
Format: | Theses |
---|---|
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/76202 |
Institution: | Institut Teknologi Bandung |
Summary: | Cancer is recognized as a major health challenge worldwide. In Indonesia,
the leading causes of cancer death are breast, cervical, lung, and colorectal
cancers. Early detection and intervention are known to reduce morbidity and
mortality, so screening strategies that minimize the cost and burden of health
care need to be implemented.
This study proposes analyzing gene expression data with an autoencoder. The final
autoencoder model consists of a gene expression encoder and a proteomics encoder
whose outputs are combined and then trained with supervised learning. The
autoencoder is proposed as a means to detect predisposition to cancer and to
identify high-risk individuals. An autoencoder was chosen because it is
considered capable of handling high-dimensional, highly noisy data. Selected
multi-omics data are used to enrich the feature information, with the aim of
helping the model better capture the biological processes of cancer. The RNA-seq
and proteome profile data were obtained from a public database. The autoencoder
was then compared with other methods for processing these multi-omics data: PCA
(Principal Component Analysis), Artificial Neural Networks, Logistic Regression,
and Support Vector Machine.
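The two-encoder design described above can be sketched as a NumPy forward pass. This is a minimal illustration, not the thesis implementation: the abstract does not give layer sizes, activations, or the loss, so the following are all assumptions. Each omics layer has one dense `tanh` encoder, the two codes are combined by concatenation, each omics layer is reconstructed from its own code by a linear decoder, a sigmoid head predicts the cancer-positive probability, and the joint objective adds reconstruction MSE to binary cross-entropy weighted by a hypothetical `alpha`. All names, dimensions, and the random data are hypothetical, and training (gradient updates) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2,000 genes, 300 proteins, a 32-dim code per omics layer.
N_GENES, N_PROTEINS, CODE = 2000, 300, 32

def dense(n_in, n_out):
    """Randomly initialised weights and zero biases for one dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def affine(x, layer):
    """Apply a dense layer without activation: x @ W + b."""
    W, b = layer
    return x @ W + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

params = {
    "enc_rna": dense(N_GENES, CODE),      # gene expression encoder
    "enc_prot": dense(N_PROTEINS, CODE),  # proteomics encoder
    "dec_rna": dense(CODE, N_GENES),      # reconstructs gene expression
    "dec_prot": dense(CODE, N_PROTEINS),  # reconstructs proteomics
    "head": dense(2 * CODE, 1),           # supervised head on the combined code
}

def forward(x_rna, x_prot, p):
    """Encode each omics layer, combine the codes, decode, and classify."""
    z_rna = np.tanh(affine(x_rna, p["enc_rna"]))
    z_prot = np.tanh(affine(x_prot, p["enc_prot"]))
    z = np.concatenate([z_rna, z_prot], axis=1)   # combined multi-omics code
    rec_rna = affine(z_rna, p["dec_rna"])
    rec_prot = affine(z_prot, p["dec_prot"])
    prob = sigmoid(affine(z, p["head"])).ravel()  # predicted P(cancer-positive)
    return z, rec_rna, rec_prot, prob

def joint_loss(x_rna, x_prot, y, p, alpha=1.0):
    """Reconstruction MSE of both omics layers plus alpha * binary cross-entropy."""
    _, rec_rna, rec_prot, prob = forward(x_rna, x_prot, p)
    recon = np.mean((x_rna - rec_rna) ** 2) + np.mean((x_prot - rec_prot) ** 2)
    prob = np.clip(prob, 1e-7, 1 - 1e-7)          # avoid log(0)
    bce = -np.mean(y * np.log(prob) + (1 - y) * np.log(1 - prob))
    return float(recon + alpha * bce)

# Toy batch: 8 samples of random standardized features, imbalanced labels.
x_rna = rng.normal(size=(8, N_GENES))
x_prot = rng.normal(size=(8, N_PROTEINS))
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])

z, rec_rna, rec_prot, prob = forward(x_rna, x_prot, params)
print(z.shape, prob.shape, joint_loss(x_rna, x_prot, y, params))  # z: (8, 64), prob: (8,)
```

Concatenation is only one way to combine the encoders; the abstract does not say which combination step was used, so a weighted sum or a shared dense layer over both codes would be equally plausible sketches.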
The results indicate that the autoencoder-based multi-omics model learns the
features of imbalanced data better than the other models. This is evident not only
from overall accuracy but also from class-specific metrics: the F1-score for the
cancer-positive class is 86% for the autoencoder model compared with 31% for the
Neural Network model. Using SHAP, with a Random Forest classifier as the comparison
method, the gene expression and protein features that contribute most to cancer
classification can be ranked. Among the 10 most important features, 9 of 10 gene
expression features and 8 of 10 proteomic features are supported by the clinical
literature, which validates these findings. These findings may contribute to
improved cancer prevention and research strategies. Further genetic studies are
still urgently needed, especially in the Indonesian population. |
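The point that accuracy alone can hide poor performance on the minority (cancer-positive) class can be illustrated with a small plain-Python example. The labels and both "models" below are hypothetical and are not the study's data; the F1-score is computed for the positive class only, as the harmonic mean of precision and recall.

```python
def positive_class_f1(y_true, y_pred, positive=1):
    """F1-score for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical imbalanced labels: 90 negative (0) and 10 cancer-positive (1).
y_true = [0] * 90 + [1] * 10

# A degenerate model that always predicts the majority (negative) class.
always_negative = [0] * 100

# A model with 4 false positives, 8 true positives, and 2 missed positives.
better = [0] * 86 + [1] * 4 + [1] * 8 + [0] * 2

for name, y_pred in [("always_negative", always_negative), ("better", better)]:
    print(name, accuracy(y_true, y_pred), round(positive_class_f1(y_true, y_pred), 3))
# always_negative 0.9 0.0
# better 0.94 0.727
```

Both hypothetical models reach high accuracy, but only the positive-class F1-score shows that the first one never detects a cancer-positive case, which is why a class-specific metric is the more informative comparison for imbalanced data.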
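The feature-ranking step in the summary can be sketched in NumPy under stated assumptions. The abstract does not describe how SHAP values were aggregated, so this sketch assumes the common choice of ranking features by mean absolute SHAP value across samples, and then compares the top-k list with a ranking from a precomputed random forest importance vector (for example, scikit-learn's `RandomForestClassifier.feature_importances_`). The SHAP matrix, the importance values, and the feature names are all hypothetical; in practice the SHAP values would come from a tool such as the `shap` package.

```python
import numpy as np

def rank_by_mean_abs_shap(shap_values, feature_names, k):
    """Return the k feature names with the largest mean |SHAP| across samples."""
    importance = np.abs(shap_values).mean(axis=0)  # one global score per feature
    order = np.argsort(importance)[::-1]           # descending importance
    return [feature_names[i] for i in order[:k]]

def rank_by_scores(scores, feature_names, k):
    """Rank features by a precomputed importance vector (e.g. a random forest's)."""
    order = np.argsort(np.asarray(scores))[::-1]
    return [feature_names[i] for i in order[:k]]

# Hypothetical SHAP matrix: 4 samples x 5 features (values are made up).
names = ["gene_1", "gene_2", "gene_3", "gene_4", "gene_5"]
shap_values = np.array([
    [0.40, -0.05, 0.10, -0.30, 0.01],
    [-0.35, 0.02, -0.12, 0.25, 0.00],
    [0.50, -0.01, 0.08, -0.20, 0.02],
    [-0.45, 0.03, 0.15, 0.35, -0.01],
])
# Hypothetical random forest importances for the same features.
rf_importance = [0.30, 0.05, 0.10, 0.40, 0.15]

top_shap = rank_by_mean_abs_shap(shap_values, names, k=3)
top_rf = rank_by_scores(rf_importance, names, k=3)
overlap = set(top_shap) & set(top_rf)
print(top_shap)          # ['gene_1', 'gene_4', 'gene_3']
print(top_rf)            # ['gene_4', 'gene_1', 'gene_5']
print(sorted(overlap))   # ['gene_1', 'gene_4']
```

Taking the absolute value before averaging keeps features that push predictions strongly in both directions from cancelling out; the overlap between the two top-k lists is one simple way to check how consistent the rankings are.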