SUPERVISED LEARNING IN HEPATOCELLULAR CARCINOMA TUMOR GRADES CLASSIFICATION BASED ON DNA PROMOTER METHYLATION
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/84730 |
Institution: | Institut Teknologi Bandung |
Summary: | Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer, ranking 6th in prevalence worldwide and 5th in Indonesia, and 4th in cancer-related mortality globally and 3rd in Indonesia. To reduce the prevalence and mortality of this cancer, a precision oncology approach is necessary. Tumor grade in HCC, comprising G1 (well differentiated), G2 (moderately differentiated), G3 (poorly differentiated), and G4 (undifferentiated), has good prognostic value based on histology. However, the high molecular heterogeneity of HCC calls for identifying molecular profiles that correlate with tumor grade progression. DNA methylation plays a crucial role in cancer initiation and progression by repressing tumor suppressor genes and inducing proto-oncogene activation. However, no studies have yet correlated DNA methylation profiles with the HCC tumor grading system. Additionally, DNA methylation data are high-dimensional and complex, necessitating appropriate analytical methods. Therefore, this study aims to determine whether HCC tumor grades (G1, G2, G3) and normal tissues can be distinguished based on promoter DNA methylation data using supervised learning approaches, and to identify relevant methylated genes and their associations with biological processes related to cell migration, proliferation, and differentiation. DNA methylation data were collected from the TCGA-LIHC and GEO GSE61278 studies, comprising 433 samples (G1: 46, G2: 167, G3: 120, normal: 100). After feature selection of CpG probes and oversampling with SMOTE, 1044 CpG sites covering 928 genes were retained, with each class balanced at 167 samples. The supervised learning models tested included Gaussian naive Bayes, linear SVM, kernel SVM, random forest, and a neural network.
Performance evaluation showed that linear SVM and kernel SVM with a polynomial kernel performed best, reaching 90% on the overall evaluation metrics when predicting G1, G2, G3, and normal. However, distinguishing G2 from G3 required a separate binary classification, for which a kernel SVM with an RBF kernel performed best, reaching 85% on the overall evaluation metrics. Further analysis with SHAP revealed 400 relevant genes, of which 88 were hypermethylated and 312 hypomethylated relative to normal tissue. Among the 10 most relevant hypermethylated and hypomethylated genes for each tumor grade, the majority were supported by the literature and play roles in the activation of the Wnt/β-catenin, PI3K/Akt, ERK, and TNFα-NFκB signaling pathways, ECM regulation, cytoskeleton regulation, and the cell cycle. The methylation patterns of the relevant genes also supported the processes of migration, proliferation, and dedifferentiation in line with HCC tumor grade progression, which also involved ERK/MAPK signaling, Ras signaling, Ca2+ signaling, and phosphorylation-dephosphorylation activities.
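The class balancing mentioned in the summary (SMOTE bringing every class up to 167 samples) can be illustrated with a minimal, standard-library-only sketch. This is not the thesis's code: the function name `smote_balance`, the neighbour count `k=5`, the four toy features, and the random toy values are all assumptions made for illustration. Only the class sizes (46, 167, 120, 100) come from the record.

```python
import math
import random
from collections import Counter


def smote_balance(X, y, k=5, seed=0):
    """Oversample each minority class up to the majority-class size.

    X: list of feature vectors (lists of floats); y: list of class labels.
    Each synthetic point is interpolated between a sample and one of its
    k nearest same-class neighbours, which is the core idea of SMOTE.
    (Hypothetical helper, not taken from the thesis.)
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, n in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if n == target or len(members) < 2:
            continue  # already at target size, or too few points to interpolate
        kk = min(k, len(members) - 1)
        for _ in range(target - n):
            i = rng.randrange(len(members))
            base = members[i]
            # k nearest same-class neighbours of the base sample, excluding itself
            others = [m for j, m in enumerate(members) if j != i]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:kk]
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from base to neighbour
            X_out.append([b + gap * (v - b) for b, v in zip(base, nb)])
            y_out.append(label)
    return X_out, y_out


# Toy data: class sizes from the record, random stand-ins for methylation values.
sizes = {"G1": 46, "G2": 167, "G3": 120, "normal": 100}
data_rng = random.Random(42)
X, y = [], []
for label, n in sizes.items():
    for _ in range(n):
        X.append([data_rng.random() for _ in range(4)])  # 4 hypothetical CpG probes
        y.append(label)

X_bal, y_bal = smote_balance(X, y)
print(Counter(y_bal))  # every class now has 167 samples
```

In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package would normally be preferred over a hand-written version. The sketch only shows why the balanced dataset ends up with 4 × 167 = 668 data points.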