DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA



Bibliographic Details
Main Author: Rustam
Format: Dissertation
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/68038
Institution: Institut Teknologi Bandung
Description
Summary: In metabolomics studies, independent analyses (experimental replicates) of metabolite concentration measurements are always carried out to account for measurement errors. However, these independent analyses also increase the size of the datasets. For clustering purposes, representative chemical information must therefore be obtained from the several independent analyses, which requires an appropriate method for reducing the dimensionality of the independent analyses. For this purpose, classical multidimensional scaling (CMDS) with Euclidean and Mahalanobis distances and a modified Weiszfeld algorithm (AWT) are proposed. The metabolite datasets before and after dimensionality reduction of the independent analyses are then clustered using a fuzzy clustering approach, with the Tang Sun Sun (TSS) index as the cluster validity index. Clustering before and after dimensionality reduction gave the same optimal number of clusters, namely four, since the smallest TSS index value was obtained at four clusters in both cases. With dimensionality reduction using AWT and CMDS with Euclidean distance, each cluster consists of clove regions from the same origin. This indicates that each clove origin has characteristics, or chemical information, that differ from those of the other origins, so it can be concluded that each clove origin has a unique taste and aroma. In contrast, dimensionality reduction using CMDS with Mahalanobis distance places clove regions from different origins in the same cluster. These results indicate that there are differences in taste between clove origins.
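The abstract does not say exactly how CMDS is applied to the replicate dimension, so the following Python/NumPy sketch shows only one possible reading, under stated assumptions. For a single clove sample, the r independent analyses of p metabolites form an r × p array; each metabolite is treated as a point in r-dimensional replicate space, pairwise Euclidean or Mahalanobis distances are computed, and classical (Torgerson) MDS embeds those points in one dimension, giving one value per metabolite. The function names are hypothetical, the Mahalanobis variant assumes the sample covariance of the replicate columns (pseudo-inverted in case it is singular), and the sign convention is an assumption added so that samples are comparable.

```python
import numpy as np


def euclidean_distances(points: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mahalanobis_distances(points: np.ndarray) -> np.ndarray:
    """Pairwise Mahalanobis distances between the rows of `points`.

    Assumption: the covariance is the sample covariance of the columns,
    pseudo-inverted so that a singular covariance does not raise an error."""
    cov_inv = np.linalg.pinv(np.cov(points, rowvar=False))
    diff = points[:, None, :] - points[None, :, :]
    sq = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)
    return np.sqrt(np.clip(sq, 0.0, None))


def classical_mds(dist: np.ndarray, n_components: int = 1) -> np.ndarray:
    """Classical (Torgerson) MDS from an n x n distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (dist ** 2) @ centering    # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


def reduce_replicates_cmds(replicates: np.ndarray, metric: str = "euclidean") -> np.ndarray:
    """Hypothetical reduction of ONE sample's replicates, shape (r, p), to p values.

    Each metabolite is a point in r-dimensional replicate space; CMDS embeds
    these p points in one dimension."""
    points = replicates.T                             # shape (p, r)
    if metric == "euclidean":
        dist = euclidean_distances(points)
    elif metric == "mahalanobis":
        dist = mahalanobis_distances(points)
    else:
        raise ValueError(f"unknown metric: {metric}")
    coords = classical_mds(dist, n_components=1).ravel()
    # MDS is defined only up to sign; orient coordinates to agree with the
    # replicate means (an assumed convention, not from the source).
    means = points.mean(axis=1)
    if np.dot(coords, means - means.mean()) < 0:
        coords = -coords
    return coords
```

The CMDS coordinates are centred, so the reduced values are not on the original concentration scale; under this reading the reduced sample vectors are then stacked row by row before clustering.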
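A Weiszfeld-type algorithm computes a geometric median, which is one natural way to collapse the r replicate vectors of a sample into a single representative vector. The specific modification used in the dissertation is not described here. The sketch below instead uses the well-known Vardi and Zhang modification, which stays well defined when an iterate lands exactly on a data point. The function name `modified_weiszfeld` and its tolerance parameters are illustrative assumptions.

```python
import numpy as np


def modified_weiszfeld(points: np.ndarray, tol: float = 1e-10, max_iter: int = 1000) -> np.ndarray:
    """Geometric median of the rows of `points` (shape (r, p)).

    Uses the Vardi-Zhang modification of Weiszfeld's iteration so that an
    iterate landing exactly on a data point is handled instead of dividing by zero."""
    y = points.mean(axis=0)                           # start from the centroid
    for _ in range(max_iter):
        diff = points - y
        dist = np.linalg.norm(diff, axis=1)
        at_y = dist < 1e-12                           # data points coinciding with y
        eta = int(at_y.sum())
        if eta == len(points):                        # every point equals y
            return y
        inv = 1.0 / dist[~at_y]
        # Classical Weiszfeld step using only the points different from y.
        t_tilde = (points[~at_y] * inv[:, None]).sum(axis=0) / inv.sum()
        if eta == 0:
            y_new = t_tilde
        else:
            r = np.linalg.norm((diff[~at_y] * inv[:, None]).sum(axis=0))
            if r <= eta:                              # optimality condition: y is the median
                return y
            y_new = (1.0 - eta / r) * t_tilde + (eta / r) * y
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```

Applied to an r × p replicate array, `modified_weiszfeld(replicates)` returns one length-p vector for that sample. Because the geometric median minimises the sum of Euclidean distances rather than squared distances, a single aberrant replicate moves it much less than it moves the arithmetic mean.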
From these results, it can be concluded that dimensionality reduction using AWT and CMDS with Euclidean distance is appropriate, because the resulting distribution of origins across clusters is almost the same as before dimensionality reduction. After the clove metabolite dataset has been clustered, the next step is identification: assigning a cluster to any metabolite dataset whose cluster has not yet been identified. For cluster identification, artificial neural networks (ANN) and k-nearest neighbors (KNN) are proposed. In this study, the testing dataset represents metabolite datasets whose clusters have not been identified. After dimensionality reduction using AWT and CMDS with Euclidean distance, ANN and KNN achieved the same best performance in terms of accuracy, sensitivity, and specificity, but KNN was more efficient in computation time. After dimensionality reduction using CMDS with Mahalanobis distance, ANN outperformed KNN in accuracy, sensitivity, and specificity, although KNN remained superior in computation time.
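The identification step can be illustrated with k-nearest neighbours, together with accuracy and one-vs-rest sensitivity and specificity. This minimal sketch assumes Euclidean distance, majority voting with ties resolved in favour of the nearer neighbour, and macro-averaging of the per-cluster sensitivity and specificity. The abstract states none of k, the distance, or the averaging scheme, and the function names are hypothetical. An ANN would replace `knn_predict` with a trained classifier and reuse the same metrics.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, test_x, k=3):
    """Assign each test sample the majority cluster label of its k nearest
    training samples (Euclidean distance)."""
    predictions = []
    for x in test_x:
        dist = np.linalg.norm(train_x - x, axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]
        votes = Counter(train_y[nearest].tolist())
        # most_common keeps first-seen order among ties, so the nearer neighbour wins
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged one-vs-rest sensitivity and specificity."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))
    sensitivity, specificity = [], []
    for label in np.unique(y_true):
        tp = np.sum((y_true == label) & (y_pred == label))
        fn = np.sum((y_true == label) & (y_pred != label))
        tn = np.sum((y_true != label) & (y_pred != label))
        fp = np.sum((y_true != label) & (y_pred == label))
        sensitivity.append(tp / (tp + fn))            # tp + fn > 0: label occurs in y_true
        # With a single class there are no negatives; 1.0 is an assumed convention.
        specificity.append(tn / (tn + fp) if tn + fp > 0 else 1.0)
    return accuracy, float(np.mean(sensitivity)), float(np.mean(specificity))
```

KNN has no training phase, and each prediction scans every training sample. Because of this, its cost depends mainly on the size of the training set, which is consistent with its low computation time on small metabolite datasets.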