DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA
Main Author: | |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/68038 |
Institution: | Institut Teknologi Bandung |
Summary: | In metabolomics studies, metabolite concentration measurements are routinely
repeated as independent analyses or experimental replicates to guard against
measurement error. These independent analyses, however, increase the size of the
datasets. For clustering purposes, representative chemical information must be
obtained from the several independent analyses, so a suitable method for reducing
the dimensionality of the independent analyses is required. For this purpose,
classical multidimensional scaling (CMDS) with Euclidean and Mahalanobis distances
and a modified Weiszfeld algorithm (AWT) are proposed. The metabolite datasets
before and after dimensionality reduction of the independent analyses are then
clustered using a fuzzy clustering approach, with the Tang Sun Sun (TSS) index as
the cluster validity index.
Clustering before and after dimensionality reduction gave the same optimal number
of clusters, namely four, as indicated by the smallest TSS index value in both
cases. With AWT and with CMDS using Euclidean distance, each cluster consists of
clove regions from the same origin. This indicates that each clove origin has
chemical characteristics distinct from the others, so it can be concluded that
each clove origin has a unique taste and aroma. With CMDS using Mahalanobis
distance, by contrast, clove regions from different origins fall in the same
cluster. These results indicate that there are differences in taste between clove
origins. From these results, it can be concluded that dimensionality reduction
using AWT and CMDS with Euclidean distance is appropriate, because the
distribution of origins across clusters is almost the same as before
dimensionality reduction was carried out. Furthermore, after the clove
metabolite dataset has been clustered, the next step is identification: assigning
a cluster to any metabolite dataset whose cluster is not yet known. For this
purpose, artificial neural networks (ANN) and k-nearest neighbors (KNN) are
proposed. In this study, the testing dataset represents metabolite datasets whose
clusters have not been identified. The results show that after dimensionality
reduction using AWT and CMDS with Euclidean distance, ANN and KNN achieve the same
best performance in terms of accuracy, sensitivity, and specificity; however, KNN
is more efficient in terms of computational time. Meanwhile, after dimensionality
reduction using CMDS with Mahalanobis distance, ANN outperformed KNN in terms of
accuracy, sensitivity, and specificity, even though KNN was superior in terms of
computational time. |
---|
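
The summary proposes a modified Weiszfeld algorithm for collapsing several independent analyses into one representative measurement, but it does not describe the modification. As a rough illustration only, the sketch below uses the standard (unmodified) Weiszfeld iteration, which computes the geometric median of a set of points. The function name `geometric_median`, the tolerance, the starting point, and the sample values are all assumptions made for this example and are not taken from the dissertation.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Standard Weiszfeld iteration (not the dissertation's modified variant).

    points: list of equal-length numeric sequences, e.g. one metabolite
    profile per independent analysis of the same sample.
    """
    dim = len(points[0])
    # Start from the arithmetic mean of the replicates.
    current = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            d = math.dist(p, current)
            if d < tol:
                # The iterate sits on a data point, where the plain update
                # is undefined. This sketch simply stops there, which is a
                # simplification and may not be the true median.
                return list(p)
            w = 1.0 / d  # closer points receive larger weights
            weight_sum += w
            for k in range(dim):
                weighted[k] += w * p[k]
        updated = [v / weight_sum for v in weighted]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


# Hypothetical replicate measurements of two metabolites; the last
# replicate is an outlier.
replicates = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.1), (5.0, 9.0)]
representative = geometric_median(replicates)
mean_point = [sum(p[k] for p in replicates) / len(replicates) for k in range(2)]
```

In this made-up example the geometric median stays near the three agreeing replicates, whereas the arithmetic mean is pulled toward the outlier, which is one reason a median-type summary can be attractive for replicate data.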