DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA
In metabolomics studies, independent analyses or experimental replication of metabolite concentration measurements are always carried out to anticipate measurement errors. On the other hand, the size of the datasets will increase with these independent analyses. It is necessary to obtain represen...
Main Author: | Rustam |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/68038 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:68038 |
---|---|
spelling |
id-itb.:68038; 2022-09-02T08:49:38Z; DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA; Rustam; Indonesia; Dissertations; artificial neural networks, chemometrics, data dimensionality reduction, fuzzy clustering, identification, metabolomics; INSTITUT TEKNOLOGI BANDUNG; https://digilib.itb.ac.id/gdl/view/68038; text |
institution |
Institut Teknologi Bandung |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
language |
Indonesia |
description |
In metabolomics studies, independent analyses (experimental replicates) of
metabolite concentration measurements are routinely carried out to guard against
measurement error. These independent analyses, however, increase the size of the
datasets. For clustering purposes, it is necessary to obtain representative chemical
information from the several independent analyses, so an appropriate method for
reducing their dimensionality is required. For this purpose, classical
multidimensional scaling (CMDS) with Euclidean and Mahalanobis distances and a
modified Weiszfeld algorithm (AWT) are proposed. The metabolite datasets before and
after dimensionality reduction of the independent analyses are then clustered using
a fuzzy clustering approach, with the Tang Sun Sun (TSS) index as the cluster
validity index.
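
The two reduction techniques named above can be illustrated with a minimal Python sketch. It is not the dissertation's method: `geometric_median` is the plain (unmodified) Weiszfeld iteration, `classical_mds` is the textbook double-centering form of CMDS, and the function names, tolerances, and the idea of passing replicate vectors as rows are assumptions made for illustration. The abstract does not say how the modified variant differs or how CMDS is applied to the replicates.

```python
import numpy as np

def geometric_median(points, tol=1e-9, max_iter=1000):
    """Plain Weiszfeld iteration for the geometric median of the rows.

    Assumption: each row is one replicate measurement vector. This is the
    standard algorithm, not the modified variant (AWT) from the abstract.
    """
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the centroid
    for _ in range(max_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        # Clamp distances so a data point sitting on the estimate
        # does not cause a division by zero.
        weights = 1.0 / np.maximum(dist, 1e-12)
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate

def classical_mds(dist_matrix, n_components=2):
    """Textbook classical MDS on a symmetric distance matrix.

    The matrix may hold Euclidean or Mahalanobis distances; how it is
    built is left to the caller.
    """
    d2 = np.asarray(dist_matrix, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

The geometric median is less sensitive to a single outlying replicate than the arithmetic mean, which is one plausible reason to prefer a Weiszfeld-type summary of replicates.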
Clustering before and after dimensionality reduction gave the same optimal number
of clusters, namely four, as indicated by the smallest TSS index value in both cases.
With dimensionality reduction using AWT and CMDS with Euclidean distance, each
cluster consists of clove regions from the same origin. This indicates that each
clove origin has chemical characteristics distinct from the others, from which it is
concluded that each clove origin has a unique taste and aroma. Meanwhile,
dimensionality reduction using CMDS with Mahalanobis distance places clove regions
from different origins in the same cluster. These results indicate that there are
differences in taste between clove origins. From these results, it can be concluded
that dimensionality reduction using AWT and CMDS with Euclidean distance is
appropriate, because it yields almost the same distribution of origins across
clusters as before dimensionality reduction. Once the clove metabolite dataset has
been clustered, the next step is identification: assigning a cluster to any
metabolite dataset whose cluster has not yet been identified. For this purpose,
artificial neural networks (ANN) and k-nearest neighbors (KNN) are proposed. In this
study, the testing dataset represents metabolite datasets whose clusters have not
been identified. The results show that after dimensionality reduction using AWT and
CMDS with Euclidean distance, ANN and KNN achieve the same best performance in terms
of accuracy, sensitivity, and specificity, although KNN is more efficient in
computational time. Meanwhile, after dimensionality reduction using CMDS with
Mahalanobis distance, ANN outperformed KNN in accuracy, sensitivity, and
specificity, although KNN was faster in computational time. |
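
The identification step and its three reported metrics can be sketched as follows. This is a generic illustration, not the dissertation's implementation: the function names, the choice of `k`, the Euclidean metric, the one-vs-rest definition of sensitivity and specificity, and the toy data are all assumptions.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=3):
    """Assign each test row the majority cluster label of its k nearest training rows."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    test_x = np.asarray(test_x, dtype=float)
    # Pairwise Euclidean distances, shape (n_test, n_train).
    dist = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for row in nearest:
        labels, counts = np.unique(train_y[row], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

def accuracy(y_true, y_pred):
    """Fraction of test items assigned to the correct cluster."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def sensitivity_specificity(y_true, y_pred, label):
    """One-vs-rest sensitivity and specificity for a single cluster label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == label) & (y_pred == label))
    fn = np.sum((y_true == label) & (y_pred != label))
    tn = np.sum((y_true != label) & (y_pred != label))
    fp = np.sum((y_true != label) & (y_pred == label))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return float(sens), float(spec)

# Hypothetical toy data: two well-separated clusters labeled 0 and 1.
toy_train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
toy_train_y = [0, 0, 0, 1, 1, 1]
toy_test_x = [[0.2, 0.1], [5.5, 5.2]]
toy_pred = knn_predict(toy_train_x, toy_train_y, toy_test_x, k=3)
```

KNN has no training phase beyond storing the labeled data, which is consistent with the abstract's observation that it is cheaper in computational time than ANN.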
format |
Dissertations |
author |
Rustam |