DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA

In metabolomics studies, independent analyses or experimental replication of metabolite concentration measurements are always carried out to anticipate measurement errors. On the other hand, the size of the datasets will increase with these independent analyses. It is necessary to obtain represen...

Bibliographic Details
Main Author: Rustam
Format: Dissertations
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/68038
Institution: Institut Teknologi Bandung
id id-itb.:68038
spelling id-itb.:68038 2022-09-02T08:49:38Z
keywords artificial neural networks, chemometrics, data dimensionality reduction, fuzzy clustering, identification, metabolomics
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description In metabolomics studies, metabolite concentrations are routinely measured in several independent analyses (experimental replicates) to guard against measurement error. These replicates, however, increase the size of the dataset, and clustering requires a single representative set of chemical information per sample. A dimensionality reduction method is therefore needed that condenses the independent analyses into one representative dataset. For this purpose, the study proposes classical multidimensional scaling (CMDS) with Euclidean and Mahalanobis distances and a modified Weiszfeld algorithm (AWT). The metabolite datasets before and after dimensionality reduction of the independent analyses are then clustered with a fuzzy clustering approach, using the Tang Sun Sun (TSS) index as the cluster validity index. Clustering before and after dimensionality reduction gave the same optimal number of clusters, four, as indicated by the smallest TSS index value in both cases. With AWT and with CMDS using Euclidean distance, each cluster consists of clove regions from the same origin. This indicates that each clove origin has chemical characteristics distinct from the others, from which it is concluded that each clove origin has a unique taste and aroma. With CMDS using Mahalanobis distance, by contrast, clove regions from different origins fall in the same cluster. These results are taken to indicate that there are differences in taste between clove origins.
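The idea of condensing several independent analyses into one representative vector can be illustrated with the classical Weiszfeld iteration for the geometric median. This is only a minimal sketch: the record does not describe how the study's modified algorithm differs from the classical one, so the function name `geometric_median`, the parameters `tol`, `eps`, and `max_iter`, and the toy replicate values below are all assumptions made for illustration.

```python
import math


def geometric_median(points, tol=1e-9, eps=1e-12, max_iter=1000):
    """Classical Weiszfeld iteration (not the study's modified variant).

    points: list of equal-length numeric sequences, e.g. one vector of
    metabolite concentrations per independent analysis of a sample.
    Returns one representative vector minimising the sum of Euclidean
    distances to all input vectors.
    """
    n = len(points)
    dim = len(points[0])
    # Start from the coordinate-wise mean of the replicates.
    current = [sum(p[j] for p in points) / n for j in range(dim)]
    for _ in range(max_iter):
        weight_sum = 0.0
        numerator = [0.0] * dim
        for p in points:
            # Clamp the distance so a replicate that coincides with the
            # current estimate does not cause a division by zero.
            # This is a practical guard, an assumption of this sketch.
            d = max(math.dist(p, current), eps)
            w = 1.0 / d
            weight_sum += w
            for j in range(dim):
                numerator[j] += w * p[j]
        updated = [v / weight_sum for v in numerator]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


# Hypothetical example: three replicate measurements of one sample,
# the third being an outlying analysis.
replicates = [(1.0, 2.0), (1.1, 2.1), (5.0, 9.0)]
representative = geometric_median(replicates)
mean = [sum(r[j] for r in replicates) / len(replicates) for j in range(2)]
```

Compared with the coordinate-wise mean, the geometric median is pulled less strongly by a single outlying replicate, which is one plausible reason to prefer it when replicates exist to absorb measurement error.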
From these results, it can be concluded that dimensionality reduction using AWT and CMDS with Euclidean distance is appropriate, because it yields almost the same distribution of origins across clusters as before dimensionality reduction. Once the clove metabolite dataset has been clustered, the next step is identification: assigning a metabolite dataset whose cluster is not yet known to one of the clusters. For this purpose, artificial neural networks (ANN) and k-nearest neighbors (KNN) are proposed. In this study, the testing dataset represents metabolite datasets whose clusters have not been identified. The results show that after dimensionality reduction using AWT and CMDS with Euclidean distance, ANN and KNN achieve the same best performance in accuracy, sensitivity, and specificity, but KNN requires less computation time. After dimensionality reduction using CMDS with Mahalanobis distance, ANN outperformed KNN in accuracy, sensitivity, and specificity, although KNN was again faster.
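The identification step with KNN can be sketched as follows. This is a hedged illustration rather than the study's implementation: the record gives no value of k, no distance choice for KNN, and no data, so the function names `knn_predict` and `accuracy`, the default `k=3`, the use of Euclidean distance, and the toy vectors and cluster labels are all assumptions.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Assign `query` to the majority cluster label among its k nearest
    training vectors, using Euclidean distance (an assumption here)."""
    # Sort training samples by distance to the query and keep the k closest.
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, Counter keeps first-seen order, so the label of the
    # closer neighbour wins.
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test vectors whose predicted cluster matches the known one."""
    hits = sum(
        knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


# Hypothetical reduced metabolite vectors with cluster labels 1 and 2.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
train_y = [1, 1, 1, 2, 2, 2]
test_X = [(0.5, 0.5), (10.5, 10.5)]
test_y = [1, 2]
score = accuracy(train_X, train_y, test_X, test_y, k=3)
```

KNN needs no training phase beyond storing the labelled vectors, which is consistent with the abstract's observation that it was cheaper than ANN in computation time.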
title DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA