DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA

In metabolomics studies, independent analyses or experimental replication of metabolite concentration measurements are always carried out to anticipate measurement errors. On the other hand, the size of the datasets will increase with these independent analyses. It is necessary to obtain represen...

Bibliographic Details
Main Author:	Rustam
Format:	Dissertations
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/68038
Institution:	Institut Teknologi Bandung

id	id-itb.:68038
spelling	id-itb.:68038 2022-09-02T08:49:38Z DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA Rustam Indonesia Dissertations artificial neural networks, chemometrics, data dimensionality reduction, fuzzy clustering, identification, metabolomics INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/68038 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	In metabolomics studies, independent analyses, or experimental replicates, of metabolite concentration measurements are routinely carried out to guard against measurement errors. These independent analyses, however, increase the size of the datasets. For clustering purposes, representative chemical information must be obtained from the several independent analyses. An appropriate dimensionality reduction method for the independent analyses is therefore required to obtain datasets that represent the chemical information from all of them. For this purpose, classical multidimensional scaling (CMDS) with Euclidean and Mahalanobis distances and a modified Weiszfeld algorithm (AWT) are proposed. The metabolite datasets before and after dimensionality reduction of the independent analyses are then clustered using a fuzzy clustering approach, with the Tang Sun Sun (TSS) index as the cluster validity index. Clustering before and after dimensionality reduction gave the same optimal number of clusters, four, which is the number of clusters at which the TSS index takes its smallest value in both cases. With dimensionality reduction using AWT and CMDS with Euclidean distance, each cluster consists of clove regions from the same origin. This indicates that each clove origin has characteristics, or chemical information, that differ from those of the others, so it can be concluded that each clove origin has a unique taste and aroma. Meanwhile, with dimensionality reduction using CMDS with Mahalanobis distance, clove regions from different origins fall in the same cluster. These results provide information that there are differences in taste between clove origins.
From these results, it can be concluded that dimensionality reduction using AWT and CMDS with Euclidean distance is appropriate, because the distribution of origins across clusters is almost the same as before dimensionality reduction. Once the clove metabolite dataset has been clustered, the next step is identification, which applies when a metabolite dataset has not yet been assigned to a cluster. For cluster identification, artificial neural networks (ANN) and k-nearest neighbors (KNN) are proposed. In this study, the testing dataset represents metabolite datasets whose clusters have not been identified. The results show that, after dimensionality reduction using AWT and CMDS with Euclidean distance, ANN and KNN achieve the same best performance in terms of accuracy, sensitivity, and specificity; however, KNN is more efficient in computational time. Meanwhile, after dimensionality reduction using CMDS with Mahalanobis distance, ANN outperformed KNN in accuracy, sensitivity, and specificity, although KNN was superior in computational time.
format	Dissertations
author	Rustam
title	DATA DIMENSIONALITY REDUCTION FOR CLUSTERING AND IDENTIFICATION: A CASE STUDY ON INDONESIAN CLOVE BUDS METABOLITE DATA
url	https://digilib.itb.ac.id/gdl/view/68038
_version_	1822933522827968512
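
The description above names a modified Weiszfeld algorithm as one way to collapse several independent analyses into a single representative measurement. The record does not say what the modification is, so the sketch below shows only the standard Weiszfeld iteration for the geometric median, in plain Python. The function name `geometric_median`, its parameters, and the idea of treating each replicate as one vector of metabolite concentrations are assumptions made for illustration, not the dissertation's method.

```python
import math


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the geometric median of equal-length numeric vectors.

    Standard Weiszfeld iteration (hypothetical helper, not the
    dissertation's modified variant): repeatedly replace the estimate
    with the average of the points weighted by inverse distance.
    """
    if not points:
        raise ValueError("points must be non-empty")
    dim = len(points[0])
    # Start from the coordinate-wise mean of the replicates.
    current = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    for _ in range(max_iter):
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for p in points:
            dist = math.dist(p, current)
            if dist < tol:
                # The estimate sits on a data point, where the plain update
                # would divide by zero. This sketch stops here; that point
                # is not guaranteed to be the true median, which is the
                # known weakness that modified variants address.
                return list(p)
            weight = 1.0 / dist
            weight_total += weight
            for j in range(dim):
                weighted_sum[j] += weight * p[j]
        updated = [s / weight_total for s in weighted_sum]
        if math.dist(updated, current) < tol:
            return updated
        current = updated
    return current


# Hypothetical replicates: three that agree closely and one outlier.
replicates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (100.0, 100.0)]
representative = geometric_median(replicates)
```

For these four made-up replicates the coordinate-wise mean is pulled to (25.25, 25.25) by the outlier, while the geometric median stays near the three replicates that agree. That robustness is one plausible reason to prefer a median-type summary when replicates exist to guard against measurement errors.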