PRINCIPAL COMPONENT ANALYSIS AND K-MEANS CLUSTER ANALYSIS ON WELL WATER DATA CONTAINING HEAVY METALS (CASE STUDY: HEAVY METAL CONTENT DATA IN WELL WATER IN BANDUNG REGENCY)

This study aims to analyze the heavy metal content in well water across seven sub-districts in Bandung Regency. There are ten types of heavy metals present in 160 well points, namely lead (Pb), cobalt (Co), chromium (Cr), iron (Fe), manganese (Mn), cadmium (Cd), zinc (Zn), mercury (Hg), and arsenic...

Bibliographic Details
Main Author: Maulana, Devri
Format: Theses
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/83734
Institution: Institut Teknologi Bandung
id id-itb.:83734
spelling id-itb.:83734 2024-08-12T16:02:17Z
keywords Principal Component Analysis, Singular Value Decomposition, K-Means Cluster Analysis, Heavy Metals
institution Institut Teknologi Bandung
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
language Indonesia
description This study analyzes the heavy metal content of well water in seven sub-districts of Bandung Regency. Ten heavy metals measured at 160 well points serve as the observational variables, including lead (Pb), cobalt (Co), chromium (Cr), iron (Fe), manganese (Mn), cadmium (Cd), zinc (Zn), mercury (Hg), and arsenic (As). The data are multivariate and ten-dimensional, which complicates analysis, modeling, and visualization. All ten variables are treated as independent variables; there is no response (dependent) variable to predict or explain. Multivariate analysis is therefore needed to identify patterns and structure in the data and to group the observations. The methods used are Principal Component Analysis (PCA) and cluster analysis. PCA reduces the dimensionality of the data and identifies the principal components that account for most of the variance in heavy metal content. It is computed through Singular Value Decomposition (SVD), which factors a matrix into the product of three matrices, and it is also used to detect multivariate outliers. Cluster analysis groups the well points into clusters with similar characteristics, using the K-Means algorithm with the Euclidean distance between each object and the cluster centroids.
PCA is applied here under the assumption that the data are multivariate normal and free from outliers. The data in this study violate the multivariate normality assumption and contain many outliers, so a Box-Cox transformation was applied to bring the distribution closer to normal and to reduce the number of outliers.
K-Means on the initial data grouped the well points into two clusters by heavy metal content; Cluster 1 has relatively lower metal content than Cluster 2, particularly for Fe, Mn, and Cr. PCA reduced the initial data from ten variables to six principal components that retain 86.87% of the information (total variance). K-Means applied to the PCA results produced a different clustering, driven by the transformed values of Fe, Mn, and Cr, with Cluster 1 lower in all three after transformation. Both analyses indicate that Fe, Mn, and Cr content are significant factors in the clustering of well points. K-Means on the transformed data with multivariate outliers removed also grouped the well points into two clusters; Cluster 1 has relatively lower metal content than Cluster 2, particularly for Cd and Pb. Together, PCA and K-Means provide an effective way to identify and group well points by their heavy metal content, and they give deeper insight into the distribution and variability of heavy metals in well water.
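The workflow described in the abstract (transform, PCA through SVD, then K-Means with Euclidean distance) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the data are synthetic (the real measurements are not part of this record), a log transform stands in for Box-Cox (it is the lambda = 0 special case, and the fitted lambda values are not given here), and the function names pca_svd and kmeans and the 0.85 variance target are hypothetical choices.

```python
import numpy as np

def pca_svd(X, var_target=0.85):
    """PCA through SVD of the standardized data matrix Z = U S V^T."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    # Squared singular values are proportional to the component variances.
    cum_var = np.cumsum(s**2 / np.sum(s**2))
    # Smallest number of components whose cumulative variance reaches the target.
    k = min(int(np.searchsorted(cum_var, var_target)) + 1, len(s))
    scores = U[:, :k] * s[:k]  # equals Z @ Vt[:k].T
    return scores, Vt[:k], cum_var

def kmeans(X, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm using Euclidean distance to the centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in: 160 wells x 10 metals, skewed positive values,
# with half of the wells elevated in the first three columns.
rng = np.random.default_rng(0)
X = rng.lognormal(size=(160, 10))
X[80:, :3] *= 20.0

logX = np.log(X)  # assumption: lambda = 0 case of Box-Cox
scores, components, cum_var = pca_svd(logX, var_target=0.85)
labels, centroids = kmeans(scores, k=2)

print("components kept:", scores.shape[1])
print("cumulative variance retained:", round(float(cum_var[scores.shape[1] - 1]), 4))
print("cluster sizes:", np.bincount(labels, minlength=2))
```

Standardizing before the SVD puts all metals on a common scale, which matters when concentrations differ by orders of magnitude; whether the thesis standardized the variables is not stated in this record.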
format Theses
author Maulana, Devri
title PRINCIPAL COMPONENT ANALYSIS AND K-MEANS CLUSTER ANALYSIS ON WELL WATER DATA CONTAINING HEAVY METALS (CASE STUDY: HEAVY METAL CONTENT DATA IN WELL WATER IN BANDUNG REGENCY)
url https://digilib.itb.ac.id/gdl/view/83734
_version_ 1822998244549984256