PRINCIPAL COMPONENT ANALYSIS AND K-MEANS CLUSTER ANALYSIS ON WELL WATER DATA CONTAINING HEAVY METALS (CASE STUDY: HEAVY METAL CONTENT DATA IN WELL WATER IN BANDUNG REGENCY)
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/83734 |
Institution: | Institut Teknologi Bandung |
Summary: | This study analyzes the heavy metal content of well water in seven sub-districts of Bandung Regency. Ten heavy metals measured at 160 well points serve as the observational variables, including lead (Pb), cobalt (Co), chromium (Cr), iron (Fe), manganese (Mn), cadmium (Cd), zinc (Zn), mercury (Hg), and arsenic (As). The data are multivariate and high-dimensional (ten dimensions), which complicates analysis, modeling, and visualization. All ten variables are treated as independent variables; there is no response (dependent) variable for them to predict or explain. Multivariate analysis is therefore needed to identify patterns and structure in the data and to group the observations. Two methods are used: Principal Component Analysis (PCA) and cluster analysis. PCA reduces the dimensionality of the data and identifies the principal components that account for most of the variance in heavy metal content. It is computed with Singular Value Decomposition (SVD), which factors a matrix into the product of three matrices, and it is also used to detect multivariate outliers. Cluster analysis groups the well points into clusters with similar characteristics, using the K-Means algorithm with Euclidean distance between each object and the cluster centroids. PCA assumes that the data are multivariate normal and free from outliers. The data in this study do not satisfy the multivariate normality assumption and contain many outliers, so a Box-Cox transformation was applied. 
The transformation was applied to bring the data closer to a normal distribution and to reduce the number of outliers. K-Means cluster analysis on the initial data grouped the well points into two clusters based on their heavy metal content; Cluster 1 has lower metal content than Cluster 2, particularly for Fe, Mn, and Cr. PCA reduced the ten original variables to six principal components that retain 86.87% of the total variance. K-Means applied to the PCA results produced a different clustering, driven by the transformed values of Fe, Mn, and Cr; Cluster 1 has lower Fe, Mn, and Cr content after transformation. Both analyses indicate that Fe, Mn, and Cr content is a significant factor in the clustering of the well points. K-Means cluster analysis on the transformed data with multivariate outliers removed also grouped the well points into two clusters; here Cluster 1 has lower metal content than Cluster 2, particularly for Cd and Pb. Together, PCA and K-Means provide an effective way to identify and group well points by their heavy metal content, and they give deeper insight into the distribution and variability of heavy metals in well water. |
---|
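The workflow described in the summary (Box-Cox transformation, PCA computed through SVD, then K-Means with Euclidean distance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record does not state the software used, the Box-Cox parameter, or the K-Means initialization, and the thesis data are not reproduced. The data below are synthetic, the function names `box_cox`, `pca_svd`, and `k_means` are hypothetical, and `lam=0.0` (the log case) is an arbitrary choice. Only the counts (160 points, ten variables, six components, two clusters) mirror the summary.

```python
import numpy as np


def box_cox(x, lam):
    """Box-Cox transform of strictly positive values for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    return np.log(x) if lam == 0 else (x**lam - 1.0) / lam


def pca_svd(X, n_components):
    """PCA through SVD of the standardized data matrix Z = U S V^T."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)             # variance share per component
    scores = U[:, :n_components] * s[:n_components]  # data in PC coordinates
    return scores, Vt[:n_components], explained


def k_means(X, k, n_iter=100, seed=0):
    """Plain K-Means (Lloyd iterations) using Euclidean distance."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Synthetic stand-in for the well data (NOT the thesis data):
# 160 points, ten positive-valued variables, two concentration levels.
rng = np.random.default_rng(42)
low = rng.lognormal(mean=0.0, sigma=0.5, size=(80, 10))
high = rng.lognormal(mean=2.0, sigma=0.5, size=(80, 10))
X_raw = np.vstack([low, high])

X_t = box_cox(X_raw, lam=0.0)                   # lambda chosen arbitrarily
scores, components, explained = pca_svd(X_t, n_components=6)
labels, centroids = k_means(scores, k=2)

print("variance retained by 6 components:", explained[:6].sum())
print("cluster sizes:", np.bincount(labels, minlength=2))
```

In practice the Box-Cox parameter would be estimated separately for each variable rather than fixed, and the number of retained components would be chosen from the cumulative values in `explained`.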