A stacked ensemble deep learning model for water quality prediction / Wong Wen Yee
Main Author: | Wong Wen Yee |
---|---|
Format: | Thesis |
Published: | 2023 |
Subjects: | |
Online Access: | http://studentsrepo.um.edu.my/15108/2/Wong_Wen_Yee.pdf http://studentsrepo.um.edu.my/15108/1/Wong_Wen_Yee.pdf http://studentsrepo.um.edu.my/15108/ |
Institution: | Universiti Malaya |
Summary: | Water quality management is crucial to ensure water security for the sustainability of health, productivity, and livelihoods. Contamination of water sources often occurs due to illegal waste dumping, sewage, and industrial discharge. This causes hazardous substances such as pesticides, heavy metals, and pathogens to seep into waterways. Therefore, the use of water quality indicators to detect the presence of pollutants is very important. Conventional water quality index (WQI) assessment methods are limited to features such as water acidity or basicity (pH), dissolved oxygen (DO), biological oxygen demand (BOD), chemical oxygen demand (COD), ammoniacal nitrogen (NH3N), and suspended solids (SS). These features are too common and insufficient to represent the true nature of water quality. Other significant parameters, including fecal coliform, heavy metals, and nutrients, are not part of the WQI formula. Hence, this study aims to bridge the research gap of using different water quality parameters in water quality assessment through artificial intelligence. In this work, the potential of other water quality parameters as input variables is investigated and discussed. Seventeen input features are used, namely conductivity (COND), salinity (SAL), turbidity (TUR), dissolved solids (DS), nitrate (NO3), chloride (Cl), phosphate (PO4), arsenic (As), chromium (Cr), zinc (Zn), calcium (Ca), iron (Fe), potassium (K), magnesium (Mg), sodium (Na), E. coli, and total coliform. They are analyzed using five regression algorithms for preliminary model selection: random forest (RF), AdaBoost, support vector regression (SVR), decision tree regression (DTR), and multilayer perceptron (MLP). The results show that the RF algorithm exhibits better prediction performance, with an R² of 0.798. The dataset is then validated with the RF classifier, and the results are improved by applying the synthetic minority oversampling technique (SMOTE) to tackle the imbalanced dataset. The proposed method is shown to achieve 78.13%, 72.99%, 63.51%, and 66.85% accuracy, precision, recall, and F1 score, respectively. The results and analysis obtained from this study demonstrate the possibility of predicting WQI using other input features. In addition, the research is extended to the study of imbalanced data in water quality datasets. Classifiers often perform poorly on skewed data due to a bias toward the majority class. Therefore, this paper aims to explore the use of ensemble and deep learning techniques to simplify the classification process of imbalanced data. The study then proposes a stacked ensemble deep learning framework for faster and more efficient water quality analysis. The stacked ensemble deep learning method proved robust, with accuracy, precision, recall, and F1 score of 95.69%, 94.96%, 92.92%, and 93.88%, respectively. The proposed deep learning model produces results faster without the use of SMOTE, and no resampling algorithm is necessary with the proposed approach. |
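
The summary reports accuracy, precision, recall, and F1 score, and notes that classifiers tend to be biased toward the majority class on skewed data. The sketch below illustrates why accuracy alone can mislead in that setting. It is not taken from the thesis: the record does not state how the metrics were averaged across WQI classes, so macro averaging is an assumption here, and the class names and counts are hypothetical.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro averaging (an assumption, not confirmed by the source) computes
    each metric per class and then takes the unweighted mean, so minority
    classes count as much as the majority class.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical, heavily skewed label distribution (not data from the thesis).
y_true = ["clean"] * 90 + ["slightly_polluted"] * 8 + ["polluted"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["clean"] * 100

scores = macro_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

On this toy data the always-"clean" classifier reaches 90% accuracy while its macro recall is only one third, because two of the three classes are never predicted. This is the kind of gap that resampling methods such as SMOTE, or the ensemble approach described in the summary, are meant to narrow.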