Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia

Bibliographic Details
Main Author: Rahmat, Mohamad Fariq
Format: Thesis
Language: English
Published: 2019
Subjects: Imaging systems in meteorology; Meteorological instruments - Malaysia; Leptospirosis
Online Access:http://psasir.upm.edu.my/id/eprint/104249/1/MOHAMAD%20FARIQ%20BIN%20RAHMAT%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/104249/
Institution: Universiti Putra Malaysia
id my.upm.eprints.104249
record_format eprints
spelling my.upm.eprints.104249 2023-07-25T01:58:50Z http://psasir.upm.edu.my/id/eprint/104249/ 2019-11 Thesis NonPeerReviewed text en
Rahmat, Mohamad Fariq (2019) Feature selection methods based on meteorological data for prediction of leptospirosis occurrence in Seremban, Malaysia. Masters thesis, Universiti Putra Malaysia.
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
topic Imaging systems in meteorology
Meteorological instruments - Malaysia
Leptospirosis
description The use of a predictive model is useful for preventing and controlling disease outbreaks. This can be done by analysing weather behavior in relation to disease occurrence. In Malaysia, leptospirosis has been among the diseases with the highest numbers of reported cases over the past 7 years, yet there is a lack of the understanding and modelling studies needed to develop an early warning system. In this study, a predictive model is developed using machine learning to capture the relationship between weather variables, such as temperature, sum of rainfall, and relative humidity, and Leptospira occurrence. The aim of this study is to predict the occurrence of leptospirosis in Seremban district using machine learning with meteorological data as input. The first objective is to investigate the best time lag for each weather variable using feature selection methods. The second objective is to develop, train, and test a neural network model for disease prediction based on the selected features. Feature selection was conducted using two methods: first, correlation analysis, and second, graphical and non-graphical exploratory data analysis (EDA). The neural network model is trained using backpropagation, with the number of hidden layers and hidden nodes optimized. Model performance is measured by accuracy, sensitivity, and specificity. Correlation analysis showed that, in Seremban district, disease occurrence is more highly correlated with sum of rainfall at lags of 4 to 16 weeks and with temperature at a lag of 1 week, whereas EDA showed high correlation with leptospirosis occurrence for temperature at a lag of 16 weeks and sum of rainfall at lags of 12 to 20 weeks. The study also showed that the predictive model can achieve an accuracy of 80% to 84% when the input variables follow the feature selection made by EDA and the number of hidden neurons is 10.
In conclusion, this study shows the trends of the environmental variables in predicting leptospirosis occurrence at different time lags. Moreover, this predictive model can help public health authorities not only to predict the occurrence of the disease but also to prevent an outbreak from spreading to the community, by giving early warning based on future weather conditions.
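The description names two computational steps that the prose leaves abstract: choosing a time lag by correlating weekly case counts with a weather series shifted back by a number of weeks, and scoring a binary occurrence prediction by accuracy, sensitivity, and specificity. The Python sketch below illustrates both under stated assumptions. It uses Pearson correlation, assumes equal-length weekly series aligned on the same weeks, and codes occurrence as 0/1; the function names, the selection threshold, and the toy data are hypothetical and not taken from the thesis. It does not reproduce the thesis's EDA-based selection or its backpropagation neural network.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumption: Pearson correlation is used for illustration; the thesis may use another coefficient.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; treated here as "no correlation"
    return cov / math.sqrt(var_x * var_y)


def lagged_correlations(
    weather: Sequence[float], cases: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Correlate weekly cases at week t with the weather variable at week t - lag.

    Both series are assumed to cover the same consecutive weeks, in order.
    """
    if len(weather) != len(cases):
        raise ValueError("series must cover the same weeks")
    result: Dict[int, float] = {}
    for lag in range(1, max_lag + 1):
        # weather[i] is paired with cases[i + lag]: weather leads cases by `lag` weeks.
        x = weather[: len(weather) - lag]
        y = cases[lag:]
        if len(x) < 2:
            break  # not enough overlapping weeks left
        result[lag] = pearson(x, y)
    return result


def select_lags(corrs: Dict[int, float], threshold: float) -> List[int]:
    """Hypothetical rule: keep lags whose absolute correlation reaches `threshold`."""
    return sorted(lag for lag, r in corrs.items() if abs(r) >= threshold)


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Accuracy, sensitivity (true-positive rate) and specificity (true-negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy data (not from the thesis): cases copy rainfall with a 3-week delay.
    rainfall = [float((7 * i) % 11) for i in range(30)]
    cases = [0.0, 0.0, 0.0] + rainfall[:-3]
    corrs = lagged_correlations(rainfall, cases, max_lag=8)
    best = max(corrs, key=corrs.get)
    print("best lag:", best, "r =", round(corrs[best], 3))
    print("selected lags:", select_lags(corrs, threshold=0.99))
    print(classification_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))
```

In the toy run the 3-week lag stands out because the synthetic cases were built as rainfall delayed by 3 weeks. On real surveillance data several neighbouring lags would usually score similarly, which is one reason a lag range (such as the 4 to 16 weeks reported above) rather than a single lag may be selected.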