Comparison of different variable selection methods for predicting the occurrence of Metisa Plana in oil palm plantation using machine learning

Bibliographic Details
Main Authors: Wang, Y. P., Idris, Nurul Hawani, Muharam, Farrah Melissa, Asib, Norhayu, Lau, Alvin Meng Shin
Format: Conference or Workshop Item
Language: English
Published: 2023
Online Access: http://eprints.utm.my/107735/1/NurulHawaniIdris2023_ComparisonofDifferentVariableSelection.pdf
http://eprints.utm.my/107735/
http://dx.doi.org/10.1088/1755-1315/1274/1/012008
Institution: Universiti Teknologi Malaysia
Description
Summary: Monitoring and predicting the spatio-temporal distribution of crop pests and assessing the related risks are crucial for effective pest management. Machine learning techniques have shown potential for analysing agricultural data and providing accurate predictions. Variable selection plays a critical role in crop pest analysis by identifying the most informative features for predicting pest distribution and risk. In current practice, the choice of variable selection method is based mostly on previous experience and can involve a degree of subjectivity. This paper provides an empirical comparison of variable selection methods for machine learning applications in the spatio-temporal distribution and risk prediction of crop pests. The study applied filter methods (information gain, chi-square test, mutual information), a wrapper method (recursive feature elimination, RFE), and an embedded method (Random Forest), using the bagworm pest (Metisa plana) in oil palm trees as the experimental subject. The initial set of variables comprised bioclimatic variables, vegetation indices, and terrain variables. The results showed some overlap in the variables selected by the different methods: the bioclimatic variables rainfall (RF) and relative humidity (RH) were selected as important across methods. Variables ranked as less important, such as NDVI and elevation, nonetheless clearly improved prediction accuracy when added to the artificial neural network (ANN) model. These empirical findings can guide data monitoring for the prediction of crop pest and disease outbreaks, and can serve as a reference for variable selection in the spatio-temporal prediction of pests and diseases in other agricultural and forestry crops.
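
As an illustration of the filter approach named in the summary, the following minimal sketch ranks candidate variables by their mutual information with an outbreak label. It is not the authors' code or data. The function names (`mutual_information`, `rank_features`), the low/high discretisation, and the eight toy records are all hypothetical and chosen only to show the mechanics.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(table, target):
    """Return (feature name, score) pairs sorted from most to least informative."""
    scores = {name: mutual_information(col, target) for name, col in table.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic toy records (NOT from the paper): each variable is discretised
# into "low"/"high", and `outbreak` marks pest presence (1) or absence (0).
toy = {
    "RF":   ["high", "high", "high", "high", "low", "low", "low", "low"],
    "RH":   ["high", "high", "high", "low", "high", "low", "low", "low"],
    "NDVI": ["high", "low", "high", "low", "high", "low", "high", "low"],
}
outbreak = [1, 1, 1, 1, 0, 0, 0, 0]

ranking = rank_features(toy, outbreak)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
```

A filter score of this kind rates each variable in isolation. A low score alone therefore does not settle whether the variable helps a downstream model, which is in line with the summary's observation about NDVI and elevation.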