Improving Classification Accuracy of Scikit-learn Classifiers with Discrete Fuzzy Interval Values
Main Authors: | , , , |
---|---|
Format: | Conference or Workshop Item |
Published: | Institute of Electrical and Electronics Engineers Inc., 2020 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85097524594&doi=10.1109%2fICCI51257.2020.9247696&partnerID=40&md5=80d1e8e4c3f9a70d31ab39ce1de3c105 http://eprints.utp.edu.my/29869/ |
Institution: | Universiti Teknologi Petronas |
Summary: | Understanding machine learning (ML) algorithms from scratch is time-consuming. Thus, many software and library packages, such as Weka and Scikit-learn, have been introduced to help researchers run simulations on a range of well-known classifiers. In ML, different classifiers perform differently, depending on factors such as the type of data used as input for the classification phase. It is therefore necessary to discretize continuous data for classifiers that perform better with discrete data. However, in data mining, depending solely on discretization is not enough, as real-world data can be large, imprecise and noisy. In addition, knowledge representation is needed to help researchers better understand the data during the discretization process. The objective of this study is therefore to observe the effect of fuzzy elements in the discretization phase on the classification accuracy of Scikit-learn classifiers. In this study, fuzzy logic is proposed to assist the existing discretization technique through fuzzy membership graphs, linguistic variables and discrete interval values. All classifiers in the Scikit-learn package were used in the classification phase, evaluated with 10-fold cross-validation. The simulation results showed that fuzzy-assisted discretization slightly improved the classification accuracy of classifiers such as Random Forest (an ensemble method) and Naive Bayes, while slightly degrading the performance of other classifiers. © 2020 IEEE. |
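
The workflow described in the summary (fuzzy-assisted discretization followed by 10-fold cross-validation of Scikit-learn classifiers) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' method: the record does not give the membership functions, the number of linguistic labels, or the datasets used, so the triangular memberships, the three labels (low, medium, high), the Iris dataset, and the helper names `triangular` and `fuzzy_discretize` are all hypothetical choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB


def triangular(x, a, b, c):
    """Triangular membership: 0 at a and c, rising to 1 at the peak b."""
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


def fuzzy_discretize(column):
    """Map a continuous column to 0 (low), 1 (medium) or 2 (high).

    Assumption: three overlapping triangular sets spanning the column's
    range; each value takes the label with the highest membership.
    """
    lo, hi = column.min(), column.max()
    span = hi - lo
    if span == 0:
        # A constant column carries no information; give it a single label.
        return np.zeros(column.shape[0], dtype=int)
    mid = (lo + hi) / 2.0
    memberships = np.stack(
        [
            triangular(column, lo - span / 2, lo, mid),   # low
            triangular(column, lo, mid, hi),              # medium
            triangular(column, mid, hi, hi + span / 2),   # high
        ],
        axis=1,
    )
    return memberships.argmax(axis=1)


# Assumption: Iris stands in for the (unspecified) datasets of the study.
X, y = load_iris(return_X_y=True)
X_fuzzy = np.column_stack(
    [fuzzy_discretize(X[:, j]) for j in range(X.shape[1])]
)

classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Naive Bayes": GaussianNB(),
}

# Compare mean 10-fold cross-validation accuracy on raw and discretized data.
for name, clf in classifiers.items():
    raw_acc = cross_val_score(clf, X, y, cv=10).mean()
    fuzzy_acc = cross_val_score(clf, X_fuzzy, y, cv=10).mean()
    print(f"{name}: raw={raw_acc:.3f}, fuzzy-discretized={fuzzy_acc:.3f}")
```

For simplicity, the sketch derives the interval boundaries from the whole dataset before cross-validation; a stricter evaluation would fit them on each training fold only. Whether accuracy rises or falls will depend on the data and on the membership functions chosen.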