An improved algorithm for iris classification by using support vector machine and binary random machine learning
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://eprints.uthm.edu.my/295/1/24p%20ahmad%20haadzal%20kamarulzalis.pdf http://eprints.uthm.edu.my/295/2/AHMAD%20HAADZAL%20KAMARULZALIS%20COPYRIGHT%20DECLARATION.pdf http://eprints.uthm.edu.my/295/3/AHMAD%20HAADZAL%20KAMARULZALIS%20WATERMARK.pdf http://eprints.uthm.edu.my/295/ |
Institution: | Universiti Tun Hussein Onn Malaysia |
Summary: | Machine learning has three branches of learning that can be used in classification procedures for data mining: supervised learning, unsupervised learning and reinforcement learning. This study focuses on supervised learning and seeks to classify the Iris dataset into its three species (setosa, versicolor and virginica), so that the classification mimics the actual dataset, using Support Vector Machine (SVM) with four different kernel functions (Linear, Radial Basis, Sigmoid and Polynomial), Random Forest (RF), k-Nearest Neighbors (k-NN) and Random Nearest Neighbors (RNN) as methods. The first objective of this study is to develop an improved algorithm for classification. The new algorithm combines ideas from the k-NN algorithm with the ensemble concept. The second objective is to apply supervised and binary ensemble machine learning techniques to classification. This is done using RF and RNN, which share the same ensemble concept. The last objective is to identify the best model for classification procedures. Performance measurement tools such as overall accuracy, kappa, average sensitivity, average specificity, average precision, average detection rate, average prevalence and misclassification error rate (MER) were computed from the confusion matrix values produced during data analysis, for both the average and the individual performance of each classifier. In addition, performance visualizations such as the Stacked Bar Plot, Fourfold Plot, Receiver Operating Characteristic (ROC) Curve and Lollipop Chart are used to present each output more clearly. Random Nearest Neighbors (RNN) has the highest accuracy, 98.67%, and a misclassification error rate (MER) of only 1.33%, compared with the other classifiers. Therefore, RNN is preferable for supervised learning classification procedures. |
---|
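The summary reports several measures derived from a confusion matrix (overall accuracy, kappa, average sensitivity, average specificity, average precision and MER). The Python sketch below shows one common way to compute them. It is not the thesis's own code: the function name `confusion_metrics` and the example matrix are hypothetical, chosen only so that 148 of 150 samples are correct, which reproduces the reported 98.67% accuracy and 1.33% MER under the assumption that all 150 Iris samples were evaluated.

```python
def confusion_metrics(cm):
    """Summary metrics from a square confusion matrix.

    cm[i][j] = number of samples of true class i predicted as class j.
    Assumes every class appears at least once as a true and as a
    predicted label (otherwise a division by zero occurs).
    """
    k = len(cm)
    n = sum(sum(row) for row in cm)
    true_tot = [sum(cm[i]) for i in range(k)]                      # row sums
    pred_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    correct = sum(cm[i][i] for i in range(k))

    accuracy = correct / n
    # Agreement expected by chance, used by Cohen's kappa.
    expected = sum(true_tot[i] * pred_tot[i] for i in range(k)) / (n * n)
    kappa = (accuracy - expected) / (1 - expected)

    sens, spec, prec = [], [], []
    for i in range(k):  # one-vs-rest counts for class i
        tp = cm[i][i]
        fn = true_tot[i] - tp
        fp = pred_tot[i] - tp
        tn = n - tp - fn - fp
        sens.append(tp / (tp + fn))
        spec.append(tn / (tn + fp))
        prec.append(tp / (tp + fp))

    return {
        "accuracy": accuracy,
        "mer": 1 - accuracy,  # misclassification error rate
        "kappa": kappa,
        "avg_sensitivity": sum(sens) / k,
        "avg_specificity": sum(spec) / k,
        "avg_precision": sum(prec) / k,
    }


# Hypothetical matrix (rows: true setosa, versicolor, virginica).
# Illustrative only; this is NOT the matrix reported in the thesis.
example_cm = [
    [50, 0, 0],
    [0, 49, 1],
    [0, 1, 49],
]
metrics = confusion_metrics(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With two misclassified samples out of 150, accuracy is 148/150 and MER is 2/150, which round to the 98.67% and 1.33% quoted in the summary. Different matrices with the same two errors would give the same accuracy but possibly different per-class averages.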