Classification of breast cancer disease using bagging fuzzy-id3 algorithm based on fuzzydbd

Classification is a data mining technique used to classify varied data types according to a specific criterion. One of the most powerful machine learning methods to handle classification problems is the decision tree. There are various decision tree algorithms, but the most commonly used are Iterati...

Bibliographic Details
Main Author: Nur Farahaina, Idris
Format: Thesis
Language: English
Published: 2022
Subjects: Q Science (General); QA75 Electronic computers. Computer science
Online Access: http://umpir.ump.edu.my/id/eprint/37640/1/ir.Classification%20of%20breast%20cancer%20disease%20using%20bagging%20fuzzy-id3%20algorithm%20based%20on%20fuzzydbd.pdf
http://umpir.ump.edu.my/id/eprint/37640/
Institution: Universiti Malaysia Pahang
id my.ump.umpir.37640
record_format eprints
spelling my.ump.umpir.37640 2023-09-15T08:10:42Z http://umpir.ump.edu.my/id/eprint/37640/ 2022-02 Thesis NonPeerReviewed pdf en Nur Farahaina, Idris (2022) Classification of breast cancer disease using bagging fuzzy-id3 algorithm based on fuzzydbd. Masters thesis, Universiti Malaysia Pahang (Contributors, Thesis advisor: Mohd Arfian, Ismail).
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
language English
topic Q Science (General)
QA75 Electronic computers. Computer science
description Classification is a data mining technique that sorts varied types of data into classes according to a specific criterion. The decision tree is one of the most powerful machine learning methods for classification problems. Of the various decision tree algorithms, the most commonly used are Iterative Dichotomiser 3 (ID3), CART, and C4.5. Among the three, ID3 has the most advantages, especially in processing time, as it builds trees the fastest and keeps them shallow. However, despite their popularity for classification, decision trees suffer from high variance and overfitting, which lead to poor generalisation. Combining fuzzy sets with the ID3 algorithm handles the data more efficiently because it draws on the advantages of both fuzzy methods and decision trees. In the proposed FID3-DBD algorithm, continuous and discrete (integer) attributes are expressed as linguistic values of fuzzy sets, and the FUZZYDBD method is used to set the fuzzy sets' parameters. Before tree induction, each input value is replaced with the linguistic label of the fuzzy set with which it has the highest compatibility. The proposed technique overcomes the classic ID3 algorithm's inability to classify continuous-valued attributes and, at the same time, increases classification accuracy. The bagging method was then applied to the FID3-DBD algorithm, giving the BFID3-DBD algorithm, to overcome overfitting and high variance in decision trees. Four breast cancer datasets were used to evaluate classification accuracy: the Wisconsin Breast Cancer (Original) dataset, the WDBC (Diagnostic) dataset, the Breast Cancer Coimbra dataset, and the Mammographic Mass dataset. All four were acquired from the UCI Machine Learning Repository. This study aims to overcome the classic ID3 algorithm's inability to classify continuous data well and to address the high variance and overfitting issues.
The research methodology consists of four fundamental steps: literature review, data collection, experiment implementation, and report writing. The FID3-DBD algorithm achieved classification accuracies of 94.362% on the Wisconsin Breast Cancer (Original) dataset, 94.358% on the WDBC (Diagnostic) dataset, 81.119% on the Mammographic Mass dataset, and 64.224% on the Coimbra dataset. The BFID3-DBD algorithm achieved 96.003% on the Wisconsin Breast Cancer (Original) dataset, 95.273% on the WDBC (Diagnostic) dataset, 81.590% on the Mammographic Mass dataset, and 68.966% on the Coimbra dataset. The study verified that the FID3-DBD algorithm can classify continuous data and that the BFID3-DBD algorithm overcame the overfitting issue, reduced the high variance, and increased classification accuracy on test data.
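The pipeline outlined in the description (replace each value with its best-matching linguistic label, induce an ID3 tree, then bag several trees) can be illustrated with a minimal Python sketch. It is not the thesis implementation. The triangular membership functions, the fixed breakpoints in FUZZY_SETS, the attribute names "size" and "shape", and the toy data are all hypothetical stand-ins. In particular, the thesis derives the fuzzy set parameters with FUZZYDBD, which is not reproduced here.

```python
import math
import random
from collections import Counter

def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Hypothetical, hand-picked fuzzy sets for attributes scaled to [0, 10].
# The thesis sets these parameters with FUZZYDBD instead.
FUZZY_SETS = {"low": (0, 0, 5), "medium": (0, 5, 10), "high": (5, 10, 10)}

def to_label(x, sets=FUZZY_SETS):
    """Return the linguistic label whose fuzzy set is most compatible with x."""
    return max(sets, key=lambda name: triangular(x, *sets[name]))

def fuzzify(row):
    """Replace every numeric value in a row with its linguistic label."""
    return {attr: to_label(value) for attr, value in row.items()}

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def id3(rows, labels, attrs):
    """Classic ID3 on categorical rows; leaves are class labels (strings)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority

    def gain(attr):
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [y for r, y in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attrs, key=gain)
    node = {"attr": best, "default": majority, "children": {}}
    rest = [a for a in attrs if a != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = id3([rows[i] for i in idx],
                                      [labels[i] for i in idx], rest)
    return node

def predict(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen labels."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

def bagged_id3(rows, labels, attrs, n_trees=15, seed=0):
    """Train one ID3 tree per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        trees.append(id3([rows[i] for i in idx], [labels[i] for i in idx], attrs))
    return trees

def vote(trees, row):
    """Majority vote over the predictions of all bagged trees."""
    return Counter(predict(t, row) for t in trees).most_common(1)[0][0]

# Toy data, invented for illustration only (not taken from the UCI datasets).
raw = [{"size": 1, "shape": 2}, {"size": 2, "shape": 1}, {"size": 1, "shape": 1},
       {"size": 2, "shape": 2}, {"size": 9, "shape": 8}, {"size": 8, "shape": 9},
       {"size": 9, "shape": 9}, {"size": 8, "shape": 8}]
classes = ["benign"] * 4 + ["malignant"] * 4

rows = [fuzzify(r) for r in raw]
trees = bagged_id3(rows, classes, ["size", "shape"])
print(vote(trees, fuzzify({"size": 1.5, "shape": 2})))
```

Fuzzifying before induction is what lets plain ID3, which only splits on categorical values, handle continuous attributes. The bootstrap-and-vote step is the standard bagging recipe for reducing the variance of individual trees.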
format Thesis
author Nur Farahaina, Idris
title Classification of breast cancer disease using bagging fuzzy-id3 algorithm based on fuzzydbd
publishDate 2022
_version_ 1778161077850210304