A comparative study between rough and decision tree classifiers

Rule-based classification (RBC) systems have been widely used in many real-world applications because rules are easy to interpret. RBC mines a collection of rules from the knowledge hidden in a dataset in order to accurately map new cases to a decision class. In the real world, the number o...

Bibliographic Details
Main Author: Mohamad Mohsin, Mohamad Farhan
Format: Monograph
Language: English
Published: Universiti Utara Malaysia 2008
Subjects: H Social Sciences (General)
Online Access: http://repo.uum.edu.my/7807/1/fAR.pdf
http://repo.uum.edu.my/7807/3/1.Mohamad%20Farhan.pdf
http://repo.uum.edu.my/7807/
http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000301019
Institution: Universiti Utara Malaysia
id my.uum.repo.7807
record_format eprints
spelling my.uum.repo.7807 2014-07-08T01:16:21Z http://repo.uum.edu.my/7807/ A comparative study between rough and decision tree classifiers Mohamad Mohsin, Mohamad Farhan H Social Sciences (General) Universiti Utara Malaysia 2008 Monograph NonPeerReviewed application/pdf en http://repo.uum.edu.my/7807/1/fAR.pdf application/pdf en http://repo.uum.edu.my/7807/3/1.Mohamad%20Farhan.pdf Mohamad Mohsin, Mohamad Farhan (2008) A comparative study between rough and decision tree classifiers. Project Report. Universiti Utara Malaysia. (Unpublished) http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000301019
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic H Social Sciences (General)
spellingShingle H Social Sciences (General)
Mohamad Mohsin, Mohamad Farhan
A comparative study between rough and decision tree classifiers
description Rule-based classification (RBC) systems have been widely used in many real-world applications because rules are easy to interpret. RBC mines a collection of rules from the knowledge hidden in a dataset in order to accurately map new cases to a decision class. In the real world, the number of attributes in a dataset can be very large because database technology is capable of storing a great deal of information. As a result, a large dataset may contain thousands of relationships, and it is likely to provide more knowledge because the interrelationships between data items give a richer description. Furthermore, such a dataset is also likely to yield a large number of rules, including unnecessary or redundant rules in the model. Theoretically, a good set of knowledge should provide good accuracy when dealing with new cases. Besides accuracy, a good rule set must also have a minimal number of rules, and each rule should be as short as possible. Often, a rule set that contains fewer rules has more conditions per rule. An ideal model should produce fewer, shorter rules and classify new data with good accuracy. Consequently, compact, high-quality knowledge will provide managers with a good decision model. For that reason, the search for an appropriate data mining approach that can provide quality knowledge is important. The rough classifier (RC) and the decision tree classifier (DTC) are both categorized as RBC. The purpose of this study is to investigate the capability of RC and DTC to generate quality knowledge that leads to good accuracy. To achieve that, both classifiers are compared on four measurements: classification accuracy, the number of rules, the length of rules, and the coverage of rules. Five datasets from the UCI Machine Learning Repository, namely United States Congressional Voting Records, Credit Approval, Wisconsin Diagnostic Breast Cancer, Pima Indians Diabetes Database, and Vehicle Silhouettes, were chosen as experimental data. All datasets were mined using the RC toolkit ROSETTA, while the C4.5 algorithm in the WEKA application was chosen as the DTC rule generator. The experimental results indicated that both classifiers produced good classification results and generated quality rules in different types of model: higher accuracy, fewer rules, shorter rules, and higher coverage. In terms of accuracy, RC obtained higher accuracy on average, while DTC generated significantly fewer rules than RC. In terms of rule length, RC produced more compact, shorter rules than DTC, although the difference in length was not significant. Meanwhile, RC had better coverage than DTC. The final conclusion can be stated as follows: “If the user is interested in a variety of rule patterns with good accuracy and the number of rules is not important, RC is the best solution, whereas if the user is looking for fewer rules, DTC might be the best choice.”
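The four measurements named in the description can be illustrated with a minimal Python sketch. It does not reproduce the ROSETTA or WEKA workflow used in the study. The `Rule` class, the `evaluate` function, the first-match classification policy, and the exact metric definitions (coverage as the share of cases matched by at least one rule, rule length as the number of conditions) are all assumptions made for illustration, and the toy data is invented.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical representation: attribute name -> required value.
    conditions: dict
    decision: str

    def matches(self, case: dict) -> bool:
        # A rule fires only if every one of its conditions holds for the case.
        return all(case.get(attr) == value for attr, value in self.conditions.items())


def evaluate(rules, cases, labels):
    """Compute the four measures for a rule set on labelled cases."""
    covered = 0
    correct = 0
    for case, label in zip(cases, labels):
        # Assumption: the first matching rule decides the class.
        fired = next((r for r in rules if r.matches(case)), None)
        if fired is not None:
            covered += 1
            if fired.decision == label:
                correct += 1
    n = len(cases)
    return {
        "accuracy": correct / n,  # uncovered cases count as errors
        "num_rules": len(rules),
        "mean_rule_length": sum(len(r.conditions) for r in rules) / len(rules),
        "coverage": covered / n,
    }


# Toy data, invented for illustration only (not taken from the study).
rules = [
    Rule({"a1": "yes", "a2": "yes"}, "A"),
    Rule({"a1": "no"}, "B"),
]
cases = [
    {"a1": "yes", "a2": "yes"},
    {"a1": "no", "a2": "yes"},
    {"a1": "yes", "a2": "no"},
    {"a1": "no", "a2": "no"},
]
labels = ["A", "B", "A", "A"]
metrics = evaluate(rules, cases, labels)
print(metrics)
```

Under these definitions a rule set can trade one measure against another: dropping a condition shortens a rule and may raise coverage, but can lower accuracy.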
format Monograph
author Mohamad Mohsin, Mohamad Farhan
author_facet Mohamad Mohsin, Mohamad Farhan
author_sort Mohamad Mohsin, Mohamad Farhan
title A comparative study between rough and decision tree classifiers
title_short A comparative study between rough and decision tree classifiers
title_full A comparative study between rough and decision tree classifiers
title_fullStr A comparative study between rough and decision tree classifiers
title_full_unstemmed A comparative study between rough and decision tree classifiers
title_sort comparative study between rough and decision tree classifiers
publisher Universiti Utara Malaysia
publishDate 2008
url http://repo.uum.edu.my/7807/1/fAR.pdf
http://repo.uum.edu.my/7807/3/1.Mohamad%20Farhan.pdf
http://repo.uum.edu.my/7807/
http://lintas.uum.edu.my:8080/elmu/index.jsp?module=webopac-l&action=fullDisplayRetriever.jsp&szMaterialNo=0000301019
_version_ 1644279647597232128