Machine learning technique for enhancing classification performance in data summarization using rough set and genetic algorithm
The volume of data grows rapidly and increases significantly every day. This data comes from many different resources and services, which together produce large volumes of data that must be managed, reused, or analyzed. These heterogeneous sources of information can pose...
Main Authors: | Wibowo, M.; Noviyanto, F. |
---|---|
Format: | Article |
Published: | International Journal of Scientific and Technology Research, 2019 |
Subjects: | QA75 Electronic computers. Computer science |
Online Access: | http://eprints.utm.my/id/eprint/90847/ http://www.ijstr.org/final-print/oct2019/Machine-Learning-Technique-For-Enhancing-Classification-Performance-In-Data-Summarization-Using-Rough-Set-And-Genetic-Algorithm.pdf. |
Institution: | Universiti Teknologi Malaysia |
id |
my.utm.90847 |
---|---|
record_format |
eprints |
citation |
Wibowo, M. and Noviyanto, F. (2019) Machine learning technique for enhancing classification performance in data summarization using rough set and genetic algorithm. International Journal of Scientific and Technology Research, 8 (10). pp. 1108-1119. ISSN 2277-8616. Article, peer reviewed, published 2019-10. |
building |
UTM Library |
collection |
Institutional Repository |
continent |
Asia |
country |
Malaysia |
content_provider |
Universiti Teknologi Malaysia |
content_source |
UTM Institutional Repository |
url_provider |
http://eprints.utm.my/ |
topic |
QA75 Electronic computers. Computer science |
description |
The volume of data grows rapidly and increases significantly every day. This data comes from many different resources and services, which together produce large volumes of data that must be managed, reused, or analyzed. These heterogeneous sources of information can pose important challenges for model calibration, because the data are often imprecise, uncertain, ambiguous, and incomplete. Such data also require large amounts of storage, and their volume makes analytical, processing, and retrieval operations difficult and extremely time-consuming. One solution is to summarize the data so that it needs less storage and can be processed and retrieved in much less time; data summarization techniques therefore aim to produce the highest-quality summaries possible. In this study, Rough Set (RS) theory is proposed to obtain accurate, effective, and appropriate summaries. Although RS can extract decision rules effectively from a given dataset, two processes, data discretization and finding reducts, are required before decision rules can be generated from the attribute values. Both processes are known to be NP-hard and are also related to the dimensionality reduction problem. To solve both problems, a Genetic Algorithm (GA) is applied to search for the cut points used in discretization and for the reducts, in order to discover optimal rules. Reducing and transforming the data in this way may also shorten the running time, while allowing the system to obtain more general results and improve predictive accuracy. This study therefore proposes a hybrid RS-GA approach that compensates for the limitations of the rough set method, yielding better results and overcoming the shortcomings of existing data summarization methods.
To evaluate the efficiency of the proposed work, the classification accuracy of the hybrid approach is compared with that of other machine learning (ML) methods: Rough Set (RS), Naïve Bayes (NB), J48, Random Tree (RT), and Projective Adaptive Resonance Theory (PART). The results show that the RS-GA approach achieved the highest prediction accuracy, 99.95%, and produced the lowest error on API values from Malaysia and Singapore compared with the other ML methods. On this basis, RS-GA is shown to be the best-performing and most significant of the compared methods. |
author |
Wibowo, M. Noviyanto, F. |
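The abstract states that rough-set rule extraction needs two hard search problems, choosing discretization cut points and finding reducts, and that a GA is applied to both. The Python below is a minimal sketch of that idea under our own assumptions, not the authors' implementation. Candidate cuts are assumed to be midpoints between adjacent attribute values. Fitness is assumed to be the rough-set dependency degree γ minus a small hypothetical per-cut or per-attribute `penalty`. The GA uses tournament selection, one-point crossover, bit-flip mutation and elitism. All function names (`partition`, `dependency`, `candidate_cuts`, `apply_cuts`, `run_ga`, `rs_ga`) and the toy data are illustrative.

```python
import random
from collections import defaultdict


def partition(rows, attrs):
    """Indiscernibility classes: group row indices that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma: share of objects in the positive region,
    i.e. in indiscernibility classes whose members all carry the same decision label."""
    if not rows:
        return 0.0
    pos = sum(len(c) for c in partition(rows, attrs) if len({labels[i] for i in c}) == 1)
    return pos / len(rows)


def candidate_cuts(data):
    """Assumed candidate cuts: midpoints between consecutive distinct values of each attribute."""
    cands = []
    for a in range(len(data[0])):
        vals = sorted({row[a] for row in data})
        cands += [(a, (x + y) / 2) for x, y in zip(vals, vals[1:])]
    return cands


def apply_cuts(data, cuts):
    """Discretize: each value becomes the number of selected cuts on its attribute lying below it."""
    return [[sum(1 for a, c in cuts if a == j and v > c) for j, v in enumerate(row)] for row in data]


def run_ga(n_bits, fitness, pop_size=30, generations=60, p_mut=0.05, seed=0):
    """Generic binary GA: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    cache = {}

    def fit(ch):  # memoize fitness, since the same chromosomes recur often
        key = tuple(ch)
        if key not in cache:
            cache[key] = fitness(ch)
        return cache[key]

    # Seed with the all-ones chromosome (keep every cut / attribute) so the result
    # is never worse than the unreduced table.
    pop = [[1] * n_bits] + [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size - 1)]
    best = max(pop, key=fit)

    def pick():  # binary tournament selection
        a, b = rng.sample(pop, 2)
        return a if fit(a) >= fit(b) else b

    for _ in range(generations):
        nxt = [best[:]]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            point = rng.randrange(1, n_bits) if n_bits > 1 else 0
            child = p1[:point] + p2[point:]  # one-point crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fit)
    return best


def rs_ga(data, labels, penalty=1e-3, seed=0):
    """Hypothetical two-stage RS-GA: GA over cut points, then GA over attribute subsets."""
    n_attrs = len(data[0])
    cands = candidate_cuts(data)

    def cut_fitness(mask):  # keep the table consistent with as few cuts as possible
        chosen = [c for bit, c in zip(mask, cands) if bit]
        return dependency(apply_cuts(data, chosen), labels, range(n_attrs)) - penalty * len(chosen)

    cuts = [c for bit, c in zip(run_ga(len(cands), cut_fitness, seed=seed), cands) if bit]
    disc = apply_cuts(data, cuts)

    def reduct_fitness(mask):  # preserve dependency with as few attributes as possible
        attrs = [a for a, bit in enumerate(mask) if bit]
        return dependency(disc, labels, attrs) - penalty * len(attrs)

    reduct = [a for a, bit in enumerate(run_ga(n_attrs, reduct_fitness, seed=seed)) if bit]
    return cuts, reduct, disc


# Hypothetical toy decision table: three continuous attributes, label driven by attribute 0.
data = [[1.0, 5.0, 0.3], [1.5, 3.0, 0.9], [2.0, 4.0, 0.1], [2.5, 1.0, 0.7],
        [6.0, 2.0, 0.5], [6.5, 5.5, 0.2], [7.0, 3.5, 0.8], [8.0, 1.5, 0.4]]
labels = ["low"] * 4 + ["high"] * 4

if __name__ == "__main__":
    cuts, reduct, disc = rs_ga(data, labels)
    print("cuts:", cuts, "reduct:", reduct, "gamma:", dependency(disc, labels, reduct))
```

Seeding the all-ones chromosome and keeping `penalty` far below 1/n (n = number of objects) means that, on a consistent table, the returned cuts and attribute subset keep γ = 1. The penalty only breaks ties toward smaller solutions, so the result is a consistent subset but is not guaranteed to be a minimal reduct. Decision rules would then be read off the discretized table restricted to `reduct`. That step is not shown here.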