Training set size reduction in large dataset problems

© 2015 IEEE. Classifiers are used in many fields of application. A common recent problem, however, is applying a classifier to large datasets. Thus, reducing the size of the training set becomes necessary, especially to accelerate the processing time o...

Bibliographic Details
Main Authors: Varin Chouvatut, Wattana Jindaluang, Ekkarat Boonchieng
Format: Conference Proceeding
Published: 2018
Subjects: Computer Science, Decision Sciences
Online Access: https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84964320834&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/55533
Institution: Chiang Mai University
id th-cmuir.6653943832-55533
record_format dspace
spelling th-cmuir.6653943832-55533 2018-09-05T02:58:35Z 2018-09-05T02:57:37Z 2018-09-05T02:57:37Z 2016-02-08 Conference Proceeding Scopus EID 2-s2.0-84964320834 DOI 10.1109/ICSEC.2015.7401435
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
topic Computer Science
Decision Sciences
description © 2015 IEEE. Classifiers are used in many fields of application. A common recent problem, however, is applying a classifier to large datasets, so reducing the size of the training set becomes necessary, especially to accelerate the classifier's processing time. To address this problem, this paper proposes a new method for reducing the size of the training set in a large dataset. The proposed method improves on a well-known graph-based algorithm, Optimum-Path Forest (OPF). Its principal idea is to use the Segmented Least Square Algorithm (SLSA) to estimate the tree's shape. In the experiments, the proposed method reduced the size of the training set by about 7 to 21 percent compared with the original OPF algorithm, while classification accuracy decreased only slightly, by about 0.2 to 0.5 percent. For some datasets, the method achieved the same accuracy as the original OPF algorithm.
format Conference Proceeding
author Varin Chouvatut
Wattana Jindaluang
Ekkarat Boonchieng
title Training set size reduction in large dataset problems
publishDate 2018
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84964320834&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/55533
_version_ 1681424523587485696
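
The description names the Segmented Least Square Algorithm (SLSA) but does not say how it is applied to OPF trees. As background only, the sketch below shows the textbook segmented least squares dynamic program on a one-dimensional point sequence: it chooses breakpoints that minimize the total squared fitting error plus a fixed penalty per segment. The function names `fit_error` and `segmented_least_squares`, the `penalty` parameter, and the sample data are illustrative assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def fit_error(xs: Sequence[float], ys: Sequence[float], i: int, j: int) -> float:
    """Squared error of the least-squares line through points i..j (inclusive)."""
    px, py = xs[i:j + 1], ys[i:j + 1]
    n = len(px)
    sx, sy = sum(px), sum(py)
    sxx = sum(x * x for x in px)
    sxy = sum(x * y for x, y in zip(px, py))
    denom = n * sxx - sx * sx
    if denom == 0:
        # Single point or identical x values: fall back to a horizontal line.
        slope, intercept = 0.0, sy / n
    else:
        slope = (n * sxy - sx * sy) / denom
        intercept = (sy - slope * sx) / n
    return sum((y - slope * x - intercept) ** 2 for x, y in zip(px, py))


def segmented_least_squares(
    xs: Sequence[float], ys: Sequence[float], penalty: float
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (total cost, segments) where segments are inclusive index pairs.

    Assumes xs is sorted. Cost = sum of per-segment squared errors
    plus `penalty` for every segment used.
    """
    n = len(xs)
    best = [0.0] * (n + 1)   # best[j]: minimal cost for the first j points
    start = [0] * (n + 1)    # start[j]: 1-based start of the last segment
    for j in range(1, n + 1):
        best[j] = float("inf")
        for i in range(1, j + 1):
            cost = best[i - 1] + fit_error(xs, ys, i - 1, j - 1) + penalty
            if cost < best[j]:
                best[j], start[j] = cost, i
    # Walk back through the stored segment starts to recover the segments.
    segments: List[Tuple[int, int]] = []
    j = n
    while j > 0:
        i = start[j]
        segments.append((i - 1, j - 1))
        j = i - 1
    segments.reverse()
    return best[n], segments


# Hypothetical data: a rising run followed by a falling run.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 1, 2, 3, 3, 2, 1, 0]
cost, segments = segmented_least_squares(xs, ys, penalty=0.5)
```

A small penalty favors more segments and a closer fit; a large penalty favors fewer segments. This version recomputes each segment's error from scratch, which is simple but cubic in the number of points; precomputed prefix sums would bring it down to quadratic.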