Instance reduction for supervised learning using input-output clustering method

© 2015, Central South University Press and Springer-Verlag Berlin Heidelberg. A method that applies a clustering technique to reduce the number of samples of large data sets using input-output clustering is proposed. The proposed method clusters the output data into groups and clusters the input data...

Bibliographic Details
Main Authors:	Anusorn Yodjaiphet, Nipon Theera-Umpon, Sansanee Auephanwiriyakul
Format:	Journal
Published:	2018
Subjects:	Agricultural and Biological Sciences
Online Access:	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84949987838&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/43995
Institution:	Chiang Mai University

id	th-cmuir.6653943832-43995
record_format	dspace
spelling	th-cmuir.6653943832-43995 2018-04-25T07:44:37Z 2018-01-24T04:36:52Z 2018-01-24T04:36:52Z 2015-12-01 Journal 22275223 20952899 2-s2.0-84949987838 10.1007/s11771-015-3026-4 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84949987838&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/43995
institution	Chiang Mai University
building	Chiang Mai University Library
country	Thailand
collection	CMU Intellectual Repository
topic	Agricultural and Biological Sciences
description	© 2015, Central South University Press and Springer-Verlag Berlin Heidelberg. A method is proposed that uses input-output clustering to reduce the number of samples in large data sets. The method first clusters the output data into groups and then clusters the input data according to those output groups. A set of prototypes is then selected from the clustered input data, and the remaining, inessential data can be discarded from the data set. Because only the prototypes are used, the method can also reduce the effect of outliers. The method is applied to instance reduction in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. Support vector regression models are trained on the original data sets and on the corresponding instance-reduced data sets, and their root-mean-square errors are compared. In the experiments, the proposed method gives good results for both the reduction and the reconstruction of the synthetic and real-world data sets. The number of instances in the synthetic data sets is reduced by 25%-69%. The reduction rates for the automobile miles-per-gallon and 1990 California census data sets are 46% and 57%, respectively. The electrocardiogram (ECG) data set reaches a reduction rate of 96% because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those obtained with the corresponding original data sets. The proposed method therefore maintains good regression performance while needing only a fraction of the data for training.
format	Journal
author	Anusorn Yodjaiphet Nipon Theera-Umpon Sansanee Auephanwiriyakul
author_sort	Anusorn Yodjaiphet
title	Instance reduction for supervised learning using input-output clustering method
publishDate	2018
url	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84949987838&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/43995
_version_	1681422477170835456
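
The pipeline summarized in the description field (cluster the outputs, cluster the inputs according to the output groups, keep one prototype per input cluster) might be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the record does not say which clustering algorithm or prototype rule the paper uses, so plain k-means and "the real instance nearest each centroid" are stand-ins, and the names `kmeans`, `reduce_instances`, `n_output_groups`, and `n_input_clusters` are hypothetical. The support vector regression training and RMSE comparison are left out.

```python
import math
import random


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over a list of equal-length tuples (assumed stand-in)."""
    rng = random.Random(seed)
    k = min(k, len(points))
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid as is
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def reduce_instances(X, y, n_output_groups=3, n_input_clusters=4, seed=0):
    """Return sorted indices of the instances kept as prototypes."""
    # Step 1: cluster the (scalar) outputs into groups.
    _, y_labels = kmeans([(v,) for v in y], n_output_groups, seed=seed)
    kept = []
    for g in sorted(set(y_labels)):
        idx = [i for i, lab in enumerate(y_labels) if lab == g]
        # Step 2: cluster the inputs belonging to this output group
        # (assumed reading of "in accordance with the groups of output data").
        centroids, x_labels = kmeans([X[i] for i in idx], n_input_clusters, seed=seed)
        # Step 3: keep the real instance closest to each non-empty input centroid.
        for j, c in enumerate(centroids):
            members = [i for i, lab in zip(idx, x_labels) if lab == j]
            if members:
                kept.append(min(members, key=lambda i: math.dist(X[i], c)))
    return sorted(kept)


# Toy data (not from the paper): 200 samples of y = sin(x) on [0, 2*pi].
X = [(2 * math.pi * i / 199,) for i in range(200)]
y = [math.sin(x[0]) for x in X]

kept = reduce_instances(X, y)
reduction_rate = 1 - len(kept) / len(X)
print(f"kept {len(kept)} of {len(X)} instances, reduction rate {reduction_rate:.0%}")
```

The kept indices can then be used to build the reduced training set, for example `[(X[i], y[i]) for i in kept]`, which would be passed to a regressor in place of the full data. The cluster counts control the trade-off: more output groups or input clusters keep more prototypes and lower the reduction rate.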

Similar Items