Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community

© 2019, ICIC International. The emergence of the data science discipline has facilitated the development of novel and advanced machine-learning algorithms for tackling tasks related to data analytics. For example, ensemble learning and deep learning have frequently achieved promising results in many...


Saved in:
Bibliographic Details
Main Authors: Passakorn Phannachitta, Kenichi Matsumoto
Format: Journal
Published: 2019
Subjects:
Online Access:https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85067567471&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/65518
Institution: Chiang Mai University
id th-cmuir.6653943832-65518
record_format dspace
spelling th-cmuir.6653943832-655182019-08-05T04:39:24Z Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community Passakorn Phannachitta Kenichi Matsumoto Computer Science Mathematics © 2019, ICIC International. The emergence of the data science discipline has facilitated the development of novel and advanced machine-learning algorithms for tackling tasks related to data analytics. For example, ensemble learning and deep learning have frequently achieved promising results in many recent data-science competitions, such as those hosted by Kaggle. However, these algorithms have not yet been thoroughly assessed for their performance when applied to software effort estimation. In this study, an assessment framework known as the stable-ranking-indication method is adopted to compare 14 machine-learning algorithms widely adopted in the data science community. The comparisons were carried out over 13 industrial datasets, subject to six robust and independent performance metrics, and supported by the Brunner statistical test method. The results of this study proved stable in that similar machine-learning algorithms achieved similar performance results; in particular, random forest and bagging performed the best among the compared algorithms. The results further offered evidence of how to build an effective stacked ensemble: the overall expected performance of the stacked ensemble is maximized through a balanced trade-off between maximizing expected accuracy, by selecting only the solo algorithms most likely to perform outstandingly on the dataset, and maximizing the diversity of the algorithms.
Specifically, the stack combining bagging, random forests, analogy-based estimation, AdaBoost, the gradient boosting machine, and ordinary least squares regression was shown to be the optimal stack for the software effort estimation datasets. 2019-08-05T04:34:51Z 2019-08-05T04:34:51Z 2019-04-01 Journal 13494198 2-s2.0-85067567471 10.24507/ijicic.15.02.569 https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85067567471&origin=inward http://cmuir.cmu.ac.th/jspui/handle/6653943832/65518
institution Chiang Mai University
building Chiang Mai University Library
country Thailand
collection CMU Intellectual Repository
topic Computer Science
Mathematics
spellingShingle Computer Science
Mathematics
Passakorn Phannachitta
Kenichi Matsumoto
Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community
description © 2019, ICIC International. The emergence of the data science discipline has facilitated the development of novel and advanced machine-learning algorithms for tackling tasks related to data analytics. For example, ensemble learning and deep learning have frequently achieved promising results in many recent data-science competitions, such as those hosted by Kaggle. However, these algorithms have not yet been thoroughly assessed for their performance when applied to software effort estimation. In this study, an assessment framework known as the stable-ranking-indication method is adopted to compare 14 machine-learning algorithms widely adopted in the data science community. The comparisons were carried out over 13 industrial datasets, subject to six robust and independent performance metrics, and supported by the Brunner statistical test method. The results of this study proved stable in that similar machine-learning algorithms achieved similar performance results; in particular, random forest and bagging performed the best among the compared algorithms. The results further offered evidence of how to build an effective stacked ensemble: the overall expected performance of the stacked ensemble is maximized through a balanced trade-off between maximizing expected accuracy, by selecting only the solo algorithms most likely to perform outstandingly on the dataset, and maximizing the diversity of the algorithms. Specifically, the stack combining bagging, random forests, analogy-based estimation, AdaBoost, the gradient boosting machine, and ordinary least squares regression was shown to be the optimal stack for the software effort estimation datasets.
format Journal
author Passakorn Phannachitta
Kenichi Matsumoto
author_facet Passakorn Phannachitta
Kenichi Matsumoto
author_sort Passakorn Phannachitta
title Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community
title_short Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community
title_full Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community
title_fullStr Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community
title_full_unstemmed Model-based software effort estimation - A robust comparison of 14 algorithms widely used in the data science community
title_sort model-based software effort estimation - a robust comparison of 14 algorithms widely used in the data science community
publishDate 2019
url https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85067567471&origin=inward
http://cmuir.cmu.ac.th/jspui/handle/6653943832/65518
_version_ 1681426283851939840
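The abstract in this record names the components of the best-performing stacked ensemble: bagging, random forests, analogy-based estimation, AdaBoost, a gradient boosting machine, and ordinary least squares regression. As an illustrative sketch only, not the paper's actual implementation, such a stack could be assembled with scikit-learn's StackingRegressor. Analogy-based estimation is approximated here with k-nearest neighbours (an assumption on our part; the paper's exact component configurations and hyperparameters are not given in this record).

```python
# Hypothetical sketch of the stacked ensemble described in the abstract.
# Component settings are illustrative defaults, not the paper's configuration.
from sklearn.ensemble import (
    AdaBoostRegressor,
    BaggingRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

estimators = [
    ("bagging", BaggingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(random_state=0)),
    # Stand-in for analogy-based estimation (ABE), which retrieves the most
    # similar past projects; k-NN is the closest off-the-shelf analogue.
    ("abe", KNeighborsRegressor(n_neighbors=3)),
    ("adaboost", AdaBoostRegressor(random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
    ("ols", LinearRegression()),
]

# The meta-learner combines the base estimators' cross-validated predictions.
stack = StackingRegressor(estimators=estimators,
                          final_estimator=LinearRegression())
```

The diversity the abstract emphasizes is visible in the component list: bagging- and boosting-based tree ensembles, an instance-based learner, and a linear model each bring different inductive biases, which is what makes their stacked combination effective.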