The impact of automated feature selection techniques on the interpretation of defect models

The interpretation of defect models heavily relies on software metrics that are used to construct them. Prior work often uses feature selection techniques to remove metrics that are correlated and irrelevant in order to improve model performance. Yet, conclusions that are derived from defect models...

Bibliographic Details
Main Authors:	JIARPAKDEE, Jirayus; TANTITHAMTHAVORN, Chakkrit; TREUDE, Christoph
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Software analytics; Defect prediction; Model interpretation; Feature selection; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/8796 https://ink.library.smu.edu.sg/context/sis_research/article/9799/viewcontent/s10664_020_09848_1.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9799
record_format	dspace
spelling	sg-smu-ink.sis_research-9799 2024-05-30T08:47:44Z 2020-09-01T07:00:00Z text application/pdf info:doi/10.1007/s10664-020-09848-1 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Software analytics Defect prediction Model interpretation Feature selection Software Engineering
description	The interpretation of defect models heavily relies on software metrics that are used to construct them. Prior work often uses feature selection techniques to remove metrics that are correlated and irrelevant in order to improve model performance. Yet, conclusions that are derived from defect models may be inconsistent if the selected metrics are inconsistent and correlated. In this paper, we systematically investigate 12 automated feature selection techniques with respect to the consistency, correlation, performance, computational cost, and the impact on the interpretation dimensions. Through an empirical investigation of 14 publicly-available defect datasets, we find that (1) 94–100% of the selected metrics are inconsistent among the studied techniques; (2) 37–90% of the selected metrics are inconsistent among training samples; (3) 0–68% of the selected metrics are inconsistent when the feature selection techniques are applied repeatedly; (4) 5–100% of the produced subsets of metrics contain highly correlated metrics; and (5) while the most important metrics are inconsistent among correlation threshold values, such inconsistent most important metrics are highly-correlated with the Spearman correlation of 0.85–1. Since we find that the subsets of metrics produced by the commonly-used feature selection techniques (except for AutoSpearman) are often inconsistent and correlated, these techniques should be avoided when interpreting defect models. In addition to introducing AutoSpearman which mitigates correlated metrics better than commonly-used feature selection techniques, this paper opens up new research avenues in the automated selection of features for defect models to optimise for interpretability as well as performance.
format	text
author	JIARPAKDEE, Jirayus TANTITHAMTHAVORN, Chakkrit TREUDE, Christoph
title	The impact of automated feature selection techniques on the interpretation of defect models
publisher	Institutional Knowledge at Singapore Management University
publishDate	2020
url	https://ink.library.smu.edu.sg/sis_research/8796 https://ink.library.smu.edu.sg/context/sis_research/article/9799/viewcontent/s10664_020_09848_1.pdf

Similar Items