Autospearman: Automatically mitigating correlated software metrics for interpreting defect models

The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect...

Bibliographic Details
Main Authors: JIARPAKDEE, Jirayus; TANTITHAMTHAVORN, Chakkrit; TREUDE, Christoph
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
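Subjects: Correlated Metrics; Defect Prediction; Feature Selection; Model Interpretation; Software Analytics; Software Engineering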
Online Access: https://ink.library.smu.edu.sg/sis_research/8829
https://ink.library.smu.edu.sg/context/sis_research/article/9832/viewcontent/icsme18a.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9832
record_format dspace
spelling sg-smu-ink.sis_research-9832
2024-06-06T09:30:28Z
Autospearman: Automatically mitigating correlated software metrics for interpreting defect models
JIARPAKDEE, Jirayus
TANTITHAMTHAVORN, Chakkrit
TREUDE, Christoph
2018-11-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/8829
info:doi/10.1109/ICSME.2018.00018
https://ink.library.smu.edu.sg/context/sis_research/article/9832/viewcontent/icsme18a.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Correlated Metrics
Defect Prediction
Feature Selection
Model Interpretation
Software Analytics
Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Correlated Metrics
Defect Prediction
Feature Selection
Model Interpretation
Software Analytics
Software Engineering
description The interpretation of defect models heavily relies on software metrics that are used to construct them. However, such software metrics are often correlated in defect models. Prior work often uses feature selection techniques to remove correlated metrics in order to improve the performance of defect models. Yet, the interpretation of defect models may be misleading if feature selection techniques produce subsets of inconsistent and correlated metrics. In this paper, we investigate the consistency and correlation of the subsets of metrics that are produced by nine commonly-used feature selection techniques. Through a case study of 13 publicly-available defect datasets, we find that feature selection techniques produce inconsistent subsets of metrics and do not mitigate correlated metrics, suggesting that feature selection techniques should not be used and correlation analyses must be applied when the goal is model interpretation. Since correlation analyses often involve manual selection of metrics by a domain expert, we introduce AutoSpearman, an automated metric selection approach based on correlation analyses. Our evaluation indicates that AutoSpearman yields the highest consistency of subsets of metrics among training samples and mitigates correlated metrics, while impacting model performance by 1-2%pts. Thus, to automatically mitigate correlated metrics when interpreting defect models, we recommend future studies use AutoSpearman in lieu of commonly-used feature selection techniques.
format text
author JIARPAKDEE, Jirayus
TANTITHAMTHAVORN, Chakkrit
TREUDE, Christoph
author_sort JIARPAKDEE, Jirayus
title Autospearman: Automatically mitigating correlated software metrics for interpreting defect models
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
_version_ 1814047568744677376
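
The description above says that AutoSpearman selects metrics automatically "based on correlation analyses", but this record does not spell out the procedure. As a rough illustration of the general idea only, the following sketch computes pairwise Spearman rank correlations and greedily drops metrics until no strongly correlated pair remains. It is not the published AutoSpearman procedure: the 0.7 threshold, the tie-breaking rule, the function names, and the toy data are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant metric carries no rank information
    return cov / (sx * sy)


def drop_correlated(metrics, threshold=0.7):
    """Greedily remove metrics until no pair has |rho| >= threshold.

    `metrics` maps a metric name to its values (one value per module).
    The threshold and the removal rule are assumptions of this sketch.
    """
    kept = dict(metrics)

    def mean_corr(name):
        # Average absolute correlation between `name` and every other kept metric.
        others = [m for m in kept if m != name]
        return sum(abs(spearman(kept[name], kept[o])) for o in others) / len(others)

    while True:
        strong = [
            (abs(spearman(kept[a], kept[b])), a, b)
            for a, b in combinations(sorted(kept), 2)
            if abs(spearman(kept[a], kept[b])) >= threshold
        ]
        if not strong:
            return sorted(kept)
        _, a, b = max(strong)
        # From the most correlated pair, drop the metric that is, on average,
        # more correlated with the remaining metrics.
        kept.pop(a if mean_corr(a) >= mean_corr(b) else b)


# Hypothetical toy data: "loc" and "statements" rise together, "authors" does not.
metrics = {
    "loc": [10, 20, 30, 40, 50],
    "statements": [12, 25, 33, 41, 58],
    "authors": [3, 1, 2, 1, 3],
}
selected = drop_correlated(metrics)
print(selected)
```

Because the removal rule is deterministic, running the filter on different training samples of the same data can be compared directly, which is the kind of consistency the description emphasises. A real study would more likely rely on an established statistics library than on this hand-rolled rank correlation.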