High impact bug report identification with imbalanced learning strategies

In practice, some bugs have more impact than others and thus deserve more immediate attention. Because of tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the highly impactful ones. In the literature, high-impact bu...

Bibliographic Details
Main Authors:	YANG, Xinli; LO, David; XIA, Xin; HUANG, Qiao; SUN, Jianling
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2017
Subjects:	high-impact bug; imbalanced learning; bug report identification; Databases and Information Systems; Information Security
Online Access:	https://ink.library.smu.edu.sg/sis_research/3702 https://ink.library.smu.edu.sg/context/sis_research/article/4704/viewcontent/HighImpactBugReportDetectionImbalancedLearning_2017.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4704
record_format	dspace
spelling	sg-smu-ink.sis_research-4704 2021-03-12T06:16:08Z 2017-01-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3702 info:doi/10.1007/s11390-017-1713-3 https://ink.library.smu.edu.sg/context/sis_research/article/4704/viewcontent/HighImpactBugReportDetectionImbalancedLearning_2017.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	high-impact bug; imbalanced learning; bug report identification; Databases and Information Systems; Information Security
spellingShingle	high-impact bug imbalanced learning bug report identification Databases and Information Systems Information Security YANG, Xinli LO, David XIA, Xin HUANG, Qiao SUN, Jianling High impact bug report identification with imbalanced learning strategies
description	In practice, some bugs have more impact than others and thus deserve more immediate attention. Because of tight schedules and limited human resources, developers may not have enough time to inspect all bugs, so they often concentrate on the highly impactful ones. In the literature, the term high-impact bugs refers to bugs that appear at unexpected times or locations and bring more unexpected effects (i.e., surprise bugs), or that break pre-existing functionality and destroy the user experience (i.e., breakage bugs). Unfortunately, identifying high-impact bugs among thousands of bug reports in a bug tracking system is not an easy feat. An automated technique that identifies high-impact bug reports can therefore help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high-impact bugs, identifying high-impact bug reports is a difficult task. In this paper, we propose an approach to identify high-impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various variants, each of which combines one imbalanced learning strategy with one classification algorithm. In particular, we choose four widely used strategies for dealing with imbalanced data and four state-of-the-art text classification algorithms, and conduct experiments on four datasets from four different open source projects. We mainly perform an analytical study on two types of high-impact bugs, i.e., surprise bugs and breakage bugs. The results show that different variants perform differently, and that the best-performing variants, SMOTE (synthetic minority over-sampling technique) + KNN (K-nearest neighbours) for surprise bug identification and RUS (random under-sampling) + NB (naive Bayes) for breakage bug identification, achieve higher F1-scores than the two state-of-the-art approaches by Thung et al. and Garcia and Shihab.
format	text
author	YANG, Xinli; LO, David; XIA, Xin; HUANG, Qiao; SUN, Jianling
author_facet	YANG, Xinli; LO, David; XIA, Xin; HUANG, Qiao; SUN, Jianling
author_sort	YANG, Xinli
title	High impact bug report identification with imbalanced learning strategies
title_short	High impact bug report identification with imbalanced learning strategies
title_full	High impact bug report identification with imbalanced learning strategies
title_fullStr	High impact bug report identification with imbalanced learning strategies
title_full_unstemmed	High impact bug report identification with imbalanced learning strategies
title_sort	high impact bug report identification with imbalanced learning strategies
publisher	Institutional Knowledge at Singapore Management University
publishDate	2017
url	https://ink.library.smu.edu.sg/sis_research/3702 https://ink.library.smu.edu.sg/context/sis_research/article/4704/viewcontent/HighImpactBugReportDetectionImbalancedLearning_2017.pdf
_version_	1770573676014469120
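
The abstract in this record names RUS (random under-sampling) + NB (naive Bayes) as one of the best-performing variants and F1-score as the comparison metric. The sketch below illustrates how such a combination could fit together; it is not the authors' implementation. The helper names (`random_under_sample`, `NaiveBayes`, `f1_score`), the whitespace tokenizer, the add-one smoothing, and the toy bug report texts are all assumptions made for illustration, written in plain Python with no external libraries.

```python
import math
import random
from collections import Counter, defaultdict


def random_under_sample(docs, labels, seed=0):
    """Randomly drop examples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n = min(len(items) for items in by_class.values())
    out_docs, out_labels = [], []
    for label, items in by_class.items():
        for doc in rng.sample(items, n):
            out_docs.append(doc)
            out_labels.append(label)
    return out_docs, out_labels


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in doc.lower().split():
                if w in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: label 1 = high-impact report, 0 = other report.
docs = [
    "crash on startup", "crash data loss",
    "typo in label", "typo in menu", "wrong button color",
    "misaligned icon padding", "slow tooltip fade", "outdated help text",
]
labels = [1, 1, 0, 0, 0, 0, 0, 0]

bal_docs, bal_labels = random_under_sample(docs, labels)
model = NaiveBayes().fit(bal_docs, bal_labels)
print(model.predict("crash on save"))
```

Under-sampling is applied to the training data only, so that the classifier sees both classes equally often; an evaluation set would normally keep its original class distribution, which is why F1-score on the minority class is a more informative metric here than accuracy.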

Similar Items