Automated identification of high impact bug reports leveraging imbalanced learning strategies

In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs. Thus, they often concentrate on bugs that are highly impactful. In the literature, high impact bug...

Bibliographic Details
Main Authors:	YANG, Xinli; LO, David; HUANG, Qiao; XIA, Xin; SUN, Jianling
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2016
Subjects:	High Impact Bug; Imbalanced Data; Text Classification; Computer Sciences; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/3567 https://ink.library.smu.edu.sg/context/sis_research/article/4568/viewcontent/AutomatedIDHighImpactBugReportsLimbalLearning_2016.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4568
record_format	dspace
spelling	sg-smu-ink.sis_research-4568 2019-06-06T08:10:48Z 2016-06-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3567 info:doi/10.1109/COMPSAC.2016.67 https://ink.library.smu.edu.sg/context/sis_research/article/4568/viewcontent/AutomatedIDHighImpactBugReportsLimbalLearning_2016.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	High Impact Bug; Imbalanced Data; Text Classification; Computer Sciences; Software Engineering
description	In practice, some bugs have more impact than others and thus deserve more immediate attention. Due to tight schedules and limited human resources, developers may not have enough time to inspect all bugs. Thus, they often concentrate on bugs that are highly impactful. In the literature, the term high impact bugs refers to bugs that appear at unexpected times or locations and bring more unexpected effects, or that break pre-existing functionality and destroy the user experience. Unfortunately, identifying high impact bugs among the thousands of bug reports in a bug tracking system is not an easy feat. Thus, an automated technique that can identify high impact bug reports can help developers become aware of them early, rectify them quickly, and minimize the damage they cause. Because only a small proportion of bugs are high impact bugs, the identification of high impact bug reports is a difficult task. In this paper, we propose an approach to identify high impact bug reports by leveraging imbalanced learning strategies. We investigate the effectiveness of various imbalanced learning strategies built upon a number of well-known classification algorithms. In particular, we choose four widely used strategies for dealing with imbalanced data and use naive Bayes multinomial as the classification algorithm to conduct experiments on four datasets from four different open source projects. We perform an empirical study on a specific type of high impact bugs, i.e., surprise bugs, which were first studied by Shihab et al. The results show that under-sampling is the best imbalanced learning strategy with naive Bayes multinomial for high impact bug identification.
format	text
author	YANG, Xinli; LO, David; HUANG, Qiao; XIA, Xin; SUN, Jianling
author_sort	YANG, Xinli
title	Automated identification of high impact bug reports leveraging imbalanced learning strategies
publisher	Institutional Knowledge at Singapore Management University
publishDate	2016
url	https://ink.library.smu.edu.sg/sis_research/3567 https://ink.library.smu.edu.sg/context/sis_research/article/4568/viewcontent/AutomatedIDHighImpactBugReportsLimbalLearning_2016.pdf
_version_	1770573330307350528
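
The description in this record names random under-sampling combined with multinomial naive Bayes as the best-performing strategy. The following is a minimal Python sketch of that combination, not the authors' implementation: the record does not describe their tooling, features, or datasets, so the function names, the whitespace tokenizer, the Laplace smoothing, and the toy bug reports below are all assumptions made for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def undersample(docs, labels, seed=0):
    """Randomly drop samples so every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for doc in rng.sample(group, n_min):
            balanced.append((doc, label))
    rng.shuffle(balanced)
    return [d for d, _ in balanced], [l for _, l in balanced]


class MultinomialNB:
    """Multinomial naive Bayes over whitespace tokens, Laplace-smoothed."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy bug reports (not from the paper's datasets): 2 high, 6 low.
docs = [
    "crash on startup after upgrade data loss",
    "unexpected crash corrupts saved data",
    "typo in settings dialog label",
    "button alignment off in toolbar",
    "minor typo in help text",
    "tooltip label color slightly wrong",
    "toolbar icon alignment in dark theme",
    "help dialog label wording",
]
labels = ["high", "high", "low", "low", "low", "low", "low", "low"]

train_docs, train_labels = undersample(docs, labels, seed=0)
model = MultinomialNB().fit(train_docs, train_labels)
print(Counter(train_labels))
print(model.predict("crash causes data loss"))
```

Under-sampling discards majority-class reports until the classes are the same size, so the classifier's prior no longer favors the low impact class. The cost is that some training data is thrown away, which is why the balancing is applied to the training split only.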

Similar Items