Chaff from the wheat: Characterizing and determining valid bug reports

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the bug report is valid (i.e., a real bug). A large number of bug reports are submitted every day, and many of them end up being invalid. Manually determining valid bug report is a...

Bibliographic Details
Main Authors:	FAN, Yuanrui; XIA, Xin; LO, David; HASSAN, Ahmed E.
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2020
Subjects:	Bug Report; Collaboration; Computer bugs; Feature extraction; Feature Generation; Forestry; Machine Learning; Software; Support vector machines; Task analysis; Databases and Information Systems; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/4103 https://ink.library.smu.edu.sg/context/sis_research/article/5106/viewcontent/Fanetal_2018_ChafffromtheWheat.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5106
record_format	dspace
spelling	sg-smu-ink.sis_research-5106 2021-05-12T03:14:33Z Chaff from the wheat: Characterizing and determining valid bug reports FAN, Yuanrui; XIA, Xin; LO, David; HASSAN, Ahmed E. 2020-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4103 info:doi/10.1109/TSE.2018.2864217 https://ink.library.smu.edu.sg/context/sis_research/article/5106/viewcontent/Fanetal_2018_ChafffromtheWheat.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Bug Report; Collaboration; Computer bugs; Feature extraction; Feature Generation; Forestry; Machine Learning; Software; Support vector machines; Task analysis; Databases and Information Systems; Software Engineering
description	Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the report is valid (i.e., describes a real bug). A large number of bug reports are submitted every day, and many of them turn out to be invalid. Manually determining whether a bug report is valid is a difficult and tedious task. Thus, an approach that automatically determines whether a report is valid can help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. In this study, motivated by these needs, we propose an approach that determines whether a newly submitted bug report is valid. Our approach first extracts 33 features from bug reports. The extracted features are grouped along 5 dimensions: reporter experience, collaboration network, completeness, readability, and text. Based on these features, we use a random forest classifier to identify valid bug reports. To evaluate the effectiveness of our approach, we experiment on large-scale datasets containing a total of 560,697 bug reports from five open source projects (Eclipse, Netbeans, Mozilla, Firefox, and Thunderbird). On average across the five datasets, our approach achieves F1-scores of 0.74 for valid bug reports and 0.67 for invalid ones, and an average AUC of 0.81. In terms of AUC and F1-scores for valid and invalid bug reports, our approach statistically significantly outperforms two baselines that use features proposed by Zanetti et al. [104]. We also study which features best distinguish valid bug reports from invalid ones, and find that the textual features of a bug report and the reporter's experience are the most important factors.
format	text
author	FAN, Yuanrui; XIA, Xin; LO, David; HASSAN, Ahmed E.
author_sort	FAN, Yuanrui
title	Chaff from the wheat: Characterizing and determining valid bug reports
title_sort	chaff from the wheat: characterizing and determining valid bug reports
publisher	Institutional Knowledge at Singapore Management University
publishDate	2020
url	https://ink.library.smu.edu.sg/sis_research/4103 https://ink.library.smu.edu.sg/context/sis_research/article/5106/viewcontent/Fanetal_2018_ChafffromtheWheat.pdf
_version_	1770574310107250688
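
The description above reports an F1-score for each class (valid and invalid) and an AUC. As a rough illustration of how those three numbers are computed, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the toy labels and scores are hypothetical, the 0.5 decision threshold is an assumption, and the helper names `f1_for_class` and `auc` are introduced here for illustration only.

```python
from typing import List


def f1_for_class(y_true: List[int], y_pred: List[int], positive: int) -> float:
    """F1-score when `positive` is treated as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the fraction of (valid, invalid) pairs in which the valid
    report receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = valid report, 0 = invalid report.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Hypothetical classifier scores (estimated probability of "valid").
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

f1_valid = f1_for_class(y_true, y_pred, positive=1)
f1_invalid = f1_for_class(y_true, y_pred, positive=0)
auc_value = auc(y_true, scores)
print(f1_valid, f1_invalid, auc_value)
```

Reporting F1 separately for each class matters when one class is rarer than the other, since a single F1 for the valid class can hide poor performance on invalid reports. AUC, by contrast, does not depend on the chosen threshold.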

Similar Items