Which packages would be affected by this bug report?

A large project (e.g., Ubuntu) usually contains a large number of software packages. Sometimes the same bug report in such project would affect multiple packages, and developers of different packages need to collaborate with one another to fix the bug. Unfortunately, the total number of packages inv...

Full description

Saved in:
Bibliographic Details
Main Authors: HUANG, Qiao, LO, David, XIA, Xin, WANG, Qingye, LI, Shanping
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2017
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/3919
https://ink.library.smu.edu.sg/context/sis_research/article/4921/viewcontent/0941a124.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
id sg-smu-ink.sis_research-4921
record_format dspace
spelling sg-smu-ink.sis_research-49212020-07-22T07:42:20Z Which packages would be affected by this bug report? HUANG, Qiao LO, David XIA, Xin WANG, Qingye LI, Shanping A large project (e.g., Ubuntu) usually contains a large number of software packages. Sometimes the same bug report in such project would affect multiple packages, and developers of different packages need to collaborate with one another to fix the bug. Unfortunately, the total number of packages involved in a project like Ubuntu is relatively large, which makes it time-consuming to manually identify packages that are affected by a bug report. In this paper, we propose an approach named PkgRec that consists of 2 components: a name matching component and an ensemble learning component. In the name matching component, we assign a confidence score for a package if it is mentioned by a bug report. In the ensemble learning component, we divide the training dataset into n subsets and build a sub-classifier on each subset. Then we automatically determine an appropriate weight for each sub-classifier and combine them to predict the confidence score of a package being affected by a new bug report. Finally, PkgRec combines the name matching component and the ensemble learning component to assign a final confidence score to each potential package. A list of top-k packages with the highest confidence scores would then be recommended. We evaluate PkgRec on 3 datasets including Ubuntu, OpenStack, and GNOME with a total number of 42,094 bug reports. We show that PkgRec could achieve recall@5 and recall@10 scores of 0.511-0.737, and 0.614-0.785, respectively. We also compare PkgRec with other state-of-art approaches, namely LDA-KL and MLkNN. The experiment results show that PkgRec on average improves recall@5 and recall@10 scores of LDA-KL by 47% and 31%, and MLkNN by 52% and 37%, respectively. 2017-10-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3919 info:doi/10.1109/ISSRE.2017.24 https://ink.library.smu.edu.sg/context/sis_research/article/4921/viewcontent/0941a124.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Software Engineering
spellingShingle Software Engineering
HUANG, Qiao
LO, David
XIA, Xin
WANG, Qingye
LI, Shanping
Which packages would be affected by this bug report?
description A large project (e.g., Ubuntu) usually contains a large number of software packages. Sometimes the same bug report in such project would affect multiple packages, and developers of different packages need to collaborate with one another to fix the bug. Unfortunately, the total number of packages involved in a project like Ubuntu is relatively large, which makes it time-consuming to manually identify packages that are affected by a bug report. In this paper, we propose an approach named PkgRec that consists of 2 components: a name matching component and an ensemble learning component. In the name matching component, we assign a confidence score for a package if it is mentioned by a bug report. In the ensemble learning component, we divide the training dataset into n subsets and build a sub-classifier on each subset. Then we automatically determine an appropriate weight for each sub-classifier and combine them to predict the confidence score of a package being affected by a new bug report. Finally, PkgRec combines the name matching component and the ensemble learning component to assign a final confidence score to each potential package. A list of top-k packages with the highest confidence scores would then be recommended. We evaluate PkgRec on 3 datasets including Ubuntu, OpenStack, and GNOME with a total number of 42,094 bug reports. We show that PkgRec could achieve recall@5 and recall@10 scores of 0.511-0.737, and 0.614-0.785, respectively. We also compare PkgRec with other state-of-art approaches, namely LDA-KL and MLkNN. The experiment results show that PkgRec on average improves recall@5 and recall@10 scores of LDA-KL by 47% and 31%, and MLkNN by 52% and 37%, respectively.
format text
author HUANG, Qiao
LO, David
XIA, Xin
WANG, Qingye
LI, Shanping
author_facet HUANG, Qiao
LO, David
XIA, Xin
WANG, Qingye
LI, Shanping
author_sort HUANG, Qiao
title Which packages would be affected by this bug report?
title_short Which packages would be affected by this bug report?
title_full Which packages would be affected by this bug report?
title_fullStr Which packages would be affected by this bug report?
title_full_unstemmed Which packages would be affected by this bug report?
title_sort which packages would be affected by this bug report?
publisher Institutional Knowledge at Singapore Management University
publishDate 2017
url https://ink.library.smu.edu.sg/sis_research/3919
https://ink.library.smu.edu.sg/context/sis_research/article/4921/viewcontent/0941a124.pdf
_version_ 1770573935934439424