Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling

Detecting duplicate bug reports reduces triaging effort and saves developers from spending time fixing the same issue more than once. Among automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. Howe...

Bibliographic Details
Main Authors: NGUYEN, Anh Tuan, NGUYEN, Tung, NGUYEN, Tien, LO, David, SUN, Chengnian
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2012
Subjects: Duplicate Bug Reports; Topic Model; Information Retrieval; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/1571
https://ink.library.smu.edu.sg/context/sis_research/article/2570/viewcontent/Duplicate_bug_report_pv.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-2570
record_format dspace
spelling sg-smu-ink.sis_research-2570 2020-01-08T03:25:06Z 2012-09-01T07:00:00Z text application/pdf info:doi/10.1145/2351676.2351687 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Duplicate Bug Reports
Topic Model
Information Retrieval
Software Engineering
description Detecting duplicate bug reports reduces triaging effort and saves developers from spending time fixing the same issue more than once. Among automated detection approaches, text-based information retrieval (IR) approaches have been shown to outperform others in terms of both accuracy and time efficiency. However, IR-based approaches do not reliably detect duplicate reports that describe the same technical issue in different terms. This paper introduces DBTM, a duplicate bug report detection approach that takes advantage of both IR-based features and topic-based features. DBTM models a bug report as a textual document describing one or more technical issues, and models duplicate bug reports as reports about the same technical issue(s). Trained on historical data that includes identified duplicate reports, it learns the sets of different terms that describe the same technical issues and uses them to detect duplicates that have not yet been identified. Our empirical evaluation on real-world systems shows that DBTM improves on state-of-the-art approaches by up to 20% in accuracy.
format text
author NGUYEN, Anh Tuan
NGUYEN, Tung
NGUYEN, Tien
LO, David
SUN, Chengnian
title Duplicate Bug Report Detection with a Combination of Information Retrieval and Topic Modeling
publisher Institutional Knowledge at Singapore Management University
publishDate 2012
url https://ink.library.smu.edu.sg/sis_research/1571
https://ink.library.smu.edu.sg/context/sis_research/article/2570/viewcontent/Duplicate_bug_report_pv.pdf
_version_ 1770571304438595584