Automated Configuration Bug Report Prediction Using Text Mining

Bibliographic Details
Main Authors: Xia, Xin; LO, David; Qiu, Weiwei; Xingen, Wang; Zhou, Bo
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2014
Online Access: https://ink.library.smu.edu.sg/sis_research/2418
DOI: 10.1109/COMPSAC.2014.17 (http://dx.doi.org/10.1109/COMPSAC.2014.17)
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-3418
Collection: Research Collection School Of Computing and Information Systems, InK@SMU (SMU Libraries, Singapore)
Publication Date: 2014-07-01
Subjects: data mining; program debugging; statistical analysis; text analysis; Information Security; Software Engineering

Description: Configuration bugs are one of the dominant causes of software failures, and previous studies show that a configuration bug can cause huge financial losses. Because of their importance, configuration bugs have attracted many research studies, e.g., on detecting, diagnosing, and fixing them. Given a bug report, an approach that can identify whether the bug is a configuration bug could help developers reduce debugging effort. We refer to this problem as configuration bug report prediction. To address it, we develop an automated framework that applies text mining to the natural-language descriptions of bug reports: it trains a statistical model on historical bug reports with known labels (i.e., configuration or non-configuration) and then uses that model to predict the label of a new bug report. Developers can apply our model to label bug reports automatically and thereby improve their productivity. Our tool first applies feature selection techniques (e.g., information gain and chi-square) to pre-process the textual information in bug reports, and then applies various classifiers (e.g., naive Bayes, SVM, and naive Bayes multinomial) to build statistical models. We evaluate our solution on five bug report datasets: accumulo, activemq, camel, flume, and wicket. Naive Bayes multinomial with information gain achieves the best performance: averaged across the five projects, its accuracy, configuration F-measure, and non-configuration F-measure are 0.811, 0.450, and 0.880, respectively. We also compare our solution with the method proposed by Arshad et al. On average, naive Bayes multinomial with information gain improves on the accuracy, configuration F-measure, and non-configuration F-measure of Arshad et al.'s method by 8.34%, 103.7%, and 4.24%, respectively.
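The description names information gain feature selection followed by naive Bayes multinomial as the best-performing combination. The plain-Python sketch below illustrates that kind of pipeline (train on labeled historical reports, then predict a label for a new one) together with accuracy and per-class F-measure. It is not the authors' implementation: the tokenizer, the add-one smoothing, the feature budget `k`, the `"config"`/`"non-config"` label strings, and the toy bug reports are all illustrative assumptions, and the paper's evaluation protocol may differ in detail.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs (a deliberately simple, assumed tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def entropy(counts):
    # Shannon entropy in bits of a class-count distribution; zero counts are skipped.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(doc_sets, labels, term, classes):
    # IG of the binary feature "term occurs in the report" with respect to the label.
    base = entropy([labels.count(c) for c in classes])
    conditional = 0.0
    for present in (True, False):
        subset = [y for d, y in zip(doc_sets, labels) if (term in d) == present]
        if subset:
            weight = len(subset) / len(labels)
            conditional += weight * entropy([subset.count(c) for c in classes])
    return base - conditional


def select_features(token_docs, labels, k):
    # Keep the k terms with the highest information gain (ties broken alphabetically).
    doc_sets = [set(d) for d in token_docs]
    classes = sorted(set(labels))
    vocab = set().union(*doc_sets)
    ig = {t: information_gain(doc_sets, labels, t, classes) for t in vocab}
    return set(sorted(vocab, key=lambda t: (-ig[t], t))[:k])


class MultinomialNB:
    # Multinomial naive Bayes over term counts with add-one (Laplace) smoothing.
    def fit(self, token_docs, labels, features):
        self.features = features
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.log_likelihood = {}
        for c in self.classes:
            counts = Counter(t for d, y in zip(token_docs, labels) if y == c
                             for t in d if t in features)
            total = sum(counts.values())
            self.log_likelihood[c] = {
                t: math.log((counts[t] + 1) / (total + len(features))) for t in features}
        return self

    def predict(self, tokens):
        # Argmax over classes of log P(c) + sum of log P(term | c) for selected terms.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in tokens if t in self.features)
        return max(self.classes, key=score)


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def f_measure(gold, pred, positive):
    # Per-class F-measure: harmonic mean of precision and recall for `positive`.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy reports, NOT drawn from the five evaluation datasets.
train = [
    ("Server fails to start when the port property in the config file is wrong", "config"),
    ("Wrong default value for the timeout setting in the configuration", "config"),
    ("Missing property causes startup failure with invalid config", "config"),
    ("NullPointerException in the message parser when the payload is empty", "non-config"),
    ("Race condition in the consumer thread causes a deadlock", "non-config"),
    ("Off by one error in the pagination component", "non-config"),
]
test = [("Invalid config property causes startup failure", "config"),
        ("Deadlock in the consumer thread", "non-config")]

train_tokens = [tokenize(text) for text, _ in train]
train_labels = [label for _, label in train]
features = select_features(train_tokens, train_labels, k=10)
model = MultinomialNB().fit(train_tokens, train_labels, features)
gold = [label for _, label in test]
predictions = [model.predict(tokenize(text)) for text, _ in test]
print(predictions, accuracy(gold, predictions), f_measure(gold, predictions, "config"))
```

On this toy data, with `k=10`, both held-out reports receive the correct label. A real evaluation would use far more reports and, as the record does, report the configuration and non-configuration F-measures separately rather than accuracy alone, since the two classes can score very differently.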