Automatic Defect Categorization based on Fault Triggering Conditions


Bibliographic Details
Main Authors:	Xia, Xin; Lo, David; Wang, Xinyu; Zhou, Bo
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2014
Subjects:	data mining; fuzzy set theory; program debugging; software fault tolerance; text analysis; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/2438 http://dx.doi.org/10.1109/ICECCS.2014.14
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-3438
record_format	dspace
spelling	sg-smu-ink.sis_research-3438 2015-11-15T01:14:12Z; 2014-08-01T07:00:00Z; info:doi/10.1109/ICECCS.2014.14; Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	data mining fuzzy set theory program debugging software fault tolerance text analysis Software Engineering
description	Due to the complexity of software systems, defects are inevitable. Understanding the types of defects can help developers adopt appropriate measures in current and future software releases. In practice, developers often categorize defects into various types. One common categorization is based on the fault triggers of defects. A fault trigger is a set of conditions that activate a defect (i.e., a fault) and propagate it into a failure. In general, defects fall into two types based on their fault-triggering conditions: Bohrbugs and Mandelbugs. A Bohrbug is a bug that can be easily isolated and whose activation and error propagation are simple. A Mandelbug is a bug whose activation and/or error propagation is complex (e.g., there is a time lag between fault activation and failure occurrence). With these category labels, developers can better perform post-mortem analysis to identify common characteristics of the defects and design specific fault-tolerance mechanisms. However, in most software systems, these category labels are unavailable. To address this problem, in this paper, we propose a text mining solution that categorizes defects into fault trigger categories by analyzing the natural-language descriptions of bug reports. A previous study shows that Mandelbugs are more complex and take more time to fix. Thus, to better identify Mandelbugs, we propose a novel fuzzy set-based feature selection algorithm named USES, which selects the features (i.e., terms) that best distinguish Mandelbugs from Bohrbugs. USES first caches a set of terms based on their fuzzy affinity scores to Bohrbug or Mandelbug. Next, it iterates many times; in each iteration, it selects a subset of terms and builds a classifier on these terms. Finally, USES selects the classifier and the terms that achieve the best performance on the training data.
We evaluate our solution on four datasets (Linux, MySQL, Apache HTTPD, and AXIS) containing a total of 809 bug reports. We show that USES with naive Bayes multinomial achieves the best performance, with Mandelbug F-measure scores ranging from 0.298 to 0.615. We also compare USES with other baseline approaches; on average, USES improves the Mandelbug F-measure of the best-performing baseline by 12.3%.
format	text
author	Xia, Xin; Lo, David; Wang, Xinyu; Zhou, Bo
title	Automatic Defect Categorization based on Fault Triggering Conditions
publisher	Institutional Knowledge at Singapore Management University
publishDate	2014
url	https://ink.library.smu.edu.sg/sis_research/2438 http://dx.doi.org/10.1109/ICECCS.2014.14
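The record's abstract describes USES only at a high level. The Python sketch below is an illustrative approximation under stated assumptions, not the authors' implementation. Assumed (not from the source): the fuzzy affinity formula (a term's class-normalized document frequency for Mandelbugs divided by the sum over both classes), a candidate pool of the `top_k` terms whose affinity is furthest from 0.5, random subset sampling in each iteration, whitespace tokenization, the toy bug reports, and all function names. As the abstract describes, it builds a classifier (here a from-scratch multinomial naive Bayes) on each term subset and keeps the subset with the best Mandelbug F-measure on the training data.

```python
import math
import random
from collections import Counter

# Labels: "M" = Mandelbug, "B" = Bohrbug (hypothetical encoding).


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's preprocessing is not described here.
    return text.lower().split()


def fuzzy_affinity(docs, labels):
    """Hypothetical affinity of each term to the Mandelbug class, in [0, 1].

    mu_M(t) = p(t|M) / (p(t|M) + p(t|B)), where p(t|c) is the fraction of class-c
    reports containing t. 1.0 means the term appears only in Mandelbug reports,
    0.0 means it appears only in Bohrbug reports. This is NOT the paper's formula.
    """
    n = Counter(labels)
    df = {"M": Counter(), "B": Counter()}
    for doc, y in zip(docs, labels):
        df[y].update(set(tokenize(doc)))
    scores = {}
    for t in set(df["M"]) | set(df["B"]):
        pm, pb = df["M"][t] / n["M"], df["B"][t] / n["B"]
        scores[t] = pm / (pm + pb)
    return scores


def train_mnb(docs, labels, vocab, alpha=1.0):
    # Multinomial naive Bayes with Laplace smoothing, restricted to the terms in `vocab`.
    prior, cond = {}, {}
    for c in ("M", "B"):
        class_docs = [tokenize(d) for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(t for toks in class_docs for t in toks if t in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        cond[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return prior, cond


def predict_mnb(model, doc):
    prior, cond = model
    toks = tokenize(doc)
    score = {c: prior[c] + sum(cond[c][t] for t in toks if t in cond[c]) for c in prior}
    return max(score, key=score.get)  # ties go to "M", the first class inserted


def f_measure(y_true, y_pred, positive="M"):
    # F1 for the positive (Mandelbug) class; 0.0 when there are no true positives.
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def uses_like_select(docs, labels, top_k=20, iterations=50, seed=0):
    scores = fuzzy_affinity(docs, labels)
    # Candidate pool: terms with the most extreme affinity to either class
    # (ties broken alphabetically so the result is reproducible).
    pool = sorted(scores, key=lambda t: (-abs(scores[t] - 0.5), t))[:top_k]
    rng = random.Random(seed)
    best_f, best_terms, best_model = -1.0, None, None
    for _ in range(iterations):
        subset = set(rng.sample(pool, rng.randint(1, len(pool))))
        model = train_mnb(docs, labels, subset)
        f = f_measure(labels, [predict_mnb(model, d) for d in docs])
        if f > best_f:  # keep the subset with the best training-data F-measure
            best_f, best_terms, best_model = f, subset, model
    return best_f, best_terms, best_model


# Hypothetical toy bug reports, not taken from the paper's datasets.
docs = [
    "crash after hours of uptime memory leak",
    "race condition intermittent hang under load",
    "timing dependent deadlock rarely reproducible",
    "typo in error message",
    "null pointer when config file missing",
    "wrong default value in option parsing",
]
labels = ["M", "M", "M", "B", "B", "B"]
f, terms, model = uses_like_select(docs, labels, top_k=10, iterations=30)
print(round(f, 3), sorted(terms))
print(predict_mnb(model, "intermittent crash under load"))
```

Because each subset is scored on the same reports it was trained on, as the abstract describes, the chosen terms can overfit a small corpus; scoring on a held-out split is a natural variation of this sketch.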

Similar Items