TLEL: A two-layer ensemble learning approach for just-in-time defect prediction

Bibliographic Details
Main Authors:	YANG, Xinli; LO, David; XIA, Xin; SUN, Jianling
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2017
Subjects:	Ensemble learning; Just-in-time defect prediction; Cost effectiveness; Databases and Information Systems; Information Security
Online Access:	https://ink.library.smu.edu.sg/sis_research/3700 https://ink.library.smu.edu.sg/context/sis_research/article/4702/viewcontent/1_s20_S0950584917302501_main.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-4702
record_format	dspace
record_updated	2018-09-14T07:19:08Z
date	2017-07-01T07:00:00Z
file_format	application/pdf
doi	10.1016/j.infsof.2017.03.007
license	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
description	Context: Defect prediction is an important research topic, particularly at the change level. Change-level defect prediction, also referred to as just-in-time defect prediction, can not only ensure software quality during development but also enable developers to check and fix defects in time [1].
Objective: Ensemble learning has become a popular topic in recent years, and several studies have applied it to defect prediction [2–5]. Traditional ensemble learning approaches have only one layer, i.e., they apply ensemble learning once; few studies leverage ensemble learning twice or more. To bridge this research gap, we hybridize various ensemble learning methods to see whether doing so improves the performance of just-in-time defect prediction. In particular, we focus on one way of doing this, hybridizing bagging and stacking, and leave other possible hybridization strategies for future work.
Method: In this paper, we propose TLEL, a two-layer ensemble learning approach that leverages decision trees and ensemble learning to improve the performance of just-in-time defect prediction. In the inner layer, we combine decision trees and bagging to build a Random Forest model. In the outer layer, we use random under-sampling to train many different Random Forest models and use stacking to ensemble them once more.
Results: To evaluate the performance of TLEL, we use two metrics, i.e., cost effectiveness and F1-score. We perform experiments on datasets from six large open source projects, i.e., Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, containing a total of 137,417 changes. We also compare our approach with three baselines: Deeper, the approach proposed by us [6]; DNC, the approach proposed by Wang et al. [2]; and MKEL, the approach proposed by Wang et al. [3]. The experimental results show that, on average across the six datasets, TLEL can discover over 70% of the bugs by reviewing only 20% of the lines of code, compared with about 50% for the baselines. In addition, the F1-scores achieved by TLEL are substantially and statistically significantly higher than those of the three baselines across the six datasets.
Conclusion: TLEL achieves a substantial and statistically significant improvement over the state-of-the-art methods, i.e., Deeper, DNC, and MKEL. Moreover, TLEL can discover over 70% of the bugs by reviewing only 20% of the lines of code.
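The two-layer structure in the Method section can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The abstract does not say which meta-learner the stacking step uses, how many forests or trees are trained, or how the stacker's training data is obtained, so the following are assumptions chosen for illustration: depth-limited Gini trees with square-root feature sampling, 5 forests of 10 trees, one third of the training changes held out to fit the stacker, and logistic regression as the meta-learner. All function names (build_tree, train_random_forest, undersample, train_tlel, predict_tlel, make_synthetic_changes, and so on) and the synthetic change metrics are hypothetical.

```python
import math
import random

# Assumption: each change is a list of numeric metrics; label 1 = defect-inducing.
def gini(labels):
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def build_tree(X, y, depth, max_depth, n_feats, rng):
    # CART-style tree; every node tries only a random subset of the features.
    if depth >= max_depth or len(set(y)) == 1:
        return sum(y) / len(y)  # leaf: fraction of defect-inducing changes
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in set(row[f] for row in X):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # every sampled feature is constant at this node
        return sum(y) / len(y)
    _, f, t, left, right = best
    def grow(ids):
        return build_tree([X[i] for i in ids], [y[i] for i in ids],
                          depth + 1, max_depth, n_feats, rng)
    return (f, t, grow(left), grow(right))

def tree_predict(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def train_random_forest(X, y, n_trees, rng, max_depth=4):
    # Inner layer: bagging over randomized decision trees, i.e. a Random Forest.
    n_feats = max(1, int(math.sqrt(len(X[0]))))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest

def forest_score(forest, x):
    return sum(tree_predict(tree, x) for tree in forest) / len(forest)

def undersample(X, y, rng):
    # Random under-sampling: keep every minority-class change, drop the rest at random.
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    k = min(len(pos), len(neg))
    idx = rng.sample(pos, k) + rng.sample(neg, k)
    return [X[i] for i in idx], [y[i] for i in idx]

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, s))))  # clamped to avoid overflow

def train_tlel(X, y, n_forests=5, n_trees=10, seed=0, epochs=300, lr=0.5):
    rng = random.Random(seed)
    idx = rng.sample(range(len(X)), len(X))  # shuffled indices
    cut = 2 * len(idx) // 3  # assumption: 2/3 for the forests, 1/3 for the stacker
    Xb, yb = [X[i] for i in idx[:cut]], [y[i] for i in idx[:cut]]
    # Outer layer, part 1: one Random Forest per random under-sample.
    forests = [train_random_forest(*undersample(Xb, yb, rng), n_trees, rng)
               for _ in range(n_forests)]
    # Outer layer, part 2 (stacking; logistic regression is an assumed meta-learner).
    Z = [[forest_score(f, X[i]) for f in forests] for i in idx[cut:]]
    meta_y = [y[i] for i in idx[cut:]]
    w = [0.0] * (n_forests + 1)  # last entry is the bias term
    for _ in range(epochs):
        for z, label in zip(Z, meta_y):
            err = label - sigmoid(sum(a * b for a, b in zip(w, z)) + w[-1])
            for j, zj in enumerate(z):
                w[j] += lr * err * zj
            w[-1] += lr * err
    return forests, w

def predict_tlel(model, x):
    forests, w = model
    z = [forest_score(f, x) for f in forests]
    return sigmoid(sum(a * b for a, b in zip(w, z)) + w[-1])

def make_synthetic_changes(n, seed):  # hypothetical toy data, not the paper's datasets
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        added, files = rng.randint(0, 500), rng.randint(1, 20)
        X.append([added, files, rng.random()])  # third metric is pure noise
        y.append(1 if added > 400 or (added > 250 and files > 15) else 0)
    return X, y

if __name__ == "__main__":
    model = train_tlel(*make_synthetic_changes(300, seed=42))
    # A large change should score high and a tiny change low.
    print(predict_tlel(model, [495, 20, 0.5]), predict_tlel(model, [5, 1, 0.5]))
```

Each forest sees a different balanced sample of clean changes, so the forests disagree slightly and the stacker learns how much weight to give each one. With real data, fitting the stacker on out-of-fold predictions instead of a single held-out split would use the training changes more efficiently.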
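The two evaluation metrics named in the Results section can be made concrete with a short sketch. The record does not define cost effectiveness precisely; here it is assumed to be the fraction of defect-inducing changes found when changes are reviewed in descending order of predicted score until 20% of all changed lines have been inspected, which matches the "over 70% of the bugs by reviewing only 20% of the lines of code" framing. The function names, the rule of stopping at the first change that would exceed the budget, and the toy numbers are hypothetical.

```python
def cost_effectiveness(scores, loc, labels, budget=0.2):
    # Assumed definition: review changes from the highest to the lowest predicted
    # score and stop before the next change would push the reviewed lines past
    # `budget` (20% by default) of all changed lines; return the share of
    # defect-inducing changes found so far.
    total_loc, total_bugs = sum(loc), sum(labels)
    spent = found = 0
    for i in sorted(range(len(scores)), key=scores.__getitem__, reverse=True):
        if spent + loc[i] > budget * total_loc:
            break
        spent += loc[i]
        found += labels[i]
    return found / total_bugs if total_bugs else 0.0


def f1_score(predicted, labels):
    # Harmonic mean of precision and recall for the defect-inducing class.
    tp = sum(1 for p, t in zip(predicted, labels) if p and t)
    fp = sum(1 for p, t in zip(predicted, labels) if p and not t)
    fn = sum(1 for p, t in zip(predicted, labels) if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.6, 0.3, 0.1]  # hypothetical predicted probabilities
    loc = [10, 20, 50, 100, 20]  # hypothetical lines changed per change
    labels = [1, 1, 0, 1, 0]  # 1 = defect-inducing
    print(cost_effectiveness(scores, loc, labels))
    print(f1_score([s >= 0.5 for s in scores], labels))
```

On the toy data the budget is 40 of 200 changed lines. The two top-ranked changes use 30 lines and the third (50 lines) would exceed the budget, so two of the three defect-inducing changes are found and cost_effectiveness evaluates to 2/3. At a 0.5 threshold there are two true positives, one false positive, and one false negative, so the F1-score is also 2/3.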
