On the reproducibility and replicability of deep learning in software engineering

Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL s...

Bibliographic Details
Main Authors:	LIU, Chao; GAO, Cuiyun; XIA, Xin; LO, David; GRUNDY, John C.; YANG, Xiaohu
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Deep Learning; Replicability; Reproducibility; Software Engineering; Databases and Information Systems
Online Access:	https://ink.library.smu.edu.sg/sis_research/7629 https://ink.library.smu.edu.sg/context/sis_research/article/8632/viewcontent/2006.14244.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-8632
record_format	dspace
spelling	sg-smu-ink.sis_research-8632 2023-01-10T03:59:23Z 2022-01-01T08:00:00Z text application/pdf info:doi/10.1145/3477535 http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Deep Learning; Replicability; Reproducibility; Software Engineering; Databases and Information Systems
description	Context: Deep learning (DL) techniques have gained significant popularity among software engineering (SE) researchers in recent years. This is because they can often solve many SE challenges without enormous manual feature engineering effort and complex domain knowledge. Objective: Although many DL studies have reported substantial advantages over other state-of-the-art models on effectiveness, they often ignore two factors: (1) reproducibility, that is, whether the reported experimental results can be obtained by other researchers using the authors’ artifacts (i.e., source code and datasets) with the same experimental setup; and (2) replicability, that is, whether the reported experimental results can be obtained by other researchers using their re-implemented artifacts with a different experimental setup. We observed that DL studies commonly overlook these two factors and declare them as minor threats or leave them for future work. This is mainly due to high model complexity with many manually set parameters and the time-consuming optimization process, unlike classical supervised machine learning (ML) methods (e.g., random forest). This study aims to investigate the urgency and importance of reproducibility and replicability for DL studies on SE tasks. Method: In this study, we conducted a literature review on 147 DL studies recently published in 20 SE venues and 20 AI (Artificial Intelligence) venues to investigate these issues. We also re-ran four representative DL models in SE to investigate important factors that may strongly affect the reproducibility and replicability of a study. Results: Our statistics show the urgency of investigating these two factors in SE, where only 10.2% of the studies investigate any research question to show that their models can address at least one issue of replicability and/or reproducibility. More than 62.6% of the studies do not even share high-quality source code or complete data to support the reproducibility of their complex models. Meanwhile, our experimental results show the importance of reproducibility and replicability, where the reported performance of a DL model could not be reproduced because of an unstable optimization process. Replicability could be substantially compromised if the model training is not convergent, or if performance is sensitive to the size of vocabulary and testing data. Conclusion: It is urgent for the SE community to provide a long-lasting link to a high-quality reproduction package, enhance DL-based solution stability and convergence, and avoid performance sensitivity on different sampled data.
format	text
author	LIU, Chao; GAO, Cuiyun; XIA, Xin; LO, David; GRUNDY, John C.; YANG, Xiaohu
author_sort	LIU, Chao
title	On the reproducibility and replicability of deep learning in software engineering
title_sort	on the reproducibility and replicability of deep learning in software engineering
publisher	Institutional Knowledge at Singapore Management University
publishDate	2022
url	https://ink.library.smu.edu.sg/sis_research/7629 https://ink.library.smu.edu.sg/context/sis_research/article/8632/viewcontent/2006.14244.pdf
_version_	1770576406560899072

Similar Items