Self-admitted technical debts identification: How far are we?

Self-admitted technical debt (SATD) is a kind of technical debt that is already acknowledged by the developers and needs additional work or resources to address in the future. In recent years, though many methods have been proposed to detect SATDs, these methods have mainly focused on Java-type code...

Bibliographic Details
Main Authors:	GU, Hao, ZHANG, Shichao, HUANG, Qiao, LIAO, Zhifang, LIU, Jiakun, LO, David
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	multi-task learning; Self-Admitted Technical Debt; MT-BERT-SATD; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9260
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10260
record_format	dspace
spelling	sg-smu-ink.sis_research-10260; datestamp 2024-09-02T04:48:03Z; date 2024-03-15T07:00:00Z; info:doi/10.1109/SANER60148.2024.00087; Research Collection School Of Computing and Information Systems
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	multi-task learning; Self-Admitted Technical Debt; MT-BERT-SATD; Software Engineering
description	Self-admitted technical debt (SATD) is technical debt that developers have explicitly acknowledged and that requires additional work or resources to address in the future. Although many methods have been proposed in recent years to detect SATD, they have mainly focused on the dataset of Java code comments published by Maldonado et al. It is unclear whether methods trained on Maldonado's code-comment dataset can effectively find SATD in other programming languages or in other software artifacts, such as issue trackers, pull requests, and commit messages. To answer this question and to assess how far the community has progressed in SATD identification, we first collect a comprehensive dataset from previous SATD studies, containing SATD in code comments from four languages (Java, Python, Dockerfile, and XML) and SATD in three other artifacts (issue trackers, pull requests, and commit messages). We then re-train the existing models on Maldonado's code-comment dataset and test them on the other programming languages and artifacts. The results show that existing SATD identification methods can find SATD in non-Java languages but perform poorly on the three other artifacts. In addition, to identify SATD in four different artifacts simultaneously, we develop a multi-task learning model based on BERT for SATD identification (MT-BERT-SATD). Across the four artifacts and their SATD identification tasks, MT-BERT-SATD achieves an average F1-score of 0.712 (ranging from 0.625 to 0.859), outperforming existing models by 4.6% to 30.4%. These results show that MT-BERT-SATD can effectively identify SATD instances across the explored programming languages and software artifacts, indicating that it is capable of identifying SATD instances in new and unexplored programming languages and software artifacts.
format	text
author	GU, Hao; ZHANG, Shichao; HUANG, Qiao; LIAO, Zhifang; LIU, Jiakun; LO, David
author_sort	GU, Hao
title	Self-admitted technical debts identification: How far are we?
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/9260
_version_	1814047847459323904
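The abstract summarizes MT-BERT-SATD's performance as a single average F1-score over the four artifact types, together with a range (0.625 to 0.859). The following is a minimal sketch of how such a summary can be computed, assuming an unweighted (macro) mean of per-artifact binary F1-scores with SATD as the positive class. The record does not state the exact aggregation, and the artifact keys and toy labels below are hypothetical illustrations, not data from the paper.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """Binary F1 with SATD (label 1) as the positive class.

    2*TP / (2*TP + FP + FN) equals the harmonic mean of precision and
    recall; it is defined here as 0.0 when TP, FP and FN are all zero.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(results):
    """Per-artifact F1, their unweighted (macro) mean, and the (min, max) range."""
    per_task = {name: f1_score(t, p) for name, (t, p) in results.items()}
    scores = list(per_task.values())
    return per_task, mean(scores), (min(scores), max(scores))


# Hypothetical toy gold labels and predictions (1 = SATD, 0 = not SATD).
results = {
    "code_comment": ([1, 0, 1, 1, 0], [1, 0, 1, 0, 0]),
    "issue": ([1, 1, 0, 0], [1, 0, 1, 0]),
    "pull_request": ([0, 1, 1], [0, 1, 1]),
    "commit_message": ([1, 0, 0], [0, 0, 1]),
}

per_task, avg, (low, high) = summarize(results)
for name, score in per_task.items():
    print(f"{name:<15} F1 = {score:.3f}")
print(f"average F1 = {avg:.3f} (range {low:.3f} to {high:.3f})")
```

Because each artifact is weighted equally in this sketch, a small artifact dataset influences the average as much as a large one; pooling all predictions into a single micro-averaged F1 can give a different number when the artifact datasets differ in size.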
