Prevalence, Contents and Automatic Detection of KL-SATD
Main Authors: | RANTALA, Leevi; MANTYLA, Mika; LO, David |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
Date: | 2020-08-01 |
Subjects: | data mining; Natural language processing; self-admitted technical debt; Software Engineering |
DOI: | 10.1109/SEAA51224.2020.00069 |
Online Access: | https://ink.library.smu.edu.sg/sis_research/5624 https://ink.library.smu.edu.sg/context/sis_research/article/6627/viewcontent/KL_SATD_2020_pv.pdf |
Collection: | Research Collection School Of Computing and Information Systems (InK@SMU) |
Institution: | Singapore Management University |
License: | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Record ID: | sg-smu-ink.sis_research-6627 |
Description: | When developers use keywords such as TODO and FIXME in source code comments to describe self-admitted technical debt (SATD), we refer to it as Keyword-Labeled SATD (KL-SATD). We study KL-SATD in 33 software repositories containing 13,588 KL-SATD comments. We find that the median percentage of KL-SATD comments among all comments is only 1.52%. KL-SATD comments contain words expressing code changes and uncertainty, such as remove, fix, maybe and probably, which distinguishes them from other comments. The contents of KL-SATD comments are similar to those of manually labeled SATD comments from prior work. Our machine learning classifier, based on logistic Lasso regression, performs well at detecting KL-SATD comments (AUC-ROC 0.88). Finally, we demonstrate that machine learning can identify comments that currently lack a SATD keyword but should have one. Automatically identifying SATD in comments that lack SATD keywords can save time and effort by replacing manual identification. Using KL-SATD offers the potential to bootstrap a complete SATD detector. |
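The abstract of this record rests on three mechanical steps: labeling a comment as KL-SATD when it contains a keyword, computing each repository's share of such comments and taking the median across repositories, and training a logistic Lasso (L1-penalized logistic regression) classifier scored by AUC-ROC. The Python sketch below illustrates those steps on toy data; it is not the authors' pipeline. Assumptions: only the two keywords named in the abstract (TODO, FIXME) are used, matching is whole-word and case-insensitive, keywords are masked out before training so the model must learn from the remaining words, and a plain proximal-gradient fit stands in for whatever Lasso solver the study used. All function names and the hyperparameters `lam`, `lr` and `epochs` are hypothetical.

```python
import math
import re
import statistics

# Only the keywords named in the abstract; the study's full list is not reproduced here.
SATD_KEYWORDS = ("TODO", "FIXME")
# Assumption: whole-word, case-insensitive matching ("TODOS" does not count).
KEYWORD_RE = re.compile(r"\b(?:" + "|".join(SATD_KEYWORDS) + r")\b", re.IGNORECASE)


def is_kl_satd(comment: str) -> bool:
    """A comment is keyword-labeled SATD if it contains a SATD keyword."""
    return KEYWORD_RE.search(comment) is not None


def kl_satd_percentage(comments: list[str]) -> float:
    """Share of one repository's comments that are KL-SATD, in percent."""
    if not comments:
        return 0.0
    return 100.0 * sum(is_kl_satd(c) for c in comments) / len(comments)


def median_prevalence(repos: dict[str, list[str]]) -> float:
    """Median KL-SATD percentage across repositories."""
    return statistics.median(kl_satd_percentage(c) for c in repos.values())


def tokens(comment: str) -> set[str]:
    """Lower-cased word set with the keywords masked out (hypothetical choice),
    so a classifier must rely on the remaining words."""
    return set(re.findall(r"[a-z]+", KEYWORD_RE.sub(" ", comment).lower()))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_lasso(docs, labels, lam=0.01, lr=0.5, epochs=500):
    """L1-penalized logistic regression fitted by proximal gradient descent:
    a gradient step on the mean log-loss, then soft-thresholding each weight."""
    vocab = sorted(set().union(*docs))
    w = dict.fromkeys(vocab, 0.0)
    b = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad = dict.fromkeys(vocab, 0.0)
        grad_b = 0.0
        for doc, label in zip(docs, labels):
            err = sigmoid(b + sum(w[t] for t in doc)) - float(label)
            grad_b += err
            for t in doc:
                grad[t] += err
        b -= lr * grad_b / n  # the intercept is not penalized
        for t in vocab:
            z = w[t] - lr * grad[t] / n
            w[t] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return w, b


def score(w, b, doc):
    """Linear score; words unseen in training contribute nothing."""
    return b + sum(w.get(t, 0.0) for t in doc)


def auc_roc(scores, labels):
    """AUC as the probability a positive outscores a negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    comments = [
        "TODO: remove this hack once the API is fixed",
        "FIXME maybe this should probably be cached",
        "TODO fix the parser, probably wrong",
        "Returns the number of rows in the table",
        "Constructor for the widget",
        "Computes the checksum of the buffer",
    ]
    labels = [is_kl_satd(c) for c in comments]
    docs = [tokens(c) for c in comments]
    w, b = train_logistic_lasso(docs, labels)
    print("training AUC-ROC:", auc_roc([score(w, b, d) for d in docs], labels))
```

Run as a script, it labels six made-up comments by keyword, trains on the masked text, and prints the training-set AUC-ROC. A training-set AUC on six comments says nothing about the 0.88 reported in the abstract; a faithful evaluation would score held-out comments. Masking the keywords is what would let such a model flag comments that describe debt without a TODO or FIXME, the use case the abstract describes.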