Identifying self-admitted technical debt in open source projects using text mining
Technical debt is a metaphor to describe the situation in which long-term code quality is traded for short-term goals in software projects. Recently, the concept of self-admitted technical debt (SATD) was proposed, which considers debt that is intentionally introduced, e.g., in the form of quick or...
Main Authors: | HUANG, Qiao; SHIHAB, Emad; XIA, Xin; LO, David; LI, Shanping |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2018 |
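Subjects: | Source code comments; Technical debt; Text mining; Databases and Information Systems; Software Engineering |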
Online Access: | https://ink.library.smu.edu.sg/sis_research/3793 https://ink.library.smu.edu.sg/context/sis_research/article/4795/viewcontent/101007_2Fs10664_017_9522_4.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-4795 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-4795 2020-01-14T02:33:07Z Identifying self-admitted technical debt in open source projects using text mining HUANG, Qiao SHIHAB, Emad XIA, Xin LO, David LI, Shanping Technical debt is a metaphor to describe the situation in which long-term code quality is traded for short-term goals in software projects. Recently, the concept of self-admitted technical debt (SATD) was proposed, which considers debt that is intentionally introduced, e.g., in the form of quick or temporary fixes. Prior work on SATD has shown that source code comments can be used to successfully detect SATD; however, most current state-of-the-art approaches to classifying SATD rely on manual inspection of the source code comments. In this paper, we propose an automated approach to detect SATD in source code comments using text mining. In our approach, we use feature selection to select useful features for classifier training, and we combine multiple classifiers from different source projects to build a composite classifier that identifies SATD comments in a target project. We investigate the performance of our approach on 8 open source projects that contain 212,413 comments. Our experimental results show that, on every target project, our approach outperforms the state-of-the-art and the baseline approaches in terms of F1-score. The F1-score achieved by our approach ranges from 0.518 to 0.841, with an average of 0.737, which improves over the state-of-the-art approach proposed by Potdar and Shihab by 499.19%. When compared with the text mining-based baseline approaches, our approach significantly improves the average F1-score by at least 58.49%. When compared with a natural language processing-based baseline, our approach also significantly improves on its F1-score by 27.95%. Our proposed approach can be used by project personnel to effectively identify SATD with minimal manual effort.
2018-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/3793 info:doi/10.1007/s10664-017-9522-4 https://ink.library.smu.edu.sg/context/sis_research/article/4795/viewcontent/101007_2Fs10664_017_9522_4.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Source code comments Technical debt Text mining Databases and Information Systems Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Source code comments; Technical debt; Text mining; Databases and Information Systems; Software Engineering |
spellingShingle |
Source code comments Technical debt Text mining Databases and Information Systems Software Engineering HUANG, Qiao SHIHAB, Emad XIA, Xin LO, David LI, Shanping Identifying self-admitted technical debt in open source projects using text mining |
description |
Technical debt is a metaphor to describe the situation in which long-term code quality is traded for short-term goals in software projects. Recently, the concept of self-admitted technical debt (SATD) was proposed, which considers debt that is intentionally introduced, e.g., in the form of quick or temporary fixes. Prior work on SATD has shown that source code comments can be used to successfully detect SATD; however, most current state-of-the-art approaches to classifying SATD rely on manual inspection of the source code comments. In this paper, we propose an automated approach to detect SATD in source code comments using text mining. In our approach, we use feature selection to select useful features for classifier training, and we combine multiple classifiers from different source projects to build a composite classifier that identifies SATD comments in a target project. We investigate the performance of our approach on 8 open source projects that contain 212,413 comments. Our experimental results show that, on every target project, our approach outperforms the state-of-the-art and the baseline approaches in terms of F1-score. The F1-score achieved by our approach ranges from 0.518 to 0.841, with an average of 0.737, which improves over the state-of-the-art approach proposed by Potdar and Shihab by 499.19%. When compared with the text mining-based baseline approaches, our approach significantly improves the average F1-score by at least 58.49%. When compared with a natural language processing-based baseline, our approach also significantly improves on its F1-score by 27.95%. Our proposed approach can be used by project personnel to effectively identify SATD with minimal manual effort. |
format |
text |
author |
HUANG, Qiao; SHIHAB, Emad; XIA, Xin; LO, David; LI, Shanping |
author_facet |
HUANG, Qiao; SHIHAB, Emad; XIA, Xin; LO, David; LI, Shanping |
author_sort |
HUANG, Qiao |
title |
Identifying self-admitted technical debt in open source projects using text mining |
title_sort |
identifying self-admitted technical debt in open source projects using text mining |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2018 |
url |
https://ink.library.smu.edu.sg/sis_research/3793 https://ink.library.smu.edu.sg/context/sis_research/article/4795/viewcontent/101007_2Fs10664_017_9522_4.pdf |
_version_ |
1770573735020986368 |
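The pipeline summarized in the description (feature selection, one classifier per source project, a composite classifier applied to a target project, evaluation by F1-score) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not say which feature-selection measure, classifier, or combination rule the paper uses, so the document-frequency score, the Naive Bayes model, and the majority vote below are stand-ins. The names `select_features`, `NaiveBayes`, `train_composite`, and `predict_composite` are hypothetical, and the comments in `SOURCE_PROJECTS` are made-up toy data.

```python
import math
import re
from collections import Counter


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", comment.lower())


def select_features(comments, labels, k):
    """Keep the k words whose per-class document frequency differs most.

    Assumption: this scoring rule is a simple stand-in for illustration.
    """
    n_pos = sum(labels) or 1
    n_neg = (len(labels) - sum(labels)) or 1
    pos_df, neg_df = Counter(), Counter()
    for text, label in zip(comments, labels):
        (pos_df if label else neg_df).update(set(tokenize(text)))

    def score(word):
        return abs(pos_df[word] / n_pos - neg_df[word] / n_neg)

    vocab = set(pos_df) | set(neg_df)
    # Ties are broken alphabetically so the selection is deterministic.
    return set(sorted(vocab, key=lambda w: (-score(w), w))[:k])


class NaiveBayes:
    """Multinomial Naive Bayes over a fixed feature set, add-one smoothing."""

    def __init__(self, features):
        self.features = features

    def fit(self, comments, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(comments, labels):
            kept = (t for t in tokenize(text) if t in self.features)
            self.counts[label].update(kept)
        return self

    def predict(self, comment):
        tokens = [t for t in tokenize(comment) if t in self.features]
        best, best_lp = 0, -math.inf
        for label in (0, 1):
            total = sum(self.counts[label].values()) + len(self.features)
            lp = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
            lp += sum(math.log((self.counts[label][t] + 1) / total) for t in tokens)
            if lp > best_lp:  # on a tie, the non-SATD label (0) is kept
                best, best_lp = label, lp
        return best


def train_composite(source_projects, k=4):
    """Train one classifier per source project, given (comments, labels) pairs."""
    models = []
    for comments, labels in source_projects:
        features = select_features(comments, labels, k)
        models.append(NaiveBayes(features).fit(comments, labels))
    return models


def predict_composite(models, comment):
    """Return 1 (SATD) if a strict majority of the models vote 1, else 0."""
    votes = sum(m.predict(comment) for m in models)
    return 1 if votes * 2 > len(models) else 0


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the SATD class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy source projects (invented for illustration): 1 = SATD, 0 = not SATD.
SOURCE_PROJECTS = [
    (["TODO hack fix this later", "TODO temporary hack for now",
      "returns the user name", "returns the total price"], [1, 1, 0, 0]),
    (["FIXME ugly hack", "FIXME ugly workaround here",
      "opens the file", "closes the file handle"], [1, 1, 0, 0]),
    (["temporary workaround fix later", "temporary fix remove later",
      "parse the input string", "send the request string"], [1, 1, 0, 0]),
]

models = train_composite(SOURCE_PROJECTS)
target_comments = ["ugly hack, fix it later", "returns the file handle"]
predictions = [predict_composite(models, c) for c in target_comments]
print(predictions, f1_score([1, 0], predictions))
```

Training a separate model per source project, rather than pooling all comments into one training set, lets each model keep the vocabulary that is most discriminative for its own project; the vote then decides the label for a comment from a target project that none of the models has seen.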