Identifying self-admitted technical debt in open source projects using text mining

Bibliographic Details
Main Authors: HUANG, Qiao, SHIHAB, Emad, XIA, Xin, LO, David, LI, Shanping
Format: text
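Language: English
Published: Institutional Knowledge at Singapore Management University, 2018
Subjects: Source code comments; Technical debt; Text mining; Databases and Information Systems; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/3793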
https://ink.library.smu.edu.sg/context/sis_research/article/4795/viewcontent/101007_2Fs10664_017_9522_4.pdf
Institution: Singapore Management University
Date: 2018-02-01
File format: application/pdf
DOI: 10.1007/s10664-017-9522-4
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
topic Source code comments
Technical debt
Text mining
Databases and Information Systems
Software Engineering
description Technical debt is a metaphor describing the situation in which long-term code quality is traded for short-term goals in software projects. Recently, the concept of self-admitted technical debt (SATD) was proposed, which refers to debt that is intentionally introduced, e.g., in the form of quick or temporary fixes. Prior work on SATD has shown that source code comments can be used to successfully detect SATD; however, most current state-of-the-art approaches to classifying SATD rely on manual inspection of the source code comments. In this paper, we propose an automated approach to detect SATD in source code comments using text mining. In our approach, we use feature selection to select useful features for classifier training, and we combine multiple classifiers from different source projects to build a composite classifier that identifies SATD comments in a target project. We investigate the performance of our approach on 8 open source projects that contain 212,413 comments. Our experimental results show that, on every target project, our approach outperforms the state-of-the-art and the baseline approaches in terms of F1-score. The F1-score achieved by our approach ranges from 0.518 to 0.841, with an average of 0.737, which is an improvement of 499.19% over the state-of-the-art approach proposed by Potdar and Shihab. When compared with the text mining-based baseline approaches, our approach significantly improves the average F1-score by at least 58.49%. When compared with a natural language processing-based baseline, our approach also significantly improves on its F1-score by 27.95%. Our proposed approach can be used by project personnel to effectively identify SATD with minimal manual effort.
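The abstract names three ingredients (feature selection, one classifier per source project, and a composite classifier that combines them for a target project) but not the specific algorithms. The following is a minimal Python sketch under stated assumptions: information gain ranks comment terms, a multinomial Naive Bayes model serves as each project's sub-classifier, and a strict majority vote combines them. These choices, the helper names (`tokenize`, `select_features`, `NaiveBayes`, `train_composite`, `predict_composite`, `f1_score`), the parameter `k`, and the toy comments are hypothetical illustrations, not necessarily the authors' implementation.

```python
import math
import re
from collections import Counter

# Assumptions (illustrative only): information gain for feature selection,
# multinomial Naive Bayes per source project, and a majority vote to combine.

def tokenize(comment):
    # Lowercase and keep alphabetic runs only; real pipelines may also stem.
    return re.findall(r"[a-z]+", comment.lower())

def entropy(pos, total):
    # Binary entropy (in bits) of a set with `pos` positives out of `total`.
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def information_gain(doc_sets, labels, term):
    # Reduction in label entropy from knowing whether `term` occurs in a comment.
    n = len(doc_sets)
    with_t = [y for d, y in zip(doc_sets, labels) if term in d]
    without_t = [y for d, y in zip(doc_sets, labels) if term not in d]
    cond = (len(with_t) / n) * entropy(sum(with_t), len(with_t)) + (
        len(without_t) / n) * entropy(sum(without_t), len(without_t))
    return entropy(sum(labels), n) - cond

def select_features(token_docs, labels, k):
    # Keep the k terms with the highest information gain (ties: alphabetical).
    doc_sets = [set(d) for d in token_docs]
    vocab = set().union(*doc_sets)
    # Rounding stops floating-point noise from deciding ties.
    ranked = sorted(vocab, key=lambda t: (
        -round(information_gain(doc_sets, labels, t), 12), t))
    return set(ranked[:k])

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing over a fixed feature set."""
    def __init__(self, token_docs, labels, features):
        self.features = features
        self.doc_counts = Counter(labels)
        self.term_counts = {0: Counter(), 1: Counter()}
        for tokens, y in zip(token_docs, labels):
            self.term_counts[y].update(t for t in tokens if t in features)
        self.totals = {y: sum(c.values()) for y, c in self.term_counts.items()}

    def log_score(self, tokens, y):
        n_docs = sum(self.doc_counts.values())
        prior = math.log((self.doc_counts[y] + 1) / (n_docs + 2))
        v = len(self.features)
        return prior + sum(
            math.log((self.term_counts[y][t] + 1) / (self.totals[y] + v))
            for t in tokens if t in self.features)

    def predict(self, tokens):
        # 1 = SATD comment, 0 = not SATD.
        return int(self.log_score(tokens, 1) > self.log_score(tokens, 0))

def train_composite(source_projects, k=100):
    # One sub-classifier per source project, each with its own selected features.
    models = []
    for comments, labels in source_projects:
        docs = [tokenize(c) for c in comments]
        models.append(NaiveBayes(docs, labels, select_features(docs, labels, k)))
    return models

def predict_composite(models, comment):
    # Strict majority vote; a tie counts as "not SATD" (an arbitrary choice).
    tokens = tokenize(comment)
    votes = sum(m.predict(tokens) for m in models)
    return int(2 * votes > len(models))

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall for the SATD (positive) class.
    pairs = Counter(zip(y_true, y_pred))
    tp, fp, fn = pairs[(1, 1)], pairs[(0, 1)], pairs[(1, 0)]
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    # Toy data: three hypothetical source projects as (comments, labels).
    projects = [
        (["TODO: this is a hack, fix later", "returns the user id",
          "FIXME temporary workaround", "compute the checksum"], [1, 0, 1, 0]),
        (["hack to avoid crash, remove later", "parse the header",
          "todo refactor this ugly code", "close the stream"], [1, 0, 1, 0]),
        (["temporary fix until upstream is patched", "sort by name",
          "fixme: quick hack", "load configuration file"], [1, 0, 1, 0]),
    ]
    models = train_composite(projects, k=10)
    target = ["quick hack, fix later", "read the header bytes"]
    preds = [predict_composite(models, c) for c in target]
    print(preds, f1_score([1, 0], preds))  # prints: [1, 0] 1.0
```

Training one sub-classifier per source project keeps each project's vocabulary and commenting habits separate, and the vote keeps any single source project from dominating the prediction for the target project. With many source projects, a weighted vote or averaging of per-model probabilities would be a natural alternative to the plain majority used here.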