18 million links in commit messages: purpose, evolution, and decay

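Commit messages contain diverse and valuable types of knowledge in all aspects of software maintenance and evolution. Links are an example of such knowledge. Previous work on “9.6 million links in source code comments” showed that links are prone to decay, become outdated, and lack bidirectional traceability. We conducted a large-scale study of 18,201,165 links from commits in 23,110 GitHub repositories to investigate whether they suffer the same fate. Results show that referencing external resources is prevalent and that the most frequent domains other than github.com are the external domains of Stack Overflow and Google Code. Similarly, links serve as source code context to commit messages, with inaccessible links being frequent. Although repeatedly referencing links is rare (4%), 14% of links that are prone to evolve become unavailable over time; e.g., tutorials or articles and software homepages become unavailable over time. Furthermore, we find that 70% of the distinct links suffer from decay; the domains that occur the most frequently are related to Subversion repositories. We summarize that links in commits share the same fate as links in code, opening up avenues for future work.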

Bibliographic Details
Main Authors: XIAO, Tao, BALTES, Sebastian, HATA, Hideaki, TREUDE, Christoph, KULA, Raula, ISHIO, Takashi, MATSUMOTO, Kenichi
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects: Commit messages; Software documentation; Link sharing; Link decay; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/8781
https://ink.library.smu.edu.sg/context/sis_research/article/9784/viewcontent/18million.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9784
record_format dspace
spelling sg-smu-ink.sis_research-9784
2024-05-30T08:58:32Z
2023-07-01T07:00:00Z
text
application/pdf
info:doi/10.1007/s10664-023-10325-8
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Commit messages
Software documentation
Link sharing
Link decay
Software Engineering
description Commit messages contain diverse and valuable types of knowledge in all aspects of software maintenance and evolution. Links are an example of such knowledge. Previous work on “9.6 million links in source code comments” showed that links are prone to decay, become outdated, and lack bidirectional traceability. We conducted a large-scale study of 18,201,165 links from commits in 23,110 GitHub repositories to investigate whether they suffer the same fate. Results show that referencing external resources is prevalent and that the most frequent domains other than github.com are the external domains of Stack Overflow and Google Code. Similarly, links serve as source code context to commit messages, with inaccessible links being frequent. Although repeatedly referencing links is rare (4%), 14% of links that are prone to evolve become unavailable over time; e.g., tutorials or articles and software homepages become unavailable over time. Furthermore, we find that 70% of the distinct links suffer from decay; the domains that occur the most frequently are related to Subversion repositories. We summarize that links in commits share the same fate as links in code, opening up avenues for future work.