SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts
Main Authors: | Sebastian Baltes, Lorik Dumani, Christoph Treude, Stephan Diehl |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2018 |
Subjects: | code snippets; open dataset; software evolution; stack overflow; Software Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8866 https://ink.library.smu.edu.sg/context/sis_research/article/9869/viewcontent/msr18.pdf |
Institution: | Singapore Management University |
id | sg-smu-ink.sis_research-9869 |
---|---|
record_format | dspace |
doi | 10.1145/3196398.3196430 |
date | 2018-05-01 |
rights | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
collection_name | Research Collection School Of Computing and Information Systems |
institution | Singapore Management University |
building | SMU Libraries |
continent | Asia |
country | Singapore |
content_provider | SMU Libraries |
collection | InK@SMU |
language | English |
topic | code snippets; open dataset; software evolution; stack overflow; Software Engineering |
description | Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and by collecting references from GitHub files to SO posts. In this paper, we describe how we built SOTorrent, and in particular how we evaluated 134 different string similarity metrics regarding their applicability for reconstructing the version history of text and code blocks. Based on a first analysis using the dataset, we present insights into the evolution of SO posts, e.g., that post edits are usually small, happen soon after the initial creation of the post, and that code is rarely changed without also updating the surrounding text. Further, our analysis revealed a close relationship between post edits and comments. Our vision is that researchers will use SOTorrent to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub. |
format | text |
author | BALTES, Sebastian; DUMANI, Lorik; TREUDE, Christoph; DIEHL, Stephan |
title | SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts |
publisher | Institutional Knowledge at Singapore Management University |
publishDate | 2018 |
url | https://ink.library.smu.edu.sg/sis_research/8866 https://ink.library.smu.edu.sg/context/sis_research/article/9869/viewcontent/msr18.pdf |