SOTorrent: Reconstructing and analyzing the evolution of Stack Overflow posts

Bibliographic Details
Main Authors: BALTES, Sebastian; DUMANI, Lorik; TREUDE, Christoph; DIEHL, Stephan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects: code snippets; open dataset; software evolution; Stack Overflow; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/8866
https://ink.library.smu.edu.sg/context/sis_research/article/9869/viewcontent/msr18.pdf
Institution: Singapore Management University
Collection: Research Collection School Of Computing and Information Systems (InK@SMU, SMU Libraries)
Date: 2018-05-01
DOI: 10.1145/3196398.3196430
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract
Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and by collecting references from GitHub files to SO posts. In this paper, we describe how we built SOTorrent, and in particular how we evaluated 134 different string similarity metrics regarding their applicability for reconstructing the version history of text and code blocks. Based on a first analysis using the dataset, we present insights into the evolution of SO posts, e.g., that post edits are usually small, happen soon after the initial creation of the post, and that code is rarely changed without also updating the surrounding text. Further, our analysis revealed a close relationship between post edits and comments. Our vision is that researchers will use SOTorrent to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub.
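The abstract describes reconstructing the version history of individual text and code blocks with a string similarity metric. As a minimal illustration of that general idea, and not SOTorrent's actual implementation, the Python sketch below links each block of a newer post version to its most similar block of the same kind in the previous version. The `Block` class, the `match_predecessors` function, the greedy one-to-one matching, the 0.8 threshold, and the use of `difflib.SequenceMatcher.ratio()` as the metric are all hypothetical choices made for illustration; the paper evaluates 134 metrics, and this standard-library ratio is merely a stand-in.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Block:
    """One text or code block of a post version (hypothetical model)."""
    kind: str      # "text" or "code"
    content: str


def similarity(a: str, b: str) -> float:
    """Stand-in metric in [0, 1]; not one of the metrics selected in the paper."""
    return SequenceMatcher(None, a, b).ratio()


def match_predecessors(previous: list[Block], current: list[Block],
                       threshold: float = 0.8) -> list[Optional[int]]:
    """For each block in `current`, return the index of its most similar
    block of the same kind in `previous`, or None if the best candidate
    scores below `threshold`. Greedy: each previous block is used at most
    once, and ties keep the earliest candidate."""
    used: set[int] = set()
    result: list[Optional[int]] = []
    for block in current:
        best_index, best_score = None, -1.0
        for i, candidate in enumerate(previous):
            # Only compare unused blocks of the same kind (text vs. code).
            if i in used or candidate.kind != block.kind:
                continue
            score = similarity(candidate.content, block.content)
            if score > best_score:
                best_index, best_score = i, score
        if best_index is not None and best_score >= threshold:
            used.add(best_index)
            result.append(best_index)
        else:
            result.append(None)  # treated as a newly added block
    return result


if __name__ == "__main__":
    old = [Block("text", "Use a list comprehension:"),
           Block("code", "squares = [x*x for x in range(10)]")]
    new = [Block("text", "Use a list comprehension instead:"),
           Block("code", "squares = [x * x for x in range(10)]"),
           Block("text", "This works in Python 3.")]
    print(match_predecessors(old, new))  # → [0, 1, None]
```

Restricting candidates to blocks of the same kind mirrors the record's separation of text and code blocks. A greedy pass is the simplest option; it can mis-assign predecessors when several similar blocks compete, which is one reason the choice of metric and threshold matters.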