SOTorrent: Studying the origin, evolution, and usage of Stack Overflow code snippets

Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Like other software artifacts, code on SO evolves over time, for example when bugs are fixed or APIs are updated to the most recent version. To be able to a...

Bibliographic Details
Main Authors: BALTES, Sebastian, TREUDE, Christoph, DIEHL, Stephan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
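Subjects: Code snippets; Github; Open dataset; Software evolution; Stack overflow; Programming Languages and Compilers; Software Engineering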
Online Access: https://ink.library.smu.edu.sg/sis_research/8837
https://ink.library.smu.edu.sg/context/sis_research/article/9840/viewcontent/msr19c.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9840
record_format dspace
spelling sg-smu-ink.sis_research-9840 2024-06-06T08:47:12Z SOTorrent: Studying the origin, evolution, and usage of Stack Overflow code snippets BALTES, Sebastian TREUDE, Christoph DIEHL, Stephan Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Like other software artifacts, code on SO evolves over time, for example when bugs are fixed or APIs are updated to the most recent version. To be able to analyze how code and the surrounding text on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text and code blocks. It connects code snippets from SO posts to other platforms by aggregating URLs from surrounding text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use SOTorrent to investigate and understand the evolution and maintenance of code on SO and its relation to other platforms such as GitHub. 2019-05-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8837 info:doi/10.1109/MSR.2019.00038 https://ink.library.smu.edu.sg/context/sis_research/article/9840/viewcontent/msr19c.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Code snippets Github Open dataset Software evolution Stack overflow Programming Languages and Compilers Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Code snippets
Github
Open dataset
Software evolution
Stack overflow
Programming Languages and Compilers
Software Engineering
description Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of copyable code snippets. Like other software artifacts, code on SO evolves over time, for example when bugs are fixed or APIs are updated to the most recent version. To be able to analyze how code and the surrounding text on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text and code blocks. It connects code snippets from SO posts to other platforms by aggregating URLs from surrounding text blocks and comments, and by collecting references from GitHub files to SO posts. Our vision is that researchers will use SOTorrent to investigate and understand the evolution and maintenance of code on SO and its relation to other platforms such as GitHub.
format text
author BALTES, Sebastian
TREUDE, Christoph
DIEHL, Stephan
author_facet BALTES, Sebastian
TREUDE, Christoph
DIEHL, Stephan
author_sort BALTES, Sebastian
title SOTorrent: Studying the origin, evolution, and usage of Stack Overflow code snippets
title_sort sotorrent: studying the origin, evolution, and usage of stack overflow code snippets
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/8837
https://ink.library.smu.edu.sg/context/sis_research/article/9840/viewcontent/msr19c.pdf
_version_ 1814047570652037120