Characterizing search activities on Stack Overflow

Bibliographic Details
Main Authors: LIU, Jiakun; BALTES, Sebastian; TREUDE, Christoph; LO, David; ZHANG, Yun; XIA, Xin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2021
Online Access: https://ink.library.smu.edu.sg/sis_research/6891
https://ink.library.smu.edu.sg/context/sis_research/article/7894/viewcontent/Characterizing_Search_Activities_on_Stack_Overflow__1_.pdf
Institution: Singapore Management University
Date: 2021-08-01
DOI: 10.1145/3468264.3468582
License: CC BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems
Content provider: SMU Libraries (InK@SMU), Singapore
Subjects: Data mining; Query logs; Query reformulation; Stack Overflow; Databases and Information Systems
Description:
To solve programming issues, developers commonly search on Stack Overflow to seek potential solutions. However, there is a gap between the knowledge developers are interested in and the knowledge they are able to retrieve using search engines. To help developers efficiently retrieve relevant knowledge on Stack Overflow, prior studies proposed several techniques to reformulate queries and generate summarized answers. However, few studies performed a large-scale analysis using real-world search logs. In this paper, we characterize how developers search on Stack Overflow using such logs. By doing so, we identify the challenges developers face when searching on Stack Overflow and seek opportunities for the platform and researchers to help developers efficiently retrieve knowledge. To characterize search activities on Stack Overflow, we use search log data based on requests to Stack Overflow's web servers. We find that the most common search activity is reformulating the immediately preceding queries. Related work looked into query reformulations when using generic search engines and found 13 types of query reformulation strategies. Compared to their results, we observe that 71.78% of the reformulations can be fitted into those reformulation strategies. In terms of how queries are structured, 17.41% of the search sessions only search for fragments of source code artifacts (e.g., class and method names) without specifying the names of programming languages, libraries, or frameworks.

Based on our findings, we provide actionable suggestions for Stack Overflow moderators and outline directions for future research. For example, we encourage Stack Overflow to set up a database that includes the relations between all computer programming terminologies shared on Stack Overflow, e.g., method name, data structure name, design pattern, and IDE name. By doing so, Stack Overflow could improve the performance of search engines by considering related programming terminologies at different levels of granularity.
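The record reports that the most common search activity is reformulating the immediately preceding query. As a rough illustration only, the Python sketch below labels a pair of consecutive queries using a few coarse, token-based categories. The category names, the `classify` helper, and the rules are hypothetical simplifications. They are not the 13-strategy taxonomy from the related work, and they are not the authors' classification procedure.

```python
from enum import Enum


class Reformulation(Enum):
    """Hypothetical coarse categories (not the paper's taxonomy)."""
    SAME = "same query"
    REORDER = "words reordered"
    ADD_WORDS = "words added"
    REMOVE_WORDS = "words removed"
    SUBSTITUTE = "words substituted"
    NEW_QUERY = "unrelated query"


def tokenize(query: str) -> list[str]:
    # Lower-case and split on runs of whitespace; real search logs would
    # need more careful handling of punctuation and code tokens.
    return query.lower().split()


def classify(previous: str, current: str) -> Reformulation:
    """Label how `current` differs from the immediately preceding query."""
    prev_tokens, curr_tokens = tokenize(previous), tokenize(current)
    prev_set, curr_set = set(prev_tokens), set(curr_tokens)

    if prev_tokens == curr_tokens:
        # Identical after lower-casing and whitespace normalization
        return Reformulation.SAME
    if sorted(prev_tokens) == sorted(curr_tokens):
        # Same words, different order
        return Reformulation.REORDER
    if prev_set < curr_set:
        # Every old word kept, at least one new word added
        return Reformulation.ADD_WORDS
    if curr_set < prev_set:
        # Only words removed
        return Reformulation.REMOVE_WORDS
    if prev_set & curr_set:
        # Some overlap, but words both dropped and introduced
        return Reformulation.SUBSTITUTE
    # No shared words: treat as a new information need
    return Reformulation.NEW_QUERY


if __name__ == "__main__":
    # Hypothetical session, not taken from the study's logs
    session = ["parse json", "parse json python", "json.loads python", "regex email"]
    for prev, curr in zip(session, session[1:]):
        print(f"{prev!r} -> {curr!r}: {classify(prev, curr).value}")
```

Token sets keep the sketch short; a closer approximation of published reformulation taxonomies would also need character-level checks (for example, spelling corrections or abbreviations), which set comparisons cannot detect.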