Automatic query reformulation for code search using crowdsourced knowledge

Traditional code search engines (e.g., Krugle) often do not perform well with natural language queries. They mostly apply keyword matching between query and source code. Hence, they need carefully designed queries containing references to relevant APIs for the code search. Unfortunately, preparing a...

Bibliographic Details
Main Authors: RAHMAN, Mohammad M., ROY, Chanchal K., LO, David
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
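Subjects: Code search; Keyword-API association; Crowdsourced knowledge; Stack Overflow; Query reformulation; Programming Languages and Compilers; Software Engineering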
Online Access: https://ink.library.smu.edu.sg/sis_research/4374
https://ink.library.smu.edu.sg/context/sis_research/article/5377/viewcontent/Rahman2019_Article_AutomaticQueryReformulationFor_1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5377
record_format dspace
spelling sg-smu-ink.sis_research-5377
2020-03-31T06:20:57Z
Automatic query reformulation for code search using crowdsourced knowledge
RAHMAN, Mohammad M.
ROY, Chanchal K.
LO, David
2019-01-01T08:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/4374
info:doi/10.1007/s10664-018-9671-0
https://ink.library.smu.edu.sg/context/sis_research/article/5377/viewcontent/Rahman2019_Article_AutomaticQueryReformulationFor_1_.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
Code search
Keyword-API association
Crowdsourced knowledge
Stack Overflow
Query reformulation
Programming Languages and Compilers
Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Code search
Keyword-API association
Crowdsourced knowledge
Stack Overflow
Query reformulation
Programming Languages and Compilers
Software Engineering
description Traditional code search engines (e.g., Krugle) often perform poorly with natural language queries. They mostly apply keyword matching between the query and the source code, so they need carefully designed queries containing references to relevant APIs. Unfortunately, according to existing studies, preparing an effective search query is both challenging and time-consuming for developers. In this article, we propose a novel query reformulation technique, RACK, that suggests a list of relevant API classes for a natural language query intended for code search. Our technique offers such suggestions by exploiting keyword-API associations from the questions and answers of Stack Overflow (i.e., crowdsourced knowledge). We first motivate our idea using an exploratory study with 19 standard Java API packages and 344K Java-related posts from Stack Overflow. Experiments using 175 code search queries randomly chosen from three Java tutorial sites show that our technique recommends correct API classes within the Top-10 results for 83% of the queries, with 46% mean average precision and 54% recall, which are 66%, 79% and 87% higher, respectively, than those of the state of the art. Reformulations using our suggested API classes improve 64% of the natural language queries, and their overall accuracy improves by 19%. Comparisons with three state-of-the-art techniques demonstrate that RACK outperforms them in query reformulation by a statistically significant margin. An investigation using three web/code search engines shows that our technique can significantly improve their results in the context of code search.
format text
author RAHMAN, Mohammad M.
ROY, Chanchal K.
LO, David
title Automatic query reformulation for code search using crowdsourced knowledge
publisher Institutional Knowledge at Singapore Management University
publishDate 2019
url https://ink.library.smu.edu.sg/sis_research/4374
https://ink.library.smu.edu.sg/context/sis_research/article/5377/viewcontent/Rahman2019_Article_AutomaticQueryReformulationFor_1_.pdf
_version_ 1770574690775990272