Augmenting and structuring user queries to support efficient free-form code search

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch pro...

Bibliographic Details
Main Authors: SIRRES, Raphael; BISSYANDE, Tegawendé F.; KIM, Dongsun; LO, David; KLEIN, Jacques; KIM, Kisub; TRAON, Yves Le
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Online Access: https://ink.library.smu.edu.sg/sis_research/4129
https://ink.library.smu.edu.sg/context/sis_research/article/5132/viewcontent/Augmenting_and_structuring_user_queries_to_support.pdf
Institution: Singapore Management University
Record ID: sg-smu-ink.sis_research-5132
Record Timestamp: 2020-01-20T03:26:42Z
Date: 2018-10-01T07:00:00Z
File Format: application/pdf
DOI: 10.1007/s10664-017-9544-y
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Collection: Research Collection School Of Computing and Information Systems
Subjects: Code search; GitHub; Free-form search; Query augmentation; StackOverflow; Vocabulary mismatch; Computer Engineering; Programming Languages and Compilers; Software Engineering
Abstract: Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and Stack Overflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for Stack Overflow questions.
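The abstract describes query augmentation only at a high level. The sketch below is a rough illustration of the general idea and not the authors' CoCaBu implementation: it matches a free-form query against a tiny, made-up set of Q&A posts and collects the code entities from the closest ones. The corpus, the stopword list, the Jaccard similarity measure, and the function names (`tokenize`, `jaccard`, `augment_query`) are all assumptions introduced for illustration.

```python
import re
from collections import Counter

# Hypothetical miniature Q&A corpus (an assumption, not data from the paper):
# each question title is paired with the structural code entities
# (classes, methods) that appear in its accepted answer.
QA_POSTS = [
    ("How do I read a text file line by line in Java?",
     ["BufferedReader", "FileReader", "readLine"]),
    ("How to write a string to a file in Java",
     ["FileWriter", "BufferedWriter", "write"]),
    ("Read lines of a file into a list",
     ["Files", "readAllLines", "Paths"]),
    ("How do I parse JSON in Java?",
     ["JSONObject", "getString"]),
]

# Small illustrative stopword list (an assumption).
STOPWORDS = {"how", "do", "i", "a", "to", "in", "of", "the", "into", "by"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def augment_query(query, posts=QA_POSTS, top_k=2, max_entities=4):
    """Attach code entities from the most similar questions to a free-form query."""
    q_tokens = tokenize(query)
    # Score every post once, then rank by similarity (highest first).
    scored = [(jaccard(q_tokens, tokenize(title)), entities)
              for title, entities in posts]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    counts = Counter()
    for score, entities in scored[:top_k]:
        if score > 0:  # ignore posts that share no vocabulary with the query
            counts.update(entities)

    code_terms = [name for name, _ in counts.most_common(max_entities)]
    return {"free_text": query, "code_terms": code_terms}
```

Calling `augment_query("read file line by line")` returns a dictionary that keeps the original text and adds entity names drawn from the file-reading questions. A query that shares no vocabulary with any post gets an empty `code_terms` list. A real system would need a much larger Q&A index and a proper ranking model.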