KAPE: kNN-based performance testing for deep code search

Bibliographic Details
Main Authors:	GUO, Yuejun; HU, Qiang; XIE, Xiaofei; CORDY, Maxime; PAPADAKIS, Mike; LE TRAON, Yves
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2023
Subjects:	Deep code search; software testing; deep learning testing; test selection; Programming Languages and Compilers; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/9093 https://ink.library.smu.edu.sg/context/sis_research/article/10096/viewcontent/3624735.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-10096
record_format	dspace
date	2023-12-01T08:00:00Z
doi	10.1145/3624735
rights	http://creativecommons.org/licenses/by-nc-nd/4.0/
series	Research Collection School Of Computing and Information Systems
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
description	Code search is a common yet important activity of software developers. An efficient code search model can greatly facilitate the development process and improve programming quality. Given their superb performance in learning contextual representations, deep learning models, especially pre-trained language models, have been widely explored for the code search task. However, existing studies mainly focus on proposing new architectures for ever-better performance on designed test sets and ignore the performance on unseen test data, where only natural language queries are available. The same problem in other domains, e.g., computer vision (CV) and natural language processing (NLP), is usually solved by test input selection, which uses a subset of the unseen set to reduce the labeling effort. However, approaches from other domains are not directly applicable to code search and still require labeling effort. In this article, we propose kNN-based performance testing (KAPE) to solve this problem efficiently without manually matching code snippets to test queries. The main idea is to use semantically similar training data to perform the evaluation. Extensive experiments on six programming language datasets, three state-of-the-art pre-trained models, and seven baseline methods demonstrate that KAPE can effectively assess model performance (e.g., CodeBERT achieves an MRR of 0.5795 on JavaScript) with only a slight difference (e.g., 0.0261).
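The description above reports results in MRR (Mean Reciprocal Rank) and summarizes KAPE's main idea as using semantically similar training data to perform the evaluation. The Python sketch below illustrates both under stated assumptions. MRR is the standard average of 1/rank of the correct snippet. The kNN part is only one plausible reading of that sentence, not the algorithm as specified in the paper. It assumes that query embeddings are available, that the rank of each training query's known correct snippet has already been measured with the model under test, and that each unlabeled test query borrows the MRR of its `k` most similar training queries. The names `cosine`, `mrr`, `estimate_mrr_knn`, and the parameter `k` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mrr(ranks: Sequence[int]) -> float:
    """Mean Reciprocal Rank: the average of 1/rank, where rank is the 1-based
    position of the correct code snippet in the model's ranked results."""
    if not ranks:
        raise ValueError("need at least one rank")
    return sum(1.0 / r for r in ranks) / len(ranks)


def estimate_mrr_knn(
    test_query_embs: Sequence[Vector],
    train_query_embs: Sequence[Vector],
    train_ranks: Sequence[int],
    k: int = 5,
) -> float:
    """Hypothetical kNN proxy for MRR on unlabeled test queries (an assumption,
    not KAPE's published procedure).

    train_ranks[i] is the rank the model under test gives to the known correct
    snippet of training query i. Each test query borrows the MRR of its k most
    similar training queries; the estimate is the mean over all test queries.
    """
    if not test_query_embs:
        raise ValueError("need at least one test query")
    per_query: List[float] = []
    for q in test_query_embs:
        # Training indices ordered from most to least similar to this test query.
        order = sorted(
            range(len(train_query_embs)),
            key=lambda i: cosine(q, train_query_embs[i]),
            reverse=True,
        )
        neighbours = order[:k]
        per_query.append(mrr([train_ranks[i] for i in neighbours]))
    return sum(per_query) / len(per_query)


if __name__ == "__main__":
    # Toy 2-D "embeddings": two clusters of training queries.
    train_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    train_ranks = [1, 2, 4, 5]
    test_embs = [[1.0, 0.05], [0.05, 1.0]]
    # Neighbours of the first test query have ranks 1 and 2, of the second 4 and 5,
    # so the estimate is ((1 + 1/2) / 2 + (1/4 + 1/5) / 2) / 2, i.e. about 0.4875.
    print(estimate_mrr_knn(test_embs, train_embs, train_ranks, k=2))
```

The brute-force sort keeps the sketch dependency-free but costs O(n log n) per test query. A realistic setup with large training sets would more likely use a nearest-neighbour index over precomputed embeddings.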
