GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search
Code search aims to retrieve accurate code snippets based on a natural language query to improve software productivity and quality. With the massive number of programs available (e.g., on GitHub or Stack Overflow), identifying and locating the precise code is critical for software developers...
Main Authors: | Liu, Shangqing; Xie, Xiaofei; Siow, Jingkai; Ma, Lei; Meng, Guozhu; Liu, Yang |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Article |
Language: | English |
Published: | 2023 |
Subjects: | Engineering::Computer science and engineering; Code Search; Graph Neural Networks |
Online Access: | https://hdl.handle.net/10356/172092 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-172092 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-172092; 2023-11-22T03:45:19Z
GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search
Liu, Shangqing; Xie, Xiaofei; Siow, Jingkai; Ma, Lei; Meng, Guozhu; Liu, Yang
School of Computer Science and Engineering
Engineering::Computer science and engineering; Code Search; Graph Neural Networks
Code search aims to retrieve accurate code snippets based on a natural language query to improve software productivity and quality. With the massive number of programs available (e.g., on GitHub or Stack Overflow), identifying and locating the precise code is critical for software developers. In addition, deep learning has recently been widely applied to different code-related scenarios, e.g., vulnerability detection and source code summarization. However, automated deep code search is still challenging since it requires a high-level semantic mapping between code and natural language queries. Most existing deep learning-based approaches for code search rely on sequential text, i.e., they feed the program and the query as flat sequences of tokens to learn the program semantics, while the structural information is not fully considered. Furthermore, although the widely adopted Graph Neural Networks (GNNs) have proved effective in learning program semantics, they struggle to capture the global dependencies in the constructed graph, which limits the model's learning capacity. To address these challenges, in this paper, we design a novel neural network framework, named GraphSearchNet, to enable effective and accurate source code search by jointly learning the rich semantics of both source code and natural language queries. Specifically, we propose to construct graphs for the source code and queries and use a bidirectional GGNN (BiGGNN) to capture their local structural information. Furthermore, we enhance BiGGNN with a multi-head attention module that supplies the global dependencies BiGGNN misses, improving the model's learning capacity. Extensive experiments on the Java and Python programming languages from the public benchmark CodeSearchNet confirm that GraphSearchNet outperforms current state-of-the-art works by a significant margin.
Ministry of Education (MOE); National Research Foundation (NRF)
This work was supported in part by the National Research Foundation, Singapore, through the AI Singapore Programme under Grant AISG2-RP-2020-019, in part by the National Research Foundation, Prime Minister's Office, Singapore, through the National Cybersecurity R&D Program under Award NRF2018NCR-NCR005-0001, in part by the NRF Investigatorship under Grant NRF-NRFI06-2020-0001, in part by the National Research Foundation through the National Satellite of Excellence in Trustworthy Software Systems (NSOE-TSS) Project under National Cybersecurity R&D (NCR) Grant NRF2018NCR-NSOE003-0001, and in part by the Ministry of Education, Singapore, through Academic Research Tier 3 under Grant MOET32020-0004. The work of Guozhu Meng was supported in part by NSFC under Grant 61902395, in part by the Beijing Nova Program, and in part by Alibaba Innovation Research.
2023-11-22T03:09:26Z; 2023-11-22T03:09:26Z; 2023; Journal Article
Liu, S., Xie, X., Siow, J., Ma, L., Meng, G. & Liu, Y. (2023). GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search. IEEE Transactions on Software Engineering, 49(4), 2839-2855. https://dx.doi.org/10.1109/TSE.2022.3233901
0098-5589; https://hdl.handle.net/10356/172092; 10.1109/TSE.2022.3233901; 2-s2.0-85147215823; 4; 49; 2839; 2855; en
AISG2-RP-2020-019; NRF2018NCR-NCR005-0001; NRF-NRFI06-2020-0001; NRF2018NCR-NSOE003-0001; MOET32020-0004
IEEE Transactions on Software Engineering
© 2023 IEEE. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering; Code Search; Graph Neural Networks |
author2 |
School of Computer Science and Engineering |
format |
Article |
author |
Liu, Shangqing; Xie, Xiaofei; Siow, Jingkai; Ma, Lei; Meng, Guozhu; Liu, Yang |
author_sort |
Liu, Shangqing |
title |
GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/172092 |
_version_ |
1783955554120499200 |