GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search

Code search aims to retrieve accurate code snippets based on a natural language query to improve software productivity and quality. With the massive number of programs available (for example, on GitHub or Stack Overflow), identifying and localizing the precise code is critical for software developers...

Bibliographic Details
Main Authors: Liu, Shangqing, Xie, Xiaofei, Siow, Jingkai, Ma, Lei, Meng, Guozhu, Liu, Yang
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2023
Subjects: Engineering::Computer science and engineering; Code Search; Graph Neural Networks
Online Access: https://hdl.handle.net/10356/172092
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172092
record_format dspace
spelling sg-ntu-dr.10356-172092 2023-11-22T03:45:19Z
Funding agencies: Ministry of Education (MOE); National Research Foundation (NRF)
Funding statement: This work was supported in part by the National Research Foundation, Singapore, through the AI Singapore Programme under Grant AISG2-RP-2020-019; in part by the National Research Foundation, Prime Minister's Office, Singapore, through the National Cybersecurity R&D Program under Award NRF2018NCR-NCR005-0001; in part by the NRF Investigatorship under Grant NRF-NRFI06-2020-0001; in part by the National Research Foundation through the National Satellite of Excellence in Trustworthy Software Systems (NSOE-TSS) Project under National Cybersecurity R&D (NCR) Grant NRF2018NCR-NSOE003-0001; and in part by the Ministry of Education, Singapore, through Academic Research Tier 3 under Grant MOET32020-0004. The work of Guozhu Meng was supported in part by NSFC under Grant 61902395, in part by the Beijing Nova Program, and in part by Alibaba Innovation Research.
Record dates: 2023-11-22T03:09:26Z; 2023-11-22T03:09:26Z
Year: 2023
Type: Journal Article
Citation: Liu, S., Xie, X., Siow, J., Ma, L., Meng, G. & Liu, Y. (2023). GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search. IEEE Transactions on Software Engineering, 49(4), 2839-2855. https://dx.doi.org/10.1109/TSE.2022.3233901
ISSN: 0098-5589
Handle: https://hdl.handle.net/10356/172092
DOI: 10.1109/TSE.2022.3233901
Scopus ID: 2-s2.0-85147215823
Volume 49, issue 4, pages 2839-2855
Language code: en
Grant numbers: AISG2-RP-2020-019; NRF2018NCR-NCR005-0001; NRF-NRFI06-2020-0001; NRF2018NCR-NSOE003-0001; MOET32020-0004
Journal: IEEE Transactions on Software Engineering
Rights: © 2023 IEEE. All rights reserved.
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
Code Search
Graph Neural Networks
spellingShingle Engineering::Computer science and engineering
Code Search
Graph Neural Networks
Liu, Shangqing
Xie, Xiaofei
Siow, Jingkai
Ma, Lei
Meng, Guozhu
Liu, Yang
GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search
description Code search aims to retrieve accurate code snippets based on a natural language query to improve software productivity and quality. With the massive number of programs available (for example, on GitHub or Stack Overflow), identifying and localizing the precise code is critical for software developers. In addition, deep learning has recently been widely applied to different code-related scenarios, e.g., vulnerability detection and source code summarization. However, automated deep code search is still challenging since it requires a high-level semantic mapping between code and natural language queries. Most existing deep learning-based approaches for code search rely on sequential text, i.e., they feed the program and the query as flat sequences of tokens to learn the program semantics, while the structural information is not fully considered. Furthermore, the widely adopted Graph Neural Networks (GNNs) have proved effective in learning program semantics; however, they struggle to capture the global dependencies in the constructed graph, which limits the model's learning capacity. To address these challenges, in this paper we design a novel neural network framework, named GraphSearchNet, to enable effective and accurate source code search by jointly learning the rich semantics of both source code and natural language queries. Specifically, we propose to construct graphs for the source code and queries and to use a bidirectional GGNN (BiGGNN) to capture their local structural information. Furthermore, we enhance BiGGNN with a multi-head attention module that supplements the global dependencies BiGGNN misses, improving the model's learning capacity. Extensive experiments on the Java and Python programming languages from the public benchmark CodeSearchNet confirm that GraphSearchNet outperforms current state-of-the-art works by a significant margin.
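Illustrative note (not part of the published record): the description contrasts local structure, captured by bidirectional message passing, with global dependencies, captured by multi-head attention. The sketch below shows that contrast on a toy graph. It is a simplified illustration and not the authors' implementation: the function names, the neighbour averaging without gating, the identity attention projections, and the choice to sum the local and global outputs are all assumptions made here for clarity.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def local_step(h, edges):
    """One bidirectional message-passing step (hypothetical simplification):
    each node averages the states of its incoming and outgoing neighbours.
    A real gated GNN would add learned weights and a gated update."""
    n, d = len(h), len(h[0])
    out = []
    for v in range(n):
        neigh = [s for s, t in edges if t == v] + [t for s, t in edges if s == v]
        if not neigh:
            out.append(list(h[v]))  # isolated node keeps its own state
            continue
        out.append([sum(h[u][k] for u in neigh) / len(neigh) for k in range(d)])
    return out


def global_attention(h, num_heads):
    """Multi-head scaled dot-product self-attention over ALL nodes.
    Assumption: identity projections; a trained model learns Q/K/V matrices."""
    n, d = len(h), len(h[0])
    assert d % num_heads == 0, "feature size must be divisible by num_heads"
    size = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for head in range(num_heads):
        lo, hi = head * size, (head + 1) * size
        for i in range(n):
            scores = [
                sum(h[i][k] * h[j][k] for k in range(lo, hi)) / math.sqrt(size)
                for j in range(n)
            ]
            weights = softmax(scores)
            for k in range(lo, hi):
                out[i][k] = sum(weights[j] * h[j][k] for j in range(n))
    return out


def encode(h, edges, num_heads=2):
    """Combine local and global views by summation (an assumption)."""
    local = local_step(h, edges)
    glob = global_attention(h, num_heads)
    return [[a + b for a, b in zip(lv, gv)] for lv, gv in zip(local, glob)]


# Toy chain graph 0 -> 1 -> 2 -> 3 with one-hot 4-dimensional node states.
nodes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
edges = [(0, 1), (1, 2), (2, 3)]

local_only = local_step(nodes, edges)
combined = encode(nodes, edges)
```

In this toy example, one message-passing step lets node 0 see only its direct neighbour, node 1, so the feature carried by the distant node 3 is absent from `local_only[0]`. The attention term reaches every node in a single step, so `combined[0]` picks up a nonzero contribution from node 3.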
author2 School of Computer Science and Engineering
author_facet School of Computer Science and Engineering
Liu, Shangqing
Xie, Xiaofei
Siow, Jingkai
Ma, Lei
Meng, Guozhu
Liu, Yang
format Article
author Liu, Shangqing
Xie, Xiaofei
Siow, Jingkai
Ma, Lei
Meng, Guozhu
Liu, Yang
author_sort Liu, Shangqing
title GraphSearchNet: enhancing GNNs via capturing global dependencies for semantic code search
title_sort graphsearchnet: enhancing gnns via capturing global dependencies for semantic code search
publishDate 2023
url https://hdl.handle.net/10356/172092
_version_ 1783955554120499200