Hierarchical semantic-aware neural code representation

Code representation is a fundamental problem in many software engineering tasks. Despite the effort made by many researchers, it is still hard for existing methods to fully extract syntactic, structural and sequential features of source code, which form the hierarchical semantics of the program and...

Bibliographic Details
Main Authors:	JIANG, Yuan; SU, Xiaohong; TREUDE, Christoph; WANG, Tiantian
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2022
Subjects:	Code representation; Graph-LSTM; Hierarchical semantics; Program classification; Clone detection; Vulnerability detection; Deep learning; Databases and Information Systems; Graphics and Human Computer Interfaces; Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/8766 https://ink.library.smu.edu.sg/context/sis_research/article/9769/viewcontent/jss22.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9769
record_format	dspace
spelling	sg-smu-ink.sis_research-9769 2024-05-23T05:39:37Z 2022-09-01T07:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8766 info:doi/10.1016/j.jss.2022.111355 https://ink.library.smu.edu.sg/context/sis_research/article/9769/viewcontent/jss22.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Code representation; Graph-LSTM; Hierarchical semantics; Program classification; Clone detection; Vulnerability detection; Deep learning; Databases and Information Systems; Graphics and Human Computer Interfaces; Software Engineering
description	Code representation is a fundamental problem in many software engineering tasks. Despite considerable research effort, existing methods still struggle to fully extract the syntactic, structural and sequential features of source code, which form the hierarchical semantics of the program and are necessary to achieve a deeper code understanding. To alleviate this difficulty, we propose a new supervised approach based on the novel use of Tree-LSTM to incorporate the sequential and the global semantic features of programs explicitly into the representation model. Unlike previous techniques, our proposed model learns not only low-level syntactic information within each statement but also high-level semantic information between statements over the constructed semantic graph. In addition, because sequential semantics are also critical for developers to understand dependency paths and data flow, we propose a DFS-based method to generate the topological order in which statements are processed, and then feed the statements, together with their in-neighbor information and syntactic embeddings, into the proposed model to learn richer statement-level semantic features. Extensive experiments on multiple program comprehension tasks, e.g., code clone detection, demonstrate that our method achieves promising performance compared with existing baselines.
format	text
author	JIANG, Yuan; SU, Xiaohong; TREUDE, Christoph; WANG, Tiantian
author_facet	JIANG, Yuan; SU, Xiaohong; TREUDE, Christoph; WANG, Tiantian
author_sort	JIANG, Yuan
title	Hierarchical semantic-aware neural code representation
title_sort	hierarchical semantic-aware neural code representation
publisher	Institutional Knowledge at Singapore Management University
publishDate	2022
url	https://ink.library.smu.edu.sg/sis_research/8766 https://ink.library.smu.edu.sg/context/sis_research/article/9769/viewcontent/jss22.pdf
_version_	1814047522962800640
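
Illustrative note: the description field mentions a DFS-based method that produces a topological order of statements, which are then processed together with their in-neighbor information. The record gives no further detail, so the following is only a minimal sketch of a generic DFS topological sort, not the authors' implementation. The function names, the edge-list input format, and the assumption that the statement graph is acyclic (real program graphs with loops would need back edges removed first) are all hypothetical.

```python
from collections import defaultdict


def dfs_topological_order(num_statements, edges):
    """Order statement indices so each one comes after its in-neighbors.

    edges is an iterable of (src, dst) pairs meaning dst depends on src.
    Assumption: the graph is acyclic; a ValueError is raised otherwise.
    """
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)

    unvisited, in_progress, done = 0, 1, 2
    state = [unvisited] * num_statements
    postorder = []

    def visit(node):
        if state[node] == in_progress:
            raise ValueError("cycle detected; remove back edges first")
        if state[node] == done:
            return
        state[node] = in_progress
        for nxt in successors[node]:
            visit(nxt)
        state[node] = done
        postorder.append(node)  # appended only after all successors

    for node in range(num_statements):
        visit(node)
    # Reversed DFS postorder is a valid topological order.
    return postorder[::-1]


def in_neighbors(num_statements, edges):
    """Return, for each statement, the statements it directly depends on."""
    preds = [[] for _ in range(num_statements)]
    for src, dst in edges:
        preds[dst].append(src)
    return preds


# Hypothetical four-statement graph: 0 feeds 1 and 2, which both feed 3.
example_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
order = dfs_topological_order(4, example_edges)
preds = in_neighbors(4, example_edges)
```

Processing statements in `order` guarantees that, when a statement is reached, every entry in its `preds` list has already been handled, which is the property a model that consumes in-neighbor states would rely on.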
