Learning program semantics with code representations: An empirical study
Learning program semantics is fundamental to many code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable body of existing work proposes diverse approaches to learning program semantics for different tasks, and these approaches have achieved state-of-the-a...
Main Authors: | SIOW, Jing Kai; LIU, Shangqing; XIE, Xiaofei; MENG, Guozhu; LIU, Yang |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2022
|
Subjects: | Programming Languages and Compilers; Software Engineering |
Online Access: | https://ink.library.smu.edu.sg/sis_research/7501 https://ink.library.smu.edu.sg/context/sis_research/article/8504/viewcontent/2203.11790.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-8504 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-8504 2022-11-21T05:28:08Z Learning program semantics with code representations: An empirical study SIOW, Jing Kai LIU, Shangqing XIE, Xiaofei MENG, Guozhu LIU, Yang 
2022-03-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/7501 info:doi/10.1109/SANER53432.2022.00073 https://ink.library.smu.edu.sg/context/sis_research/article/8504/viewcontent/2203.11790.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Programming Languages and Compilers Software Engineering |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Programming Languages and Compilers Software Engineering |
description |
Learning program semantics is fundamental to many code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable body of existing work proposes diverse approaches to learning program semantics for different tasks, and these approaches have achieved state-of-the-art performance. However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing. From this starting point, in this paper, we conduct an empirical study to evaluate different program representation techniques. Specifically, we group current mainstream code representation techniques into four categories, i.e., feature-based, sequence-based, tree-based, and graph-based program representation techniques, and evaluate their performance on three diverse and popular code intelligence tasks, i.e., Code Classification, Vulnerability Detection, and Clone Detection, on the publicly released benchmark. We further design three research questions (RQs) and conduct a comprehensive analysis to investigate the performance. From the extensive experimental results, we conclude that (1) the graph-based representation is superior to the other selected techniques across these tasks; (2) compared with the node type information used in tree-based and graph-based representations, the node textual information is more critical to learning program semantics; and (3) different tasks require task-specific semantics to achieve their highest performance; however, combining various program semantics from different dimensions, such as control dependency and data dependency, can still produce promising results. |
format |
text |
author |
SIOW, Jing Kai LIU, Shangqing XIE, Xiaofei MENG, Guozhu LIU, Yang |
title |
Learning program semantics with code representations: An empirical study |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2022 |
url |
https://ink.library.smu.edu.sg/sis_research/7501 https://ink.library.smu.edu.sg/context/sis_research/article/8504/viewcontent/2203.11790.pdf |
_version_ |
1770576359183089664 |