TreeCaps: Tree-based capsule networks for source code processing

Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). While graphs may be better at capturing various viewpoints of code semantics than trees, constructing gr...

Bibliographic Details
Main Authors: BUI, Duy Quoc Nghi, YU, Yijun, JIANG, Lingxiao
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects: OS and Networks; Software Engineering
Online Access: https://ink.library.smu.edu.sg/sis_research/6701
https://ink.library.smu.edu.sg/context/sis_research/article/7704/viewcontent/aaai21treecaps_preprint.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-7704
record_format dspace
spelling sg-smu-ink.sis_research-77042022-04-21T04:53:21Z TreeCaps: Tree-based capsule networks for source code processing BUI, Duy Quoc Nghi YU, Yijun JIANG, Lingxiao Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). While graphs may be better at capturing various viewpoints of code semantics than trees, constructing graph inputs from code needs static code semantic analysis that may not be accurate and introduces noise during learning. On the other hand, syntax trees are precisely defined according to the language grammar and easier to construct and process than graphs. We propose a new tree-based learning technique, named TreeCaps, by fusing capsule networks with tree-based convolutional neural networks, to achieve learning accuracy higher than existing graph-based techniques while it is based only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the loss of previous routing algorithms. Aside from accuracy, we also find that TreeCaps is the most robust to withstand those semantic-preserving program transformations that change code syntax without modifying the semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code, in terms of both accuracy and robustness for program comprehension tasks such as code functionality classification and function name prediction. The implementation of TreeCaps is publicly available at https://github.com/bdqnghi/treecaps.
2021-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/6701 https://ink.library.smu.edu.sg/context/sis_research/article/7704/viewcontent/aaai21treecaps_preprint.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University OS and Networks Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic OS and Networks
Software Engineering
spellingShingle OS and Networks
Software Engineering
BUI, Duy Quoc Nghi
YU, Yijun
JIANG, Lingxiao
TreeCaps: Tree-based capsule networks for source code processing
description Recently program learning techniques have been proposed to process source code based on syntactical structures (e.g., Abstract Syntax Trees) and/or semantic information (e.g., Dependency Graphs). While graphs may be better at capturing various viewpoints of code semantics than trees, constructing graph inputs from code needs static code semantic analysis that may not be accurate and introduces noise during learning. On the other hand, syntax trees are precisely defined according to the language grammar and easier to construct and process than graphs. We propose a new tree-based learning technique, named TreeCaps, by fusing capsule networks with tree-based convolutional neural networks, to achieve learning accuracy higher than existing graph-based techniques while it is based only on trees. TreeCaps introduces novel variable-to-static routing algorithms into the capsule networks to compensate for the loss of previous routing algorithms. Aside from accuracy, we also find that TreeCaps is the most robust to withstand those semantic-preserving program transformations that change code syntax without modifying the semantics. Evaluated on a large number of Java and C/C++ programs, TreeCaps models outperform prior deep learning models of program source code, in terms of both accuracy and robustness for program comprehension tasks such as code functionality classification and function name prediction. The implementation of TreeCaps is publicly available at https://github.com/bdqnghi/treecaps.
format text
author BUI, Duy Quoc Nghi
YU, Yijun
JIANG, Lingxiao
author_facet BUI, Duy Quoc Nghi
YU, Yijun
JIANG, Lingxiao
author_sort BUI, Duy Quoc Nghi
title TreeCaps: Tree-based capsule networks for source code processing
title_short TreeCaps: Tree-based capsule networks for source code processing
title_full TreeCaps: Tree-based capsule networks for source code processing
title_fullStr TreeCaps: Tree-based capsule networks for source code processing
title_full_unstemmed TreeCaps: Tree-based capsule networks for source code processing
title_sort treecaps: tree-based capsule networks for source code processing
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/6701
https://ink.library.smu.edu.sg/context/sis_research/article/7704/viewcontent/aaai21treecaps_preprint.pdf
_version_ 1770576050210734080