Cross-language learning for program classification using bilateral tree-based convolutional neural networks

Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolution...

Bibliographic Details
Main Authors: BUI, Duy Quoc Nghi, JIANG, Lingxiao, YU, Yijun
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects: Software Engineering; Theory and Algorithms
Online Access: https://ink.library.smu.edu.sg/sis_research/4307
https://ink.library.smu.edu.sg/context/sis_research/article/5310/viewcontent/17338_76045_1_PB.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-5310
record_format dspace
spelling sg-smu-ink.sis_research-5310 2019-02-21T08:30:05Z 2018-02-01T08:00:00Z text application/pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic Software Engineering
Theory and Algorithms
description Towards the vision of translating code that implements an algorithm from one programming language into another, this paper proposes an approach for automated program classification using bilateral tree-based convolutional neural networks (BiTBCNNs). It is layered on top of two tree-based convolutional neural networks (TBCNNs), each of which recognizes the algorithm of code written in an individual programming language. The combination layer of the networks recognizes the similarities and differences among code in different programming languages. The BiTBCNNs are trained using the source code in different languages but known to implement the same algorithms and/or functionalities. For a preliminary evaluation, we use 3591 Java and 3534 C++ code snippets from 6 algorithms we crawled systematically from GitHub. We obtained over 90% accuracy in the cross-language binary classification task to tell whether any given two code snippets implement the same algorithm. Also, for the algorithm classification task, i.e., to predict which one of the six algorithm labels is implemented by an arbitrary C++ code snippet, we achieved over 80% precision.
format text
author BUI, Duy Quoc Nghi
JIANG, Lingxiao
YU, Yijun
title Cross-language learning for program classification using bilateral tree-based convolutional neural networks
publisher Institutional Knowledge at Singapore Management University
publishDate 2018
url https://ink.library.smu.edu.sg/sis_research/4307
https://ink.library.smu.edu.sg/context/sis_research/article/5310/viewcontent/17338_76045_1_PB.pdf
_version_ 1770574605109428224