Bilateral dependency neural networks for cross-language algorithm classification

Algorithm classification is the task of automatically identifying the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. Code similarity metrics, on the basis of...

Bibliographic Details
Main Authors:	BUI, Duy Quoc Nghi; YU, Yijun; JIANG, Lingxiao
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2019
Subjects:	cross-language mapping; program classification; algorithm classification; code embedding; code dependency; neural network; bilateral neural network; Software Engineering; Theory and Algorithms
Online Access:	https://ink.library.smu.edu.sg/sis_research/4367 https://ink.library.smu.edu.sg/context/sis_research/article/5370/viewcontent/saner19dtbcnn.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-5370
record_format	dspace
spelling	sg-smu-ink.sis_research-5370 2019-06-13T09:58:39Z 2019-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/4367 info:doi/10.1109/SANER.2019.8667995 https://ink.library.smu.edu.sg/context/sis_research/article/5370/viewcontent/saner19dtbcnn.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	cross-language mapping; program classification; algorithm classification; code embedding; code dependency; neural network; bilateral neural network; Software Engineering; Theory and Algorithms
description	Algorithm classification is the task of automatically identifying the classes of a program based on the algorithm(s) and/or data structure(s) implemented in the program. It can be useful for various tasks, such as code reuse, code theft detection, and malware detection. Code similarity metrics, on the basis of features extracted from syntax and semantics, have been used to classify programs. Such features, however, often need manual selection effort and are specific to individual programming languages, limiting the classifiers to programs in the same language. To recognise the similarities and differences among algorithms implemented in different languages, this paper describes a framework of Bilateral Neural Networks (Bi-NN) that builds a neural network on top of two underlying sub-networks, each of which encodes syntax and semantics of code in one language. A whole Bi-NN can be trained with bilateral programs that implement the same algorithms and/or data structures in different languages and then be applied to recognise algorithm classes across languages. We have instantiated the framework with several kinds of token-, tree- and graph-based neural networks that encode and learn various kinds of information in code. We have applied the instances of the framework to a code corpus collected from GitHub containing thousands of Java and C++ programs implementing 50 different algorithms and data structures. Our evaluation results show that the use of Bi-NN indeed produces promising algorithm classification results both within one language and across languages, and the encoding of dependencies from code into the underlying neural networks helps improve algorithm classification accuracy further. In particular, our custom-built dependency trees with tree-based convolutional neural networks achieve the highest classification accuracy among the different instances of the framework that we have evaluated. Our study points to a possible future research direction to tailor bilateral and multilateral neural networks that encode more relevant semantics for code learning, mining and analysis tasks.
format	text
author	BUI, Duy Quoc Nghi; YU, Yijun; JIANG, Lingxiao
author_facet	BUI, Duy Quoc Nghi; YU, Yijun; JIANG, Lingxiao
author_sort	BUI, Duy Quoc Nghi
title	Bilateral dependency neural networks for cross-language algorithm classification
publisher	Institutional Knowledge at Singapore Management University
publishDate	2019
url	https://ink.library.smu.edu.sg/sis_research/4367 https://ink.library.smu.edu.sg/context/sis_research/article/5370/viewcontent/saner19dtbcnn.pdf
_version_	1770574688114704384
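
The description above outlines the Bi-NN idea: two sub-networks, each encoding code in one language, with a further network built on top of them. The following is a minimal, untrained forward-pass sketch of that shape in plain Python, not the authors' implementation. The names TokenEncoder and BilateralClassifier, the embedding size, the averaging encoder, the concatenation step, and the softmax over algorithm classes are all assumptions made for illustration; the record does not say how the paper joins the two encodings or what the top layer predicts.

```python
import math
import random

EMBED_DIM = 8     # hypothetical embedding size, chosen only for the sketch
NUM_CLASSES = 3   # the corpus in the record has 50 classes; 3 keeps this small


class TokenEncoder:
    """Stand-in sub-network: averages random per-token vectors for one language."""

    def __init__(self, vocab, seed):
        rng = random.Random(seed)
        self.table = {
            tok: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for tok in vocab
        }

    def encode(self, tokens):
        known = [self.table[t] for t in tokens if t in self.table]
        if not known:
            return [0.0] * EMBED_DIM  # no known tokens: fall back to a zero vector
        return [sum(col) / len(known) for col in zip(*known)]


class BilateralClassifier:
    """Joins two language-specific encoders under one linear + softmax layer."""

    def __init__(self, left, right, seed=0):
        rng = random.Random(seed)
        self.left, self.right = left, right
        # One weight row per class over the concatenated encoding (assumption).
        self.weights = [
            [rng.uniform(-1, 1) for _ in range(2 * EMBED_DIM)]
            for _ in range(NUM_CLASSES)
        ]

    def predict_proba(self, left_tokens, right_tokens):
        joint = self.left.encode(left_tokens) + self.right.encode(right_tokens)
        logits = [sum(w * x for w, x in zip(row, joint)) for row in self.weights]
        peak = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


java_encoder = TokenEncoder(["for", "int", "swap"], seed=1)
cpp_encoder = TokenEncoder(["for", "int", "std::swap"], seed=2)
model = BilateralClassifier(java_encoder, cpp_encoder)
print(model.predict_proba(["for", "int", "swap"], ["for", "std::swap"]))
```

The weights here are random, so the printed probabilities carry no meaning; the sketch only shows how a pair of programs in two languages could flow through two separate encoders into one shared layer. The tree- and graph-based encoders mentioned in the description would replace TokenEncoder.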

Similar Items