Checking smart contracts with structural code embedding

Smart contracts are increasingly used together with blockchains to automate financial and business transactions. However, numerous bugs and vulnerabilities have been identified in many contracts, which raises serious concerns about smart contract security, not to mention that the blockchain systems...

Bibliographic Details
Main Authors: GAO, Zhipeng, JIANG, Lingxiao, XIA, Xin, LO, David, GRUNDY, John
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2021
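Subjects: smart contract; code embedding; clone detection; bug detection; ethereum; blockchain; Software Engineering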
Online Access: https://ink.library.smu.edu.sg/sis_research/5606
https://ink.library.smu.edu.sg/context/sis_research/article/6609/viewcontent/TSE20SmartEmbed.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-6609
record_format dspace
spelling sg-smu-ink.sis_research-6609 2022-08-10T01:43:59Z Checking smart contracts with structural code embedding GAO, Zhipeng JIANG, Lingxiao XIA, Xin LO, David GRUNDY, John Smart contracts are increasingly used together with blockchains to automate financial and business transactions. However, numerous bugs and vulnerabilities have been identified in many contracts, which raises serious concerns about smart contract security, not to mention that the blockchain systems on which the smart contracts are built can themselves be buggy. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this paper, we propose an automated approach to learn characteristics of smart contracts in Solidity, useful for detecting repetitive contract code, detecting bugs, and validating contracts. Our approach is based on word embeddings and vector space comparison. We parse smart contract code into word streams with code structural information, convert code elements (e.g., statements, functions) into numerical vectors intended to encode the code syntax and semantics, and compare the similarities among the vectors encoding code and known bugs to identify potential issues. We have implemented the approach in a prototype named SmartEmbed and evaluated it with more than 22,000 smart contracts collected from the Ethereum blockchain. Results show that our tool can effectively identify many repetitive instances of Solidity code, where the clone ratio is around 90%. Code clones such as type-III or even type-IV semantic clones can also be detected. Based on our bug databases, our tool can efficiently and accurately identify more than 500 clone-related bugs. Our tool can also help to efficiently validate any given smart contract against the known set of bugs, which can help improve users' confidence in the reliability of the contract.
2021-12-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/5606 info:doi/10.1109/TSE.2020.2971482 https://ink.library.smu.edu.sg/context/sis_research/article/6609/viewcontent/TSE20SmartEmbed.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University smart contract code embedding clone detection bug detection ethereum blockchain Software Engineering
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic smart contract
code embedding
clone detection
bug detection
ethereum
blockchain
Software Engineering
spellingShingle smart contract
code embedding
clone detection
bug detection
ethereum
blockchain
Software Engineering
GAO, Zhipeng
JIANG, Lingxiao
XIA, Xin
LO, David
GRUNDY, John
Checking smart contracts with structural code embedding
description Smart contracts are increasingly used together with blockchains to automate financial and business transactions. However, numerous bugs and vulnerabilities have been identified in many contracts, which raises serious concerns about smart contract security, not to mention that the blockchain systems on which the smart contracts are built can themselves be buggy. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this paper, we propose an automated approach to learn characteristics of smart contracts in Solidity, useful for detecting repetitive contract code, detecting bugs, and validating contracts. Our approach is based on word embeddings and vector space comparison. We parse smart contract code into word streams with code structural information, convert code elements (e.g., statements, functions) into numerical vectors intended to encode the code syntax and semantics, and compare the similarities among the vectors encoding code and known bugs to identify potential issues. We have implemented the approach in a prototype named SmartEmbed and evaluated it with more than 22,000 smart contracts collected from the Ethereum blockchain. Results show that our tool can effectively identify many repetitive instances of Solidity code, where the clone ratio is around 90%. Code clones such as type-III or even type-IV semantic clones can also be detected. Based on our bug databases, our tool can efficiently and accurately identify more than 500 clone-related bugs. Our tool can also help to efficiently validate any given smart contract against the known set of bugs, which can help improve users' confidence in the reliability of the contract.
format text
author GAO, Zhipeng
JIANG, Lingxiao
XIA, Xin
LO, David
GRUNDY, John
author_facet GAO, Zhipeng
JIANG, Lingxiao
XIA, Xin
LO, David
GRUNDY, John
author_sort GAO, Zhipeng
title Checking smart contracts with structural code embedding
title_short Checking smart contracts with structural code embedding
title_full Checking smart contracts with structural code embedding
title_fullStr Checking smart contracts with structural code embedding
title_full_unstemmed Checking smart contracts with structural code embedding
title_sort checking smart contracts with structural code embedding
publisher Institutional Knowledge at Singapore Management University
publishDate 2021
url https://ink.library.smu.edu.sg/sis_research/5606
https://ink.library.smu.edu.sg/context/sis_research/article/6609/viewcontent/TSE20SmartEmbed.pdf
_version_ 1770575528710897664
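
Illustrative note: the description above outlines a pipeline of tokenizing contract code, mapping code elements to vectors, and comparing those vectors against known bugs. The sketch below shows that general idea only and is not SmartEmbed's implementation. It swaps the learned word embeddings for plain bag-of-tokens count vectors and ignores code structural information; the tokenizer, the function names, the 0.9 threshold, and the sample bug entry are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: identifiers, integer literals, and single punctuation marks.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")


def tokenize(code):
    """Split a Solidity-like snippet into a flat token stream."""
    return TOKEN_RE.findall(code)


def embed(tokens):
    """Bag-of-tokens count vector (a stand-in for learned embeddings)."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_similar(candidate, known_bugs, threshold=0.9):
    """Return (name, score) pairs for known snippets at or above the threshold."""
    vec = embed(tokenize(candidate))
    hits = []
    for name, snippet in known_bugs.items():
        score = cosine(vec, embed(tokenize(snippet)))
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical bug database with a single made-up entry.
KNOWN_BUGS = {
    "example_bug": "msg.sender.call.value(amount)(); balances[msg.sender] = 0;",
}

if __name__ == "__main__":
    snippet = "msg.sender.call.value(amount)(); balances[msg.sender] = 0;"
    print(find_similar(snippet, KNOWN_BUGS))
```

Count vectors only capture token overlap, so this sketch would at best flag near-verbatim copies; detecting the type-III and type-IV clones mentioned in the description would require embeddings that encode syntax and semantics, as the authors describe.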