Automatic program analysis and verification and their applications in smart contracts
The success of Bitcoin since 2009 stimulates the development of other blockchain-based applications, such as Ethereum, a second generation of cryptocurrency which supports the revolutionary idea of smart contracts. An Ethereum smart contract is a computer program written in some domain-specific high...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: |
Nanyang Technological University
2021
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/147268 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | The success of Bitcoin since 2009 stimulates the development of other blockchain-based applications, such as Ethereum, a second generation of cryptocurrency which supports the revolutionary idea of smart contracts. An Ethereum smart contract is a computer program written in some domain-specific high-level programming languages, such as Solidity, Vyper, Bamboo, Flint, etc., and stored on the blockchain to achieve certain functionality. Smart contracts benefit from the features of the blockchain in various aspects. For instance, it is not necessary to have an external trusted authority to achieve consensus, and transactions through smart contracts are always traceable and credible.
Smart contracts must be verified for multiple reasons. Firstly, due to the decentralized nature of the blockchain, smart contracts are different from programs written in other programming languages (e.g., C/Java). For instance, the storage of each contract instance is located at a permanent address on the blockchain. In this way, each contract instance is a particular execution context and context switches are possible through external calls. Particularly, in Solidity, delegatecall is executed in the context of the caller rather than the recipient, making it possible to modify the state of the caller. Programmers must be aware of the execution context of each statement to guarantee the programming correctness. Therefore, programming smart contracts is error-prone without a proper understanding of the underlying semantic model. Secondly, a smart contract can be deployed on the blockchain by any user in the network. Vulnerabilities in deployed contracts can be exploited to launch attacks that lead to huge financial loss. Verifying smart contracts against such vulnerabilities is crucial for protecting digital assets. One famous attack on smart contracts is the DAO attack in which the attacker exploited the reentrancy vulnerability and managed to take 60 million dollars under his/her control. Finally, it is very difficult, if not impossible, to patch a smart contract once it is deployed due to the very nature of the blockchain.
There is a surge of interest in analyzing and verifying smart contracts. To the best of our knowledge, most of the existing approaches either focus on EVM (Ethereum Virtual Machine) bytecode, or translate Solidity smart contracts into programs in intermediate languages that are suitable for verifying smart contracts or detecting potential issues in associated verifiers or checkers. Furthermore, none of the existing works can directly handle smart contracts written in different high-level programming languages without translating them into EVM bytecode or intermediate languages. None of the existing approaches defines security properties with the high-level semantics of smart contracts.
A direct executable formal semantics of the high-level smart contract programming language concerned is a must for both understanding and verifying smart contracts. Firstly, programmers write and reason about smart contracts at the level of source code without the semantics of which they are required to understand how Solidity programs are compiled into EVM bytecode in order to understand these contracts, which is far from trivial. In addition, there may be semantic gaps between high-level smart contract programming languages and low-level bytecode which are introduced by compiler bugs. In other words, after compilation the semantics of bytecode may not be equivalent to that of the corresponding high-level programs due to the compiler bugs involved. Therefore, both high-level and low-level semantics definitions are necessary to conduct equivalence checking to guarantee that security properties are preserved at both levels and reason about compiler bugs. Secondly, even though smart contracts can be transformed into programs in intermediate languages to be analyzed and verified in existing model checkers and verifiers, the equivalence checking of the high-level smart contract programming language concerned and the intermediate language applied to interpret it is crucial to the validity of the verification.
We develop an executable operational semantics for the Solidity programming language to formally reason about smart contracts written in Solidity. The contributions of this work lie in four aspects. Firstly, our work is the first approach, to the best of our knowledge, to a complete executable formal semantics of Solidity constructed directly on the language itself other than Solidity compilers. The proposed executable semantics completely covers the supported high-level core features specified by the official Solidity documentation and is validated with the official compiler Remix. In addition, a new and general way of semantics formalization is applied in the semantics design, making the proposed semantics robust in the language evolution of smart contracts. Secondly, the proposed semantics provides a formal specification of smart contracts which solves the specification issues in the existing verification and analysis tools. Thirdly, the proposed semantics allows us to formally define semantic-level security properties for verifying smart contracts to exclude the false positives and negatives introduced by the existing approaches. Finally, the proposed semantics defines correct and secure high-level execution behaviours of smart contracts to reason about compiler bugs and assist developers in writing secure smart contracts.
Furthermore, we develop a generalized formal semantic framework for smart contracts. The contributions of this work lie in three aspects. Firstly, our work is the first approach, to the best of our knowledge, to a generalized formal semantic framework for smart contracts which can directly handle contracts written in different high-level programming languages. Secondly, a general semantic model of smart contracts is constructed with rewriting logic in the K-framework. With the general semantic model, a direct executable formal semantics of a particular high-level smart contract programming language can be constructed as long as its core features fall into the ones defined in this model. The general semantic model is validated with its interpretation in Solidity using the Solidity compiler test set and evaluation results show that it is complete and correct. Lastly, the generated semantics facilitates the formal verification of smart contracts written in a particular high-level programming language as a formal specification of the corresponding language. Together with low-level specifications, it allows us to conduct equivalence checking on high-level contracts and low-level bytecode to reason about compiler bugs and guarantee that security properties are preserved at both levels.
We define some security properties based on the formal semantics of the high-level smart contract programming languages. These security properties are defined from a general point of view and can be applied to detect a variety of vulnerabilities in smart contracts, such as the reentrancy vulnerability, exception disorders, integer underflows and overflows, etc. In addition, verification algorithms for these properties are constructed with the proposed semantic framework. We compare our approaches to the security analysis of smart contracts with the existing ones and highlight the importance of precise semantic foundations for the verification of smart contracts.
Lastly, we systematize the existing approaches to the security analysis of smart contracts from the perspectives of domain-specific research and applications of general ideas to study the connection between the domain-specific issues in smart contracts and general analysis techniques.
To conclude, this thesis studies the formal specification and verification of smart contracts, and combines domain-specific analysis and applications of general techniques. It also motivates other research directions related to the security analysis of smart contracts, such as the equivalence checking of high-level contracts and low-level bytecode, formal reasoning of high-level smart contract programming languages, the fairness analysis of smart contracts, etc. |
---|