Graph analysis techniques and applications to bitcoin forensics
Flow based networks have been studied over the years, which have considered different interpretations of a flow, for instance where a flow may be based on transfer (e.g., package delivery), or serial replication (e.g., one-to-one gossip), or parallel duplication (e.g., epidemic spread). However, the...
Saved in:
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: |
2019
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/106803 http://hdl.handle.net/10220/49668 https://doi.org/10.32657/10220/49668 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | Flow based networks have been studied over the years, which have considered different interpretations of a flow, for instance where a flow may be based on transfer (e.g., package delivery), or serial replication (e.g., one-to-one gossip), or parallel duplication (e.g., epidemic spread). However, the study of flows which may split while conserving the overall volume at each split is relatively sparse. In this thesis we look at certain aspects of such split-and-transfer based flows with volume conservation. Specifically, we (i) design general purpose flow based graph analysis techniques and demonstrate their applicability by studying a wide range of real networks, including the network induced by Bitcoin transactions; and (ii) design analysis techniques which are specific to the Bitcoin network, and showcase the efficacy of these techniques in conjunction with our novel flow-based graph analysis algorithms with case studies.
Pertaining the general purpose graph analysis, we first consider the notion of entropic centrality which measures how central a node is in terms of how uncertain the destination of a flow starting at this node is: the more uncertain the destination (i.e., larger the spread of the flow), the more well connected and thus central the node is deemed. Prior works on entropic centrality implicitly assumed that the flow is indivisible, and at every node, the flow is transferred from one edge to another. In this thesis, we propose a split-and-transfer flow model for entropic centrality, where at every node, the flow can actually be arbitrarily split across choices of neighbours, while conserving the overall volume. This interpretation of volume conserved non-atomic flows better reflects certain applications, including Bitcoin transactions. We consider two variations. In one, we constrain that a down-stream flow cannot revisit a node that has already been visited upstream (no cycles), and show how to relate this model to an equivalent transfer based entropic centrality one for computational tractability. We then relax the `no cycle' constraint, and explore a Markov process of random walk which simplifies the computation. These entropic centrality models can easily accommodate directed and weighted graphs. Based on the latter variant of (Markov) entropic centrality model, we next design a 2-stage clustering algorithm, which in the first stage is informed by the entropic centrality to detect local communities based on random walks, and then performs a bottom-up iterative aggregation.
We furthermore explore graph clustering by using an entropic distance based approach as an alternative to distribution of flow based approach as described above. To that end, we revisit a Renyi entropy based measure introduced originally for image clustering, and study its application to graph clustering. To effectuate Renyi entropy based graph clustering, we propose a simulated annealing algorithm, as well as a hierarchical approach. While the simulated annealing approach is practical for a range of graphs, our early results indicate that it (or at least our realization) does not scale for large graphs. This was an immediate motivation for the exploration of a hierarchical agglomerative approach which is naturally amenable to (not explored in this thesis) distribution and parallelization.
Finally, we demonstrate the use of the above novel information theory based algorithms to derive macroscopic information from the Bitcoin network. We augment these macroscopic general purpose graph algorithms with Bitcoin specific graph analysis algorithms that can be used to carry out microscopic analysis of a zoomed in Bitcoin subnetwork. Specifically, we study path confluence based address aggregation and Bitcoin flow traceback/forward mechanisms, which are centred around Bitcoin addresses of potential interest. In addition to that, we designed and developed a visualization aided graph mining tool, BiVA, which enables Bitcoin network data exploration, visualization of subgraphs around nodes of interest, and integrates both standard and new algorithms, including the aforementioned general algorithms for flow based centrality and clustering, and the Bitcoin microscopic graph analysis algorithms. |
---|