Deepbindiff: Learning program-wide code representations for binary diffing
Binary diffing analysis quantitatively measures the differences between two given binaries and produces fine-grained basic block matching. It has been widely used to enable different kinds of critical security analysis. However, all existing program analysis and machine learning based techniques suf...
Main Authors: | DUAN, Yue; LI, Xuezixiang; WANG, Jinghan; YIN, Heng |
---|---|
Format: | text |
Language: | English |
Published: | Institutional Knowledge at Singapore Management University, 2020 |
Subjects: | Information Security |
Online Access: | https://ink.library.smu.edu.sg/sis_research/8168 https://ink.library.smu.edu.sg/context/sis_research/article/9171/viewcontent/10198294.pdf |
Institution: | Singapore Management University |
id |
sg-smu-ink.sis_research-9171 |
---|---|
record_format |
dspace |
spelling |
sg-smu-ink.sis_research-9171 2023-09-26T10:34:18Z Deepbindiff: Learning program-wide code representations for binary diffing DUAN, Yue; LI, Xuezixiang; WANG, Jinghan; YIN, Heng Binary diffing analysis quantitatively measures the differences between two given binaries and produces fine-grained basic block matching. It has been widely used to enable different kinds of critical security analysis. However, all existing program-analysis and machine-learning-based techniques suffer from low accuracy, poor scalability, or coarse granularity, or require extensive labeled training data to function. In this paper, we propose an unsupervised program-wide code representation learning technique to solve the problem. We rely on both the code semantic information and the program-wide control flow information to generate block embeddings. Furthermore, we propose a k-hop greedy matching algorithm to find the optimal diffing results using the generated block embeddings. We implement a prototype called DeepBinDiff and evaluate its effectiveness and efficiency with a large number of binaries. The results show that our tool could outperform the state-of-the-art binary diffing tools by a large margin for both cross-version and cross-optimization level diffing. A case study for OpenSSL using real-world vulnerabilities further demonstrates the usefulness of our system. 2020-02-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8168 info:doi/10.14722/ndss.2020.24311 https://ink.library.smu.edu.sg/context/sis_research/article/9171/viewcontent/10198294.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Information Security |
institution |
Singapore Management University |
building |
SMU Libraries |
continent |
Asia |
country |
Singapore |
content_provider |
SMU Libraries |
collection |
InK@SMU |
language |
English |
topic |
Information Security |
spellingShingle |
Information Security; DUAN, Yue; LI, Xuezixiang; WANG, Jinghan; YIN, Heng; Deepbindiff: Learning program-wide code representations for binary diffing |
description |
Binary diffing analysis quantitatively measures the differences between two given binaries and produces fine-grained basic block matching. It has been widely used to enable different kinds of critical security analysis. However, all existing program-analysis and machine-learning-based techniques suffer from low accuracy, poor scalability, or coarse granularity, or require extensive labeled training data to function. In this paper, we propose an unsupervised program-wide code representation learning technique to solve the problem. We rely on both the code semantic information and the program-wide control flow information to generate block embeddings. Furthermore, we propose a k-hop greedy matching algorithm to find the optimal diffing results using the generated block embeddings. We implement a prototype called DeepBinDiff and evaluate its effectiveness and efficiency with a large number of binaries. The results show that our tool could outperform the state-of-the-art binary diffing tools by a large margin for both cross-version and cross-optimization level diffing. A case study for OpenSSL using real-world vulnerabilities further demonstrates the usefulness of our system. |
format |
text |
author |
DUAN, Yue; LI, Xuezixiang; WANG, Jinghan; YIN, Heng |
author_facet |
DUAN, Yue; LI, Xuezixiang; WANG, Jinghan; YIN, Heng |
author_sort |
DUAN, Yue |
title |
Deepbindiff: Learning program-wide code representations for binary diffing |
title_short |
Deepbindiff: Learning program-wide code representations for binary diffing |
title_full |
Deepbindiff: Learning program-wide code representations for binary diffing |
title_fullStr |
Deepbindiff: Learning program-wide code representations for binary diffing |
title_full_unstemmed |
Deepbindiff: Learning program-wide code representations for binary diffing |
title_sort |
deepbindiff: learning program-wide code representations for binary diffing |
publisher |
Institutional Knowledge at Singapore Management University |
publishDate |
2020 |
url |
https://ink.library.smu.edu.sg/sis_research/8168 https://ink.library.smu.edu.sg/context/sis_research/article/9171/viewcontent/10198294.pdf |
_version_ |
1779157190079152128 |