SigmaDiff: Semantics-aware deep graph matching for pseudocode diffing

Pseudocode diffing precisely locates similar parts and captures differences between the decompiled pseudocode of two given binaries. It is particularly useful in many security scenarios such as code plagiarism detection, lineage analysis, patch, vulnerability analysis, etc. However, existing pseudoc...

Bibliographic Details
Main Authors:	GAO, Lian; QU, Yu; YU, Sheng; DUAN, Yue; YIN, Heng
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2024
Subjects:	Information Security; OS and Networks
Online Access:	https://ink.library.smu.edu.sg/sis_research/8668 https://ink.library.smu.edu.sg/context/sis_research/article/9671/viewcontent/2024_208_paper__1_.pdf
Institution:	Singapore Management University

id	sg-smu-ink.sis_research-9671
record_format	dspace
spelling	sg-smu-ink.sis_research-9671 2024-03-07T07:42:05Z SigmaDiff: Semantics-aware deep graph matching for pseudocode diffing GAO, Lian QU, Yu YU, Sheng DUAN, Yue YIN, Heng Pseudocode diffing precisely locates similar parts and captures differences between the decompiled pseudocode of two given binaries. It is particularly useful in many security scenarios such as code plagiarism detection, lineage analysis, patch, vulnerability analysis, etc. However, existing pseudocode diffing and binary diffing tools suffer from low accuracy and poor scalability, since they either rely on manually-designed heuristics (e.g., Diaphora) or heavy computations like matrix factorization (e.g., DeepBinDiff). To address the limitations, in this paper, we propose a semantics-aware, deep neural network-based model called SIGMADIFF. SIGMADIFF first constructs IR (Intermediate Representation) level interprocedural program dependency graphs (IPDGs). Then it uses a lightweight symbolic analysis to extract initial node features and locate training nodes for the neural network model. SIGMADIFF then leverages the state-of-the-art graph matching model called Deep Graph Matching Consensus (DGMC) to match the nodes in IPDGs. SIGMADIFF also introduces several important updates to the design of DGMC such as the pre-training and fine-tuning schema. Experimental results show that SIGMADIFF significantly outperforms the state-of-the-art heuristic-based and deep learning-based techniques in terms of both accuracy and efficiency. It is able to precisely pinpoint eight vulnerabilities in a widely-used video conferencing application.
2024-03-01T08:00:00Z text application/pdf https://ink.library.smu.edu.sg/sis_research/8668 info:doi/10.14722/ndss.2024.23208 https://ink.library.smu.edu.sg/context/sis_research/article/9671/viewcontent/2024_208_paper__1_.pdf http://creativecommons.org/licenses/by-nc-nd/4.0/ Research Collection School Of Computing and Information Systems eng Institutional Knowledge at Singapore Management University Information Security OS and Networks
institution	Singapore Management University
building	SMU Libraries
continent	Asia
country	Singapore
content_provider	SMU Libraries
collection	InK@SMU
language	English
topic	Information Security; OS and Networks
spellingShingle	Information Security OS and Networks GAO, Lian QU, Yu YU, Sheng DUAN, Yue YIN, Heng SigmaDiff: Semantics-aware deep graph matching for pseudocode diffing
description	Pseudocode diffing precisely locates similar parts and captures differences between the decompiled pseudocode of two given binaries. It is particularly useful in many security scenarios such as code plagiarism detection, lineage analysis, patch, vulnerability analysis, etc. However, existing pseudocode diffing and binary diffing tools suffer from low accuracy and poor scalability, since they either rely on manually-designed heuristics (e.g., Diaphora) or heavy computations like matrix factorization (e.g., DeepBinDiff). To address the limitations, in this paper, we propose a semantics-aware, deep neural network-based model called SIGMADIFF. SIGMADIFF first constructs IR (Intermediate Representation) level interprocedural program dependency graphs (IPDGs). Then it uses a lightweight symbolic analysis to extract initial node features and locate training nodes for the neural network model. SIGMADIFF then leverages the state-of-the-art graph matching model called Deep Graph Matching Consensus (DGMC) to match the nodes in IPDGs. SIGMADIFF also introduces several important updates to the design of DGMC such as the pre-training and fine-tuning schema. Experimental results show that SIGMADIFF significantly outperforms the state-of-the-art heuristic-based and deep learning-based techniques in terms of both accuracy and efficiency. It is able to precisely pinpoint eight vulnerabilities in a widely-used video conferencing application.
format	text
author	GAO, Lian QU, Yu YU, Sheng DUAN, Yue YIN, Heng
author_facet	GAO, Lian QU, Yu YU, Sheng DUAN, Yue YIN, Heng
author_sort	GAO, Lian
title	SigmaDiff: Semantics-aware deep graph matching for pseudocode diffing
title_sort	sigmadiff: semantics-aware deep graph matching for pseudocode diffing
publisher	Institutional Knowledge at Singapore Management University
publishDate	2024
url	https://ink.library.smu.edu.sg/sis_research/8668 https://ink.library.smu.edu.sg/context/sis_research/article/9671/viewcontent/2024_208_paper__1_.pdf
_version_	1794549750697033728
