BinAlign: Alignment Padding Based Compiler Provenance Recovery

Compiler provenance is important for investigating source-level indicators of binary code, such as the development environment, source compiler, and optimization settings. Not only does compiler provenance analysis have important security applications in malware and vulnerability analysis, but it is a...

Bibliographic Details
Main Authors: MALIHA ISMAIL; LIN, Yan; HAN, DongGyun; GAO, Debin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2023
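Subjects: compiler provenance; alignment padding; Windows binaries; binary code similarity; Programming Languages and Compilers
Online Access: https://ink.library.smu.edu.sg/sis_research/8417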
https://ink.library.smu.edu.sg/context/sis_research/article/9420/viewcontent/acisp_23__1_.pdf
Institution: Singapore Management University
id sg-smu-ink.sis_research-9420
record_format dspace
spelling sg-smu-ink.sis_research-9420
2024-01-09T03:32:17Z
BinAlign: Alignment Padding Based Compiler Provenance Recovery
MALIHA ISMAIL; LIN, Yan; HAN, DongGyun; GAO, Debin
2023-07-01T07:00:00Z
text
application/pdf
https://ink.library.smu.edu.sg/sis_research/8417
info:doi/10.1007/978-3-031-35486-1_26
https://ink.library.smu.edu.sg/context/sis_research/article/9420/viewcontent/acisp_23__1_.pdf
http://creativecommons.org/licenses/by-nc-nd/4.0/
Research Collection School Of Computing and Information Systems
eng
Institutional Knowledge at Singapore Management University
compiler provenance; alignment padding; Windows binaries; binary code similarity; Programming Languages and Compilers
institution Singapore Management University
building SMU Libraries
continent Asia
country Singapore
content_provider SMU Libraries
collection InK@SMU
language English
topic compiler provenance
alignment padding
Windows binaries
binary code similarity
Programming Languages and Compilers
description Compiler provenance is important for investigating source-level indicators of binary code, such as the development environment, source compiler, and optimization settings. Compiler provenance analysis has important security applications in malware and vulnerability analysis, but extracting useful artifacts from a binary is very challenging when high-level language constructs are missing. Previous work applied machine-learning techniques to predict the source compiler of binaries. However, most of that work targets binaries compiled on the Linux operating system. We highlight the need to explore Windows compilers and the complicated binaries produced by the latest versions of these compilers. We therefore construct a large dataset of real-world binaries compiled with four major compilers on Windows and the four most common optimization settings. The complexity of the optimized programs leads us to identify specific patterns in the binaries that are indicative of the source compiler and the specific optimization level. Building on these observations, we propose an improved model based on the state of the art that incorporates streamlined alignment padding features into the existing model. Our improved model thus learns alignment instructions from the binary code of portable executables and libraries using an attention mechanism. We conduct extensive experiments on a dataset of 296,169 unique and complex pieces of binary code generated from C/C++ applications. Our findings demonstrate that the proposed model significantly outperforms the state of the art in accurately predicting the source compiler and optimization flag for complex compiled code.
format text
author MALIHA ISMAIL,
LIN, Yan
HAN, DongGyun
GAO, Debin
title BinAlign: Alignment Padding Based Compiler Provenance Recovery
publisher Institutional Knowledge at Singapore Management University
publishDate 2023
_version_ 1787590772020740096