BinAlign: Alignment Padding Based Compiler Provenance Recovery
Main Authors: MALIHA ISMAIL; LIN, Yan; HAN, DongGyun; GAO, Debin
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2023
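Subjects: compiler provenance; alignment padding; Windows binaries; binary code similarity; Programming Languages and Compilers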
Online Access: https://ink.library.smu.edu.sg/sis_research/8417
Full text (PDF): https://ink.library.smu.edu.sg/context/sis_research/article/9420/viewcontent/acisp_23__1_.pdf
Institution: Singapore Management University
Publication date: 2023-07-01
DOI: 10.1007/978-3-031-35486-1_26
License: Creative Commons BY-NC-ND 4.0 (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Collection: Research Collection School Of Computing and Information Systems
Abstract:
Compiler provenance is significant for investigating the source-level indicators of binary code, such as the development environment, source compiler, and optimization settings. Compiler provenance analysis has important security applications in malware and vulnerability analysis, but extracting useful artifacts from binaries is very challenging when high-level language constructs are missing. Previous works applied machine-learning techniques to predict the source compiler of binaries; however, most of that work targets binaries compiled on the Linux operating system. We highlight the importance of, and the need for, exploring Windows compilers and the complicated binaries produced by their latest versions. Therefore, we construct a large dataset of real-world binaries compiled with four major compilers on Windows and the four most common optimization settings. The complexity of the optimized programs leads us to identify specific patterns in the binaries that indicate the source compiler and the optimization level. To address these observations, we propose an improved model based on the state of the art that incorporates streamlined alignment padding features. Our improved model uses the attention mechanism to learn alignment instructions from the binary code of portable executables and libraries. We conduct extensive experiments on a dataset of 296,169 unique and complex binary code samples generated from C/C++ applications. Our findings demonstrate that our proposed model significantly outperforms the state of the art in accurately predicting the source compiler and optimization flag for complex compiled code.
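To make the notion of alignment padding concrete, here is a minimal sketch of how padding bytes between functions could be turned into simple features. It is not the BinAlign feature extractor. It assumes a raw byte buffer from an x86/x86-64 code section that starts at an aligned address, and treats a maximal run of a single filler byte that ends exactly on a 16-byte boundary as padding. The filler set (0xCC, the INT3 byte commonly used as filler by MSVC, and 0x90, the one-byte NOP), the 16-byte default, and the helper names `padding_runs` and `padding_features` are illustrative assumptions; multi-byte NOP encodings, as emitted by GCC and Clang, are not handled.

```python
from collections import Counter
from typing import Iterator, Tuple

# Illustrative assumption: single-byte fillers commonly seen between x86/x86-64
# functions. 0xCC (INT3) is typical of MSVC output; 0x90 is the one-byte NOP.
# Multi-byte NOP encodings (e.g. as emitted by GCC/Clang) are NOT handled here.
PADDING_BYTES = {0xCC: "int3", 0x90: "nop"}


def padding_runs(code: bytes, align: int = 16) -> Iterator[Tuple[int, bytes]]:
    """Hypothetical helper: yield (offset, run) for each maximal run of one
    padding byte value whose end falls exactly on an `align`-byte boundary,
    i.e. filler placed so that the next function starts aligned.
    Assumes `code` begins at an aligned address."""
    i, n = 0, len(code)
    while i < n:
        b = code[i]
        if b not in PADDING_BYTES:
            i += 1
            continue
        j = i
        while j < n and code[j] == b:  # extend the run of identical bytes
            j += 1
        if j % align == 0:  # run ends where an aligned function could start
            yield i, code[i:j]
        i = j


def padding_features(code: bytes, align: int = 16) -> Counter:
    """Hypothetical feature vector: counts of (padding kind, run length)."""
    feats: Counter = Counter()
    for _, run in padding_runs(code, align):
        feats[(PADDING_BYTES[run[0]], len(run))] += 1
    return feats


if __name__ == "__main__":
    # Toy buffer: two tiny "functions", each followed by filler up to 16 bytes.
    code = (bytes([0x55, 0x89, 0xE5, 0xC3]) + b"\xcc" * 12
            + bytes([0x55, 0xC3]) + b"\x90" * 14)
    print(padding_features(code))  # prints Counter({('int3', 12): 1, ('nop', 14): 1})
```

This heuristic can miscount when a real instruction happens to end in 0xCC or 0x90 just before an aligned boundary. The abstract describes a model that learns from alignment instructions with an attention mechanism, which this byte-level counting sketch does not attempt to reproduce.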