Accurate and scalable cross-architecture cross-OS binary code search with emulation

Bibliographic Details
Main Authors: Xue, Yinxing; Xu, Zhengzi; Chandramohan, Mahinthan; Liu, Yang
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Online Access: https://hdl.handle.net/10356/141413
Institution: Nanyang Technological University
Funding: NRF (National Research Foundation, Singapore)
Date available: 2020-06-08
Date issued: 2019
Type: Journal Article
Citation: Xue, Y., Xu, Z., Chandramohan, M., & Liu, Y. (2018). Accurate and scalable cross-architecture cross-OS binary code search with emulation. IEEE Transactions on Software Engineering, 45(11), 1125-1149. doi:10.1109/TSE.2018.2827379
ISSN: 0098-5589
Scopus EID: 2-s2.0-85045643261
Rights: © 2018 IEEE. All rights reserved.
Repository: DR-NTU collection, NTU Library, Nanyang Technological University, Singapore
Subjects: Engineering::Computer science and engineering; Binary Code Search; Binary Clone Detection
Description: Unlike source code clone detection, clone detection (similar code search) in binary executables faces major challenges because the syntax and structure of binary code differ greatly across compiler, architecture and OS configurations. Existing studies have proposed different categories of features for detecting binary code clones, including control-flow graph (CFG) structures, n-grams in CFGs, input/output values, etc. In our previous study and its tool, BinGo, we proposed a selective inlining technique that captures complete function semantics by inlining relevant library and user-defined functions, thereby mitigating the large gaps in CFG structures caused by different compilation scenarios. However, BinGo considers only input/output value features. In this study, we propose incorporating features from different categories (e.g., structural features and high-level semantic features) to improve accuracy, and using emulation to improve efficiency. We empirically compare our tool, BinGo-E, with the previous tool BinGo and the available state-of-the-art binary code search tools in terms of search accuracy and performance. Results show that BinGo-E achieves significantly higher accuracy than BinGo for cross-architecture, cross-OS, cross-compiler and intra-compiler matching. Additionally, in the new task of matching binaries of forked projects, BinGo-E also achieves better accuracy than the existing benchmark tool. Meanwhile, BinGo-E takes less time than BinGo during matching.
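The abstract names feature categories but does not say how BinGo-E scores or combines them. As a rough illustration only, the Python sketch below shows two generic similarity measures of the kinds listed: Jaccard similarity over instruction-mnemonic n-grams (a syntactic feature) and agreement of input/output values on shared random inputs (a semantic feature). It is not BinGo-E's algorithm. The helper names (`ngrams`, `jaccard`, `io_similarity`, `combined_score`), the n-gram length, the equal blending weight, and the use of plain Python callables in place of emulated binary functions are all hypothetical assumptions.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def ngrams(mnemonics: Sequence[str], n: int = 3) -> set[tuple[str, ...]]:
    """Set of all length-n windows over an instruction-mnemonic sequence."""
    return {tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity len(a & b) / len(a | b); two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def io_similarity(f: Callable[[int], int], g: Callable[[int], int],
                  trials: int = 100, seed: int = 0) -> float:
    """Fraction of shared random inputs on which f and g produce equal outputs.

    Hypothetical stand-in: f and g are plain Python callables here, whereas an
    emulation-based tool would execute the two binary functions in an emulator.
    """
    rng = random.Random(seed)
    inputs = [rng.randint(-2**31, 2**31 - 1) for _ in range(trials)]
    return sum(f(x) == g(x) for x in inputs) / trials


def combined_score(syntactic: float, semantic: float, w: float = 0.5) -> float:
    """Hypothetical linear blend of a syntactic and a semantic similarity."""
    return w * syntactic + (1 - w) * semantic


if __name__ == "__main__":
    # Two hypothetical compilations of "return 2 * x": one shifts, one adds.
    variant_a = ["push", "mov", "shl", "pop", "ret"]
    variant_b = ["push", "mov", "add", "pop", "ret"]
    syn = jaccard(ngrams(variant_a, 2), ngrams(variant_b, 2))
    sem = io_similarity(lambda x: x << 1, lambda x: x + x)
    print(round(syn, 3))  # 0.333: 2 shared bigrams out of 6 distinct
    print(sem)            # 1.0: both variants compute the same values
    print(round(combined_score(syn, sem), 3))
```

The example is meant to show why the abstract pairs the two categories: the mnemonic sequences differ enough to drag the syntactic score down, while the input/output comparison recognizes the two variants as equivalent. A realistic harness would presumably also have to handle multiple arguments, memory side effects and fixed-width overflow, which this single-integer sketch ignores.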