A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings
File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-by...
Main Authors: | Liu, Wenyang; Wang, Yi; Wu, Kejun; Yap, Kim-Hui; Chau, Lap-Pui |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2024 |
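Subjects: | Computer and Information Science; Convolutional Neural Networks; File fragment classification; Byte2image; Memory forensics |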
Online Access: | https://hdl.handle.net/10356/174534 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-174534 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-174534 2024-04-05T15:40:27Z A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings Liu, Wenyang Wang, Yi Wu, Kejun Yap, Kim-Hui Chau, Lap-Pui School of Electrical and Electronic Engineering 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS) Computer and Information Science Convolutional Neural Networks File fragment classification Byte2image Memory forensics File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length coding files whose symbols are represented by a variable number of bits. Conversely, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios.
National Research Foundation (NRF) Submitted/Accepted version This research/project is supported by the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity R&D Programme (NRF2018NCR-NCR009-0001). 2024-04-02T05:32:37Z 2024-04-02T05:32:37Z 2023 Conference Paper Liu, W., Wang, Y., Wu, K., Yap, K. & Chau, L. (2023). A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings. 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS). https://dx.doi.org/10.1109/AICAS57966.2023.10168636 2834-9857 https://hdl.handle.net/10356/174534 10.1109/AICAS57966.2023.10168636 en NRF2018NCR-NCR009-0001 © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/AICAS57966.2023.10168636. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science Convolutional Neural Networks File fragment classification Byte2image Memory forensics |
spellingShingle |
Computer and Information Science Convolutional Neural Networks File fragment classification Byte2image Memory forensics Liu, Wenyang Wang, Yi Wu, Kejun Yap, Kim-Hui Chau, Lap-Pui A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings |
description |
File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This is inherently inapt for classifying variable-length coding files whose symbols are represented by a variable number of bits. Conversely, we propose Byte2Image, a novel data augmentation technique, to introduce the neglected intra-byte information into file fragments and re-treat them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack their n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which can jointly model the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method can achieve notable accuracy improvements over state-of-the-art methods in nearly all scenarios. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering Liu, Wenyang Wang, Yi Wu, Kejun Yap, Kim-Hui Chau, Lap-Pui |
format |
Conference or Workshop Item |
author |
Liu, Wenyang Wang, Yi Wu, Kejun Yap, Kim-Hui Chau, Lap-Pui |
author_sort |
Liu, Wenyang |
title |
A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings |
title_short |
A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings |
title_full |
A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings |
title_fullStr |
A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings |
title_full_unstemmed |
A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings |
title_sort |
byte sequence is worth an image: cnn for file fragment classification using bit shift and n-gram embeddings |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/174534 |
_version_ |
1814047300959338496 |
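
Illustrative note on the description field: the abstract mentions a sliding byte window that exposes intra-byte (bit-level) information and turns a fragment into a 2d gray-scale image. The record does not give the exact procedure, so the following is only a minimal sketch of one plausible reading, not the authors' Byte2Image implementation. It assumes an 8-bit window slid over the fragment's bit stream at each of the eight bit offsets, with one image row per offset. The function name `bit_shift_image` is hypothetical, and the n-gram embedding step and the fusion network are deliberately left out because the record does not specify them.

```python
import numpy as np


def bit_shift_image(fragment: bytes) -> np.ndarray:
    """Hypothetical sketch: build an 8-row gray-scale image from a fragment.

    Row s holds the byte values read from the fragment's bit stream when
    the 8-bit window starts s bits late (s = 0..7). Row 0 is therefore the
    original byte sequence without its last byte.
    """
    data = np.frombuffer(fragment, dtype=np.uint8)
    if data.size < 2:
        raise ValueError("fragment must contain at least 2 bytes")

    bits = np.unpackbits(data)             # MSB-first bit stream
    n_windows = data.size - 1              # windows that fit at every offset
    weights = 1 << np.arange(7, -1, -1)    # bit weights 128, 64, ..., 1

    rows = []
    for shift in range(8):
        # Take 8 * n_windows bits starting at this bit offset and regroup
        # them into consecutive 8-bit windows.
        window = bits[shift:shift + 8 * n_windows].reshape(n_windows, 8)
        rows.append(window @ weights)      # convert each window to 0..255

    return np.stack(rows).astype(np.uint8)  # shape: (8, len(fragment) - 1)


if __name__ == "__main__":
    image = bit_shift_image(bytes([0b10110010, 0b01101100, 0xFF]))
    print(image.shape)
    print(image)
```

Under these assumptions a 512-byte fragment yields an 8 x 511 array that could be passed to a 2d CNN alongside the raw 1d byte sequence; the row count and window width are choices of this sketch, not values taken from the record.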