A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings

File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and utilize the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-by...

Bibliographic Details
Main Authors: Liu, Wenyang; Wang, Yi; Wu, Kejun; Yap, Kim-Hui; Chau, Lap-Pui
Other Authors: School of Electrical and Electronic Engineering
Format: Conference or Workshop Item
Language: English
Published: 2024
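Subjects: Computer and Information Science; Convolutional Neural Networks; File fragment classification; Byte2image; Memory forensics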
Online Access: https://hdl.handle.net/10356/174534
Institution: Nanyang Technological University
id sg-ntu-dr.10356-174534
record_format dspace
spelling sg-ntu-dr.10356-174534 2024-04-05T15:40:27Z
conference 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS)
funder National Research Foundation (NRF)
version Submitted/Accepted version
funding This research/project is supported by the National Research Foundation, Singapore, and Cyber Security Agency of Singapore under its National Cybersecurity R&D Programme (NRF2018NCR-NCR009-0001).
date 2024-04-02T05:32:37Z
issued 2023
type Conference Paper
citation Liu, W., Wang, Y., Wu, K., Yap, K. & Chau, L. (2023). A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings. 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS). https://dx.doi.org/10.1109/AICAS57966.2023.10168636
identifier 2834-9857
identifier https://hdl.handle.net/10356/174534
identifier 10.1109/AICAS57966.2023.10168636
language en
grant NRF2018NCR-NCR009-0001
rights © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/AICAS57966.2023.10168636.
mimetype application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Convolutional Neural Networks
File fragment classification
Byte2image
Memory forensics
description File fragment classification (FFC) on small chunks of memory is essential in memory forensics and Internet security. Existing methods mainly treat file fragments as 1d byte signals and use the captured inter-byte features for classification, while the bit information within bytes, i.e., intra-byte information, is seldom considered. This makes them inherently ill-suited to classifying files that use variable-length coding, whose symbols are represented by a variable number of bits. In contrast, we propose Byte2Image, a novel data augmentation technique that introduces the neglected intra-byte information into file fragments and treats them as 2d gray-scale images, which allows us to capture both inter-byte and intra-byte correlations simultaneously through powerful convolutional neural networks (CNNs). Specifically, to convert file fragments to 2d images, we employ a sliding byte window to expose the neglected intra-byte information and stack the resulting n-gram features row by row. We further propose a byte sequence & image fusion network as a classifier, which jointly models the raw 1d byte sequence and the converted 2d image to perform FFC. Experiments on the FFT-75 dataset validate that our proposed method achieves notable accuracy improvements over state-of-the-art methods in nearly all scenarios.
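The conversion described in the abstract can be illustrated with a minimal sketch. This is one possible reading of "sliding byte window" and "stack n-gram features row by row", not the authors' implementation: the function names (`bit_shifted_views`, `stack_ngrams`, `byte2image_sketch`), the 1-bit stride, and the row layout are all assumptions made for illustration.

```python
from typing import List


def bit_shifted_views(fragment: bytes) -> List[List[int]]:
    """Read the fragment's bit stream through an 8-bit window at bit offsets 0..7.

    Assumption: the window advances one whole byte per column, and each of the
    8 rows starts at a different bit offset, so rows 1..7 straddle byte borders.
    """
    if len(fragment) < 2:
        raise ValueError("fragment must contain at least 2 bytes")
    rows = []
    for shift in range(8):
        row = []
        for i in range(len(fragment) - 1):
            # Join two neighbouring bytes into 16 bits, then cut out the
            # 8 bits that begin `shift` bits into the first byte.
            pair = (fragment[i] << 8) | fragment[i + 1]
            row.append((pair >> (8 - shift)) & 0xFF)
        rows.append(row)
    return rows


def stack_ngrams(row: List[int], n: int) -> List[List[int]]:
    """Stack n byte-shifted copies of `row`; column j then holds the n-gram at j."""
    width = len(row) - n + 1
    if n < 1 or width < 1:
        raise ValueError("n must be between 1 and len(row)")
    return [row[k:k + width] for k in range(n)]


def byte2image_sketch(fragment: bytes, n: int = 2) -> List[List[int]]:
    """Hypothetical layout: 8 bit-shifted views, each expanded into n n-gram rows."""
    image = []
    for view in bit_shifted_views(fragment):
        image.extend(stack_ngrams(view, n))
    return image


if __name__ == "__main__":
    img = byte2image_sketch(bytes(range(256)) * 2, n=2)
    print(len(img), "rows x", len(img[0]), "columns")
```

Under these assumptions, a 512-byte fragment with `n=2` yields a 16 x 510 gray-scale array whose values lie in 0..255. A 2D CNN can then look at rows that cross byte boundaries, which a purely byte-level 1d model never sees.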
author2 School of Electrical and Electronic Engineering
format Conference or Workshop Item
author Liu, Wenyang
Wang, Yi
Wu, Kejun
Yap, Kim-Hui
Chau, Lap-Pui
author_sort Liu, Wenyang
title A byte sequence is worth an image: CNN for file fragment classification using bit shift and n-gram embeddings
publishDate 2024
url https://hdl.handle.net/10356/174534
_version_ 1814047300959338496