Malware detection in memory images using machine learning

Bibliographic Details
Main Author: Neo, Guat Kwan
Other Authors: Luo Jun
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access:https://hdl.handle.net/10356/165974
Institution: Nanyang Technological University
Description
Summary: With the increasing prevalence and sophistication of malware, there is an urgent need for effective and efficient methods to detect it. Memory forensics has shown promising results in finding malware that can evade traditional security measures, while machine learning techniques have proven effective at identifying previously unknown malware. Combining the two approaches offers a route to robust malware detection. However, the effectiveness and practicality of such models depend heavily on the quality of the datasets they are trained on. This study aims to assess the effectiveness of machine learning models trained on the CIC-MalMem-2022 dataset for detecting malware in memory images, to evaluate how well these models generalise to unseen data, and to investigate their potential for practical application. Six classification models were trained and evaluated, and they achieved high scores across multiple metrics in cross-validation. However, when tested on a new set of unseen data, the models performed poorly, and further investigation revealed potential issues with the training dataset. The study concluded that dataset quality, together with factors such as operating system version, variations in the system environment, and the use of oversampling techniques, must be considered when developing memory dump datasets for practical use. The study also contributed two tools: MemDumpGen, which automates the execution of samples and the generation of memory dumps, and MalMemDetector, a proof-of-concept tool that shows how the trained models could be used in a practical setting.
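The summary contrasts high cross-validation scores with poor results on unseen data. The following minimal, standard-library-only Python sketch illustrates that evaluation pattern in the abstract: k-fold cross-validation with accuracy, precision, recall and F1, followed by a separate held-out test. It does not use CIC-MalMem-2022 or any of the study's six models. The single-feature midpoint-threshold classifier, all function names, and the toy numbers are hypothetical stand-ins chosen only for illustration.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train_indices, test_indices)


def k_fold_indices(n_samples: int, k: int) -> List[Split]:
    """Split indices 0..n_samples-1 into k contiguous folds (no shuffling)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 as malicious."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def fit_midpoint_threshold(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Hypothetical stand-in model: threshold halfway between the class means
    of a single feature. Both classes must be present in the training data."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (mean(pos) + mean(neg)) / 2


def predict(threshold: float, xs: Sequence[float]) -> List[int]:
    """Label a sample malicious (1) when its feature exceeds the threshold."""
    return [1 if x > threshold else 0 for x in xs]


def cross_validate(xs: Sequence[float], ys: Sequence[int], k: int) -> Dict[str, float]:
    """Train on k-1 folds, score on the remaining fold, and average each metric."""
    per_fold = []
    for train, test in k_fold_indices(len(xs), k):
        thr = fit_midpoint_threshold([xs[i] for i in train], [ys[i] for i in train])
        y_pred = predict(thr, [xs[i] for i in test])
        per_fold.append(binary_metrics([ys[i] for i in test], y_pred))
    return {name: mean(m[name] for m in per_fold) for name in per_fold[0]}


if __name__ == "__main__":
    # Toy, synthetic data (not from any real dataset); classes are interleaved
    # so every contiguous training fold contains both labels.
    xs, ys = [1, 9, 2, 8, 3, 7, 4, 6], [0, 1, 0, 1, 0, 1, 0, 1]
    print("cross-validation:", cross_validate(xs, ys, k=4))
    # Synthetic "unseen" samples whose feature values are shifted upwards.
    thr = fit_midpoint_threshold(xs, ys)
    print("held-out:", binary_metrics([0, 1, 0, 1], predict(thr, [6, 12, 7, 13])))
```

On this toy data, every cross-validation metric is 1.0, yet the held-out accuracy drops to 0.5 because the benign "unseen" samples fall on the wrong side of the learned threshold. This is only an artificial demonstration of how in-distribution cross-validation can overstate performance under a distribution shift; it is not a reproduction of the study's experiments.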