Provenance graph generation for intrusion detection

Bibliographic Details
Main Author: Chew, Perlyn Jie Ying
Other Authors: Ke Yiping, Kelly
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/171750
Institution: Nanyang Technological University
Description
Summary: In the digital age, cyberattacks are becoming more complex, and their consequences increasingly severe. Traditional intrusion detection systems struggle to identify sophisticated threats such as zero-day attacks and Advanced Persistent Threats (APTs) efficiently and effectively, so modern approaches are required. Provenance graphs have emerged as a promising data source for modern intrusion detection because they capture comprehensive information on both malicious and benign system activities. Provenance describes the history, or lineage, of an object: it records how digital objects arrived at their current state. Provenance graphs represent these complex dependencies and relationships as a directed acyclic graph (DAG), which can be analysed using machine learning methods. However, few end-to-end pipelines automatically generate provenance data and transform it into graph representations suitable for machine learning. The Flurry framework is a contemporary approach, built on the CamFlow provenance capture system, that improves the reproducibility and ease of generating provenance graphs for machine learning. Recognising both the potential of provenance graphs and the challenges in generating them, this research implements Flurry and aims to improve the generation and capture of provenance graphs for intrusion detection. Intrusion scenarios were designed and then simulated on multiple security-sensitive applications across various operating systems. Extensive datasets of provenance graphs were produced by dynamically executing various attacks on Fedora and Ubuntu, and were then used to train and validate state-of-the-art graph-based models and evaluate their effectiveness and accuracy. Specifically, in this project the provenance graphs were exported directly as a dataset for a Graph Convolutional Network (GCN). The results indicate that Flurry is an effective framework for generating provenance graphs.
Additionally, the strong performance of state-of-the-art graph-based models on tasks such as graph classification and anomaly detection underscores the potential of provenance graphs as an ideal data source for contemporary intrusion detection systems.
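To make the idea of provenance as a lineage-recording DAG more concrete, here is a minimal sketch, not Flurry's or CamFlow's actual data model. The node names (process, file and socket labels) and the edge list are hypothetical. The convention that an edge (source, destination) means "source influenced destination" is an assumption made here; provenance standards such as W3C PROV often orient edges the other way. The sketch checks acyclicity with Kahn's algorithm and recovers an object's lineage by walking edges backwards.

```python
from collections import defaultdict, deque

# Hypothetical provenance edges (assumption): (source, destination) means the
# destination was influenced by or derived from the source. Names are illustrative only.
EDGES = [
    ("file:/etc/passwd", "proc:cat#101"),
    ("proc:bash#100", "proc:cat#101"),
    ("proc:cat#101", "socket:10.0.0.5:4444"),
    ("proc:bash#100", "file:/tmp/out"),
]


def build_graph(edges):
    """Return forward (successor) and reverse (predecessor) adjacency sets."""
    succ, pred = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
        pred[dst].add(src)
    return succ, pred


def is_acyclic(edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    succ, pred = build_graph(edges)
    nodes = set(succ) | set(pred)
    indegree = {n: len(pred[n]) for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in succ[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(nodes)


def lineage(edges, target):
    """Return every ancestor of target, i.e. all objects its current state depends on."""
    _, pred = build_graph(edges)
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in pred[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(is_acyclic(EDGES))
    print(sorted(lineage(EDGES, "socket:10.0.0.5:4444")))
```

Run as a script, this prints `True` and then `['file:/etc/passwd', 'proc:bash#100', 'proc:cat#101']`, tracing the hypothetical outbound socket back to the file read and the shell that spawned the reading process. Real capture systems record far richer node and edge attributes; this only illustrates the graph structure the summary refers to.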