Event extraction for cybersecurity using large language models

Bibliographic Details
Main Author: Seah, Kai Heng
Other Authors: Hui Siu Cheung
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Subjects: Computer and Information Science
Online Access:https://hdl.handle.net/10356/181182
Institution: Nanyang Technological University
School: College of Computing and Data Science
Contact email: ASSCHUI@ntu.edu.sg
Degree: Bachelor's degree
Date available: 2024-11-28
File format: PDF
Collection: DR-NTU (NTU Library)
Record ID: sg-ntu-dr.10356-181182
Citation: Seah, K. H. (2024). Event extraction for cybersecurity using large language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181182
Description: This project studies and compares how effectively different Large Language Models (LLMs) extract cybersecurity events. Cybersecurity event extraction is a critical task in Cyber Threat Intelligence: it aims to identify and categorize incidents such as data breaches, malware attacks and vulnerabilities in unstructured text sources such as news articles, threat reports and social media. Traditional methods for cybersecurity event extraction often rely on rule-based systems or supervised machine learning models, which require extensive labelled data and adapt poorly to a field that is constantly changing. One way of acquiring Cyber Threat Intelligence is Open-Source Intelligence, in which articles from across the web are collected and analysed. Because LLMs capture semantics and context well, they can be leveraged for cybersecurity event extraction. This study focuses on widely used conversational LLMs, namely ChatGPT-3.5, ChatGPT-4, LLaMA and Cohere. We investigate how well these models extract cybersecurity events without further fine-tuning, relying instead on prompting techniques and Retrieval Augmented Generation (RAG). The approach is evaluated through experiments on the CASIE dataset, comparing the LLMs under zero-shot prompting, other prompting techniques and RAG. The results indicate that current base LLMs are not yet able to fulfil the task of cybersecurity event extraction.
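The record does not give the prompts or retrieval setup used in the project. As a rough, hypothetical illustration of the difference between the zero-shot and retrieval-augmented settings it compares, the Python sketch below builds a prompt that optionally prepends the labelled sentences most similar to the input, then parses a JSON answer from the model. It simplifies the task to classifying the event type only (full CASIE extraction also covers event triggers and arguments). The label names (loosely modelled on CASIE's five event types), the bag-of-words retriever, the prompt wording and all function names are assumptions, and the call to an actual LLM is deliberately left out.

```python
import json
import math
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical label set, loosely modelled on the five CASIE event types.
EVENT_TYPES = ["Databreach", "Phishing", "Ransom", "DiscoverVulnerability", "PatchVulnerability"]

Example = Tuple[str, str]  # (sentence, gold event type)


def bag_of_words(text: str) -> Counter:
    """Lower-cased token counts, used as a crude sentence vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_examples(query: str, labelled: List[Example], k: int = 2) -> List[Example]:
    """Retrieval step (assumed): the k labelled sentences most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(labelled, key=lambda ex: cosine(q, bag_of_words(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(sentence: str, examples: Optional[List[Example]] = None) -> str:
    """Zero-shot prompt if no examples are given; retrieval-augmented prompt otherwise."""
    parts = [
        "Identify the cybersecurity event described in the sentence.",
        'Reply only with JSON of the form {"event_type": LABEL}, where LABEL is one of '
        + ", ".join(EVENT_TYPES)
        + ', or "None".',
    ]
    # Each retrieved example becomes a worked demonstration before the real input.
    for text, label in examples or []:
        answer = json.dumps({"event_type": label})
        parts.append(f"Sentence: {text}\nAnswer: {answer}")
    parts.append(f"Sentence: {sentence}\nAnswer:")
    return "\n\n".join(parts)


def parse_answer(raw: str) -> str:
    """Pull event_type out of the model's reply; malformed or off-label output becomes "None"."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return "None"
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return "None"
    label = data.get("event_type") if isinstance(data, dict) else None
    return label if label in EVENT_TYPES else "None"
```

In the zero-shot setting, `build_prompt(sentence)` would be sent to the model as is; in the retrieval-augmented setting, `build_prompt(sentence, retrieve_examples(sentence, train_set))` would be sent instead, where `train_set` is a hypothetical list of labelled training sentences. A practical pipeline would likely use a stronger retriever, such as dense embeddings, and the strict JSON parsing makes free-form or off-label replies count as misses, which is one way weak base-model output shows up in the scores.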