Event extraction for cybersecurity using large language models

This project studies and compares the efficiency of different Large Language Models (LLMs) for the extraction of cybersecurity events. Cybersecurity event extraction is a critical task in Cyber Threat Intelligence that aims to identify and categorize incidents such as data breaches, malware...

Bibliographic Details
Main Author:	Seah, Kai Heng
Other Authors:	Hui Siu Cheung
Format:	Final Year Project
Language:	English
Published:	Nanyang Technological University 2024
Subjects:	Computer and Information Science
Online Access:	https://hdl.handle.net/10356/181182
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-181182
record_format	dspace
spelling	sg-ntu-dr.10356-181182 2024-11-28T08:39:05Z Event extraction for cybersecurity using large language models Seah, Kai Heng Hui Siu Cheung College of Computing and Data Science ASSCHUI@ntu.edu.sg Computer and Information Science Bachelor's degree 2024-11-28T08:39:05Z 2024-11-28T08:39:05Z 2024 Final Year Project (FYP) Seah, K. H. (2024). Event extraction for cybersecurity using large language models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/181182 https://hdl.handle.net/10356/181182 en application/pdf Nanyang Technological University
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science
description	This project studies and compares the efficiency of different Large Language Models (LLMs) for the extraction of cybersecurity events. Cybersecurity event extraction is a critical task in Cyber Threat Intelligence that aims to identify and categorize incidents such as data breaches, malware attacks, and vulnerabilities from unstructured text sources like news articles, threat reports, and social media. Traditional methods for cybersecurity event extraction often rely on rule-based systems or supervised machine learning models, which require extensive labelled data and are limited in adaptability. Cybersecurity is, by nature, ever-changing. One method of acquiring Cyber Threat Intelligence is through Open-Source Intelligence, where articles across the web are sourced and analysed. As LLMs have a good understanding of semantics and context, it is possible to leverage them for cybersecurity event extraction. This study focuses on the conversational LLMs that many are familiar with, such as ChatGPT3.5, ChatGPT-4, LLAMA and Cohere. We investigate the efficacy of these conversational LLMs in extracting cybersecurity events without further fine-tuning, but with the help of prompting techniques as well as Retrieval-Augmented Generation. The effectiveness of our approach is evaluated through experiments on the CASIE dataset, comparing the performance of the different LLMs under zero-shot prompting, other prompting techniques, and Retrieval-Augmented Generation. The results demonstrate that base LLMs in their current state are unable to fulfil the task of cybersecurity event extraction.
author2	Hui Siu Cheung
author_facet	Hui Siu Cheung Seah, Kai Heng
format	Final Year Project
author	Seah, Kai Heng
author_sort	Seah, Kai Heng
title	Event extraction for cybersecurity using large language models
publisher	Nanyang Technological University
publishDate	2024
url	https://hdl.handle.net/10356/181182
_version_	1819112996718247936
