Event detection for cyber security news articles

Bibliographic Details
Main Author: Huang, Jovan Tian Chun
Other Authors: Hui Siu Cheung
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/165914
Institution: Nanyang Technological University
Description
Summary: In recent years, text mining techniques have received increasing attention in the field of cyber security, and extensive studies have sought to improve how computers understand and process language in this domain. One important task is event detection, which identifies specific events or occurrences in text by using certain keywords, or triggers. However, current methods often focus on understanding the text itself and pay too little attention to the meaning of the events being identified. In this study, we introduce a novel approach for cybersecurity event detection, the Label-Pivoting Model for Cybersecurity News Event Detection (LPCNED), which builds on the Semantic Pivoting Model for Effective Event Detection (SPEED). SPEED demonstrated superior performance over various robust baselines on event detection benchmark datasets such as ACE 2005. LPCNED uses the pretrained NewsBERT language model to encode the combined representation of input sentences and labels, and it uses the semantic meanings of predetermined event type labels to identify candidates for event triggers. NewsBERT provides domain-specific knowledge drawn from popular news data sources, thereby enhancing the overall effectiveness of the model. Our experiments were conducted on the Cybersecurity Event Annotation Corpus (CEAC), the sole corpus available for cybersecurity news event extraction at the time of writing. They demonstrate the robustness and efficacy of LPCNED despite limited data availability: it outperforms both the SPEED model and the BERT-CRF model used on the MAVEN dataset in the general domain. These results indicate the potential utility of LPCNED for event detection in cybersecurity news articles and warrant further investigation of the event extraction task.
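The summary describes encoding a combined representation of the input sentence and the event type labels. As a rough illustration only, the sketch below shows one plausible way such a combined input could be assembled before it is passed to an encoder. It is not taken from the project itself: the function name `build_label_pivot_input`, the whitespace tokenization, the BERT-style `[CLS]`/`[SEP]` markers, the segment id scheme, and the example labels are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed BERT-style special tokens; the actual model's tokenizer may differ.
CLS, SEP = "[CLS]", "[SEP]"


def build_label_pivot_input(
    sentence: str, labels: List[str]
) -> Tuple[List[str], List[int]]:
    """Join a sentence and event type labels into one token sequence.

    Hypothetical helper: whitespace splitting stands in for a real
    subword tokenizer, and the layout is an assumed one.
    """
    sentence_tokens = sentence.split()
    # Multi-word labels (e.g. "data breach") are flattened into word tokens.
    label_tokens = [tok for label in labels for tok in label.split()]

    tokens = [CLS] + sentence_tokens + [SEP] + label_tokens + [SEP]
    # Segment 0 covers [CLS], the sentence, and the first [SEP];
    # segment 1 covers the label words and the final [SEP].
    segment_ids = [0] * (len(sentence_tokens) + 2) + [1] * (len(label_tokens) + 1)
    return tokens, segment_ids


# Hypothetical event type labels, not the actual CEAC label set.
tokens, segment_ids = build_label_pivot_input(
    "Attackers exploited the flaw to steal customer records",
    ["attack", "data breach", "patch"],
)
```

With the sentence and label words in one sequence, an attention-based encoder can relate each sentence token to the label words, which is the general idea behind using label semantics to pick out trigger candidates.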