Machine learning techniques for advanced cyber attack detection

With the development of information communication technologies (ICT), more and more data is generated, processed, and transmitted among different smart components and organizations. ICT brings convenience and opportunities to humans and society, but at the same time, the resulting security-critical...

Bibliographic Details
Main Author: Yang, Wenzhuo
Other Authors: Lam Kwok Yan
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/161429
Institution: Nanyang Technological University
id sg-ntu-dr.10356-161429
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
description With the development of information and communication technologies (ICT), more and more data is generated, processed, and transmitted among different smart components and organizations. ICT brings convenience and opportunities to individuals and society, but the resulting security-critical and privacy-sensitive data also attracts more attackers and increases the likelihood of security incidents. Studying effective and practical techniques against cyber attacks is therefore increasingly important for protecting the confidentiality, integrity, and availability of user data in cyberspace. Cybersecurity places strong emphasis on detection, reaction, and protection measures. As one of the key steps in defending against cyber attacks, cyber attack detection plays a critical role in monitoring cybersecurity posture and issuing threat warnings. This thesis investigates machine learning (ML) techniques for constructing advanced cyber attack detection systems. Specifically, it explores promising ML techniques for designing effective intrusion detection systems (IDS) and efficient cyber threat intelligence (CTI) analysis models that enable proactive defense against cyber attacks.
An IDS, one of the most important cybersecurity detection tools, identifies anomalous activities based on internal system data, reducing the financial and reputational losses caused by cyber attacks. Many ML techniques have been used to automate intrusion detection. However, most existing ML-based IDSs suffer from practical issues in real industrial settings: the high cost of acquiring fully correctly labeled (FCL) data at big data scale, and unsatisfactory detection accuracy for minority attacks in imbalanced data. We therefore explore training IDSs with weakly supervised learning (WSL) approaches that use weak labels (incomplete, inexact, or possibly inaccurate labels) to mitigate the data annotation burden and data privacy issues that traditional ML-based IDSs may face. WSL is a special ML paradigm, and weak labels are imperfect, high-level annotations that are easier and cheaper to obtain in practice than FCL data. First, we use a promising WSL technique, unlabeled-unlabeled learning (UUL), to train an IDS that distinguishes benign from malicious network traffic using inexactly labeled data. Then, we investigate the feasibility of using another WSL archetype, partial label learning (PLL), to build an IDS from ambiguously labeled data. Several PLL techniques are leveraged, and various data resampling algorithms are combined with the proposed IDS model to detect specific attacks and improve detection performance for minority attacks in imbalanced data.
CTI analysis is another promising method. It enables security experts to grasp emerging threat trends based on external sources and to provide targeted users with early warnings, so that they can take proactive countermeasures to detect and defend against potential cyber attacks. As cyber attacks become increasingly sophisticated and menacing, sharing and analyzing CTI among different security departments has become a global trend. The growing volume of CTI reports and frequent CTI sharing lead to data redundancy and create an urgent need for much more efficient analysis. A shortage of professional security analysts and the increasing capability to capture network information in the big data society are two further challenges. To address these problems, we aim to speed up CTI analysis and automate CTI report classification through data mining and machine learning techniques. Hence, our third work presents a practical and efficient approach for gathering large quantities of CTI sources and for embedding and grouping the CTI reports, using unsupervised text representation algorithms jointly with six ML classifiers to automate the CTI analysis process.
In conclusion, we leverage ML techniques trained on imperfect labels for internal network intrusion detection and use generic feature representation tools jointly with different ML classifiers for external CTI data analysis to enhance cybersecurity. Extensive experiments show the feasibility and effectiveness of the proposed methods for advanced cyber attack detection.
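The description mentions combining data resampling algorithms with the IDS model to improve detection of minority attacks in imbalanced data, without naming the algorithms used. As an illustration only, and not the thesis's actual method, the sketch below shows plain random oversampling, one of the simplest resampling schemes. The function name `random_oversample`, the toy features, and the class names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(features, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Start from the original data so no real sample is dropped.
    out_x, out_y = list(features), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(features, labels) if y == cls]
        # Draw with replacement to close the gap to the majority size.
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Toy flow records (hypothetical features): 6 benign, 2 DoS, 1 probe.
X = [[i, i % 3] for i in range(9)]
y = ["benign"] * 6 + ["dos"] * 2 + ["probe"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After resampling, each class has the same number of rows, so a classifier trained on `X_bal` and `y_bal` no longer sees the minority attacks as rare. Oversampling should be applied to the training split only, since duplicated rows in a test split would distort the evaluation.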
author2 Lam Kwok Yan
author_facet Lam Kwok Yan
Yang, Wenzhuo
format Thesis-Doctor of Philosophy
author Yang, Wenzhuo
author_sort Yang, Wenzhuo
title Machine learning techniques for advanced cyber attack detection
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/161429
_version_ 1746219671891214336
spelling sg-ntu-dr.10356-161429 2022-10-04T01:04:34Z
School of Computer Science and Engineering
kwokyan.lam@ntu.edu.sg
Doctor of Philosophy
2022-09-01T01:36:29Z
2022
Yang, W. (2022). Machine learning techniques for advanced cyber attack detection. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/161429
10.32657/10356/161429
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf