Real-time anomaly detection using clustering in big data technologies / Riyaz Ahamed Ariyaluran Habeeb

Bibliographic Details
Main Author: Riyaz Ahamed, Ariyaluran Habeeb
Format: Thesis
Published: 2019
Online Access: http://studentsrepo.um.edu.my/13130/2/Riyaz_Ahmed.pdf
http://studentsrepo.um.edu.my/13130/1/Riyaz_Ahamed.pdf
http://studentsrepo.um.edu.my/13130/
Institution: Universiti Malaya
Description
Summary: The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft and cyber war. Hence, network security analytics has become an important area of concern and has lately gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, critical reviews have identified that existing approaches are inefficient at processing data to detect anomalies because of the massive volumes of data amassed through connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this research attempted to address the issue of accuracy in real-time anomaly detection. To begin with, the existing state-of-the-art techniques related to anomaly detection, real-time big data technologies and machine learning algorithms were critically reviewed to identify the problems. Subsequently, a comparative analysis was carried out to further establish the problems, using various existing algorithms that were then validated on three openly available datasets. Based on the outcome of the analysis, this research proposed a novel framework, namely real-time anomaly detection based on big data technologies (RTADBDT), along with supporting implementation algorithms. The framework comprises BroIDS, Flume, Kafka, Spark Streaming, Spark MLlib, Matplot and HBase. BroIDS processes the existing datasets and generates various log files, such as the HTTP log used in this research, while the Flume component reads and tracks the incoming packet data blocks. Kafka holds a repository of messages categorized into topics, with each topic further divided into partitions, each of which is an ordered, immutable sequence of messages.
Meanwhile, Spark Streaming provides a high-level abstraction known as DStream, which represents a continuous stream of data, whereas Spark MLlib supplies algorithmic optimizations that are applied in the proposed algorithms. Finally, the processed data was visualised using Matplot and stored in HBase. The proposed framework was validated to substantiate its efficacy, particularly in terms of accuracy, memory consumption and execution time, through critical comparative analysis using internal, external and statistical techniques. Its performance was assessed using mathematical expressions derived in this research and also through comparative analysis. The analyses indicated that the proposed framework outperformed other existing techniques in terms of accuracy, memory consumption and execution time. The significance of this research lies in its contribution to the body of knowledge, with the proposed framework serving as a backbone for real-time anomaly detection with increased accuracy, minimised memory consumption and shortened execution time. Furthermore, when implemented, this framework should enable an organization to detect anomalies in real time while offering potential for more effective fault tolerance and scalability.
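
The summary does not state which clustering algorithm RTADBDT uses, so the following is only a minimal sketch of the general idea of clustering-based anomaly detection over micro-batches, not the thesis's method. It uses plain Python instead of Spark so that it runs without a cluster. The two-dimensional features, the initial centers, the distance threshold and every function name are hypothetical and chosen for illustration.

```python
import math


def nearest(point, centers):
    """Return (index, distance) of the center closest to point."""
    index = min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))
    return index, math.dist(point, centers[index])


def micro_batches(records, size):
    """Split a record list into consecutive small batches, loosely
    mimicking how a stream is processed as a series of micro-batches."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def update_centers(batch, centers, counts):
    """Online k-means step: move each center toward the points assigned
    to it, using a running mean (step size 1 / points seen so far)."""
    for point in batch:
        i, _ = nearest(point, centers)
        counts[i] += 1
        step = 1.0 / counts[i]
        centers[i] = tuple(c + step * (x - c) for c, x in zip(centers[i], point))


def detect(records, initial_centers, threshold, batch_size=4):
    """Flag records farther than threshold from every cluster center.

    Only records judged normal are used to update the centers, so that
    outliers do not pull the clusters toward themselves.
    """
    centers = list(initial_centers)
    counts = [1] * len(centers)  # each initial center counts as one point
    anomalies = []
    for batch in micro_batches(records, batch_size):
        normal = []
        for point in batch:
            _, distance = nearest(point, centers)
            if distance > threshold:
                anomalies.append(point)
            else:
                normal.append(point)
        update_centers(normal, centers, counts)
    return anomalies


# Hypothetical feature vectors, e.g. two scaled fields from an HTTP log.
RECORDS = [
    (1.0, 1.2), (0.9, 0.8), (5.1, 4.9), (4.8, 5.2),
    (20.0, 20.0), (1.1, 1.0), (5.0, 5.1), (0.5, 9.5),
]
INITIAL_CENTERS = [(1.0, 1.0), (5.0, 5.0)]

if __name__ == "__main__":
    print(detect(RECORDS, INITIAL_CENTERS, threshold=2.0))
```

In a Spark-based deployment, the micro-batch loop would be handled by the streaming engine and the center update by a library clustering routine. The fixed threshold here is a simplification; a practical system would need to derive it from the data.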