Towards Large Scale Packet Capture and Network Flow Analysis on Hadoop

Network traffic continues to grow yearly at a compounded rate. However, network traffic is still being analyzed on vertically scaled machines that do not scale as well as distributed computing platforms. Hadoop's horizontally scalable ecosystem provides a better environment for processing these...

Full description

Saved in:
Bibliographic Details
Main Authors: Saavedra, Miguel Zenon Nicanor L, Yu, William Emmanuel S
Format: text
Published: Archīum Ateneo 2018
Subjects:
Online Access:https://archium.ateneo.edu/discs-faculty-pubs/291
https://ieeexplore.ieee.org/document/8590897
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Ateneo De Manila University
Description
Summary:Network traffic continues to grow yearly at a compounded rate. However, network traffic is still being analyzed on vertically scaled machines that do not scale as well as distributed computing platforms. Hadoop's horizontally scalable ecosystem provides a better environment for processing these network captures stored in packet capture (PCAP) files. This paper proposes a framework called hcap for analyzing PCAPs on Hadoop inspired by the Rseaux IP Europens' (RIPE's) existing hadoop-pcap library but built completely from the ground up. The hcap framework improves several aspects of the hadoop-pcap library, namely protocol, error, and log handling. Results show that, while other methods still outperform hcap, it not only performs better than hadoop-pcap by 15% in scan queries and 18% in join queries, but it's more tolerant to broken PCAP entries which reduces preprocessing time and data loss, while also speeding up the conversion process used in other methods by 85%.