Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosytem

This study proposes an improved way of analyzing raw network data on Hadoop called hcap. This new framework is evaluated against three common methods currently used for this type of analytics; conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library...

Full description

Saved in:
Bibliographic Details
Main Author: SAAVEDRA, MIGUEL ZENON NICANOR
Format: text
Published: Archīum Ateneo 2018
Subjects:
Online Access:https://archium.ateneo.edu/theses-dissertations/220
http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Ateneo De Manila University
Description
Summary:This study proposes an improved way of analyzing raw network data on Hadoop called hcap. This new framework is evaluated against three common methods currently used for this type of analytics; conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library both withand without logs. The comparison was conducted with four key performance indicators: preprocessing, storage efficiency, data retention, and query response time. Because the original hadoop-pcap framework failed to process larger datasets, its version with logs suppressed was instead used for the evaluation. Results show that Parquet outperforms hcap by 90% and hadoop-pcap with its logs suppressed by 96% in terms of query response time while text also runs 80% faster than hcap and 92% faster than hadoop-pcap with its logs suppressed, however, it also runs 30% slower in scan and aggregate queries and 70% and 40% slower in joins and aggregate-joins respectively when compared to Parquet. The framework created in this study not only provided an improved method for parsing PCAP binaries on Hadoop, outperforming hadoop-pcap by at least 20%, it also provided analternative technique for conversion to Parquet, reducing preprocessing time by a factor of 5.