Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem
Main Author: | |
---|---|
Format: | text |
Published: | Archīum Ateneo, 2018 |
Subjects: | |
Online Access: | https://archium.ateneo.edu/theses-dissertations/220 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab |
Institution: | Ateneo De Manila University |
Summary: | This study proposes hcap, an improved framework for analyzing raw network data on Hadoop. The framework is evaluated against three methods commonly used for this type of analytics: conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library, both with and without logs. The comparison used four key performance indicators: preprocessing time, storage efficiency, data retention, and query response time. Because the original hadoop-pcap framework failed to process larger datasets, its version with logs suppressed was used for the evaluation instead. In query response time, Parquet outperforms hcap by 90% and hadoop-pcap with logs suppressed by 96%. Text also runs 80% faster than hcap and 92% faster than hadoop-pcap with logs suppressed; however, compared with Parquet, text is 30% slower in scan and aggregate queries, and 70% and 40% slower in joins and aggregate-joins, respectively. The framework created in this study not only provided an improved method for parsing PCAP binaries on Hadoop, outperforming hadoop-pcap by at least 20%, but also provided an alternative technique for conversion to Parquet, reducing preprocessing time by a factor of 5. |
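The summary contrasts parsing PCAP binaries directly with first converting them to text or Parquet. As a rough illustration of what direct parsing involves, the sketch below walks the classic libpcap file layout (a 24-byte global header followed by a 16-byte header before each packet) and flattens each record into one tab-separated line, loosely mirroring the "conversion to text" approach. It is not the hcap or hadoop-pcap implementation: the names `iter_pcap_records`, `to_text_line`, and `build_sample` are hypothetical, only the microsecond-resolution classic format is handled (not pcapng or the nanosecond variant), and it reads an in-memory buffer rather than a Hadoop input split.

```python
import struct
from typing import Iterator, NamedTuple


class PcapRecord(NamedTuple):
    """One captured packet: timestamp, lengths, and raw bytes."""
    ts_sec: int
    ts_usec: int
    incl_len: int  # bytes actually stored in the file
    orig_len: int  # length of the packet on the wire
    data: bytes


GLOBAL_HEADER_LEN = 24


def iter_pcap_records(buf: bytes) -> Iterator[PcapRecord]:
    """Walk a classic (microsecond) libpcap capture held in memory."""
    if len(buf) < GLOBAL_HEADER_LEN:
        raise ValueError("truncated global header")
    # The magic number is written in the capturing host's byte order,
    # so it also tells us how to decode every later header field.
    magic = buf[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic microsecond pcap file")
    rec_hdr = struct.Struct(endian + "IIII")  # ts_sec, ts_usec, incl_len, orig_len
    offset = GLOBAL_HEADER_LEN
    # There are no sync markers between records: each header gives the length
    # needed to locate the next one, so the walk is strictly sequential.
    # A trailing fragment shorter than a record header is ignored.
    while offset + rec_hdr.size <= len(buf):
        ts_sec, ts_usec, incl_len, orig_len = rec_hdr.unpack_from(buf, offset)
        offset += rec_hdr.size
        data = buf[offset:offset + incl_len]
        if len(data) < incl_len:
            raise ValueError("truncated packet record")
        offset += incl_len
        yield PcapRecord(ts_sec, ts_usec, incl_len, orig_len, data)


def to_text_line(rec: PcapRecord) -> str:
    """Flatten a record into one tab-separated line (hypothetical text schema)."""
    return f"{rec.ts_sec}.{rec.ts_usec:06d}\t{rec.incl_len}\t{rec.orig_len}\t{rec.data.hex()}"


def build_sample() -> bytes:
    """Build a tiny little-endian capture holding one 4-byte packet."""
    # Global header: magic, version 2.4, thiszone, sigfigs, snaplen, linktype 1 (Ethernet)
    header = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    payload = b"\x00\x01\x02\x03"
    record = struct.pack("<IIII", 1500000000, 42, len(payload), len(payload)) + payload
    return header + record


if __name__ == "__main__":
    for rec in iter_pcap_records(build_sample()):
        # prints 1500000000.000042, 4, 4 and 00010203 separated by tabs
        print(to_text_line(rec))
```

Because a PCAP file carries no record-boundary markers, a reader that starts at an arbitrary byte offset cannot reliably find the next packet without scanning from the beginning or applying heuristics. This is one reason raw captures can be harder to divide across parallel workers than line-oriented text or Parquet files, whose layouts make such boundaries explicit.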