Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem

This study proposes an improved way of analyzing raw network data on Hadoop called hcap. This new framework is evaluated against three common methods currently used for this type of analytics: conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library...

Main Author: SAAVEDRA, MIGUEL ZENON NICANOR
Format: text
Published: Archīum Ateneo 2018
Subjects:
Online Access: https://archium.ateneo.edu/theses-dissertations/220
http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab
Institution: Ateneo De Manila University
id ph-ateneo-arc.theses-dissertations-1219
record_format eprints
institution Ateneo De Manila University
building Ateneo De Manila University Library
continent Asia
country Philippines
content_provider Ateneo De Manila University Library
collection archium.Ateneo Institutional Repository
topic Apache Hadoop
Computer networks
Information networks
Electronic data processing
Packet transport networks.
description This study proposes hcap, a framework that improves the analysis of raw network data on Hadoop. The framework is evaluated against three common methods currently used for this type of analytics: conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library, both with and without logs. The comparison used four key performance indicators: preprocessing, storage efficiency, data retention, and query response time. Because the original hadoop-pcap framework failed to process larger datasets, its version with logs suppressed was used for the evaluation instead. Results show that, in terms of query response time, Parquet outperforms hcap by 90% and hadoop-pcap with logs suppressed by 96%. Text also runs 80% faster than hcap and 92% faster than hadoop-pcap with logs suppressed. Compared to Parquet, however, text runs 30% slower in scan and aggregate queries, and 70% and 40% slower in joins and aggregate-joins, respectively. The framework created in this study not only provided an improved method for parsing PCAP binaries on Hadoop, outperforming hadoop-pcap by at least 20%, but also provided an alternative technique for conversion to Parquet, reducing preprocessing time by a factor of 5.
format text
author SAAVEDRA, MIGUEL ZENON NICANOR
title Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem
publisher Archīum Ateneo
publishDate 2018
url https://archium.ateneo.edu/theses-dissertations/220
http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab
_version_ 1695734697991077888