Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem
This study proposes hcap, an improved way of analyzing raw network data on Hadoop. This new framework is evaluated against three common methods currently used for this type of analytics: conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library...
Main Author: | SAAVEDRA, MIGUEL ZENON NICANOR |
---|---|
Format: | text |
Published: | Archīum Ateneo, 2018 |
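Subjects: | Apache Hadoop; Computer networks; Information networks; Electronic data processing; Packet transport networks |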
Online Access: | https://archium.ateneo.edu/theses-dissertations/220 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab |
Institution: | Ateneo De Manila University |
id |
ph-ateneo-arc.theses-dissertations-1219 |
---|---|
record_format |
eprints |
spelling |
ph-ateneo-arc.theses-dissertations-1219 2021-03-21T13:36:02Z Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem SAAVEDRA, MIGUEL ZENON NICANOR This study proposes hcap, an improved way of analyzing raw network data on Hadoop. This new framework is evaluated against three common methods currently used for this type of analytics: conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library, both with and without logs. The comparison was conducted with four key performance indicators: preprocessing, storage efficiency, data retention, and query response time. Because the original hadoop-pcap framework failed to process larger datasets, its version with logs suppressed was used for the evaluation instead. Results show that, in terms of query response time, Parquet outperforms hcap by 90% and hadoop-pcap with its logs suppressed by 96%. Text also runs 80% faster than hcap and 92% faster than hadoop-pcap with its logs suppressed; however, compared to Parquet, it runs 30% slower in scan and aggregate queries, and 70% and 40% slower in joins and aggregate-joins, respectively. The framework created in this study not only provided an improved method for parsing PCAP binaries on Hadoop, outperforming hadoop-pcap by at least 20%, but also provided an alternative technique for conversion to Parquet, reducing preprocessing time by a factor of 5. 2018-01-01T08:00:00Z text https://archium.ateneo.edu/theses-dissertations/220 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab Theses and Dissertations (All) Archīum Ateneo Apache Hadoop Computer networks Information networks Electronic data processing Packet transport networks. |
institution |
Ateneo De Manila University |
building |
Ateneo De Manila University Library |
continent |
Asia |
country |
Philippines |
content_provider |
Ateneo De Manila University Library |
collection |
archium.Ateneo Institutional Repository |
topic |
Apache Hadoop Computer networks Information networks Electronic data processing Packet transport networks. |
spellingShingle |
Apache Hadoop Computer networks Information networks Electronic data processing Packet transport networks. SAAVEDRA, MIGUEL ZENON NICANOR Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem |
description |
This study proposes hcap, an improved way of analyzing raw network data on Hadoop. This new framework is evaluated against three common methods currently used for this type of analytics: conversion to text, conversion to Parquet, and direct parsing of PCAP binaries with the hadoop-pcap library, both with and without logs. The comparison was conducted with four key performance indicators: preprocessing, storage efficiency, data retention, and query response time. Because the original hadoop-pcap framework failed to process larger datasets, its version with logs suppressed was used for the evaluation instead. Results show that, in terms of query response time, Parquet outperforms hcap by 90% and hadoop-pcap with its logs suppressed by 96%. Text also runs 80% faster than hcap and 92% faster than hadoop-pcap with its logs suppressed; however, compared to Parquet, it runs 30% slower in scan and aggregate queries, and 70% and 40% slower in joins and aggregate-joins, respectively. The framework created in this study not only provided an improved method for parsing PCAP binaries on Hadoop, outperforming hadoop-pcap by at least 20%, but also provided an alternative technique for conversion to Parquet, reducing preprocessing time by a factor of 5. |
format |
text |
author |
SAAVEDRA, MIGUEL ZENON NICANOR |
author_facet |
SAAVEDRA, MIGUEL ZENON NICANOR |
author_sort |
SAAVEDRA, MIGUEL ZENON NICANOR |
title |
Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem |
title_short |
Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem |
title_full |
Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem |
title_fullStr |
Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem |
title_full_unstemmed |
Developing a framework for horizontally scalable network flow analytics on the Hadoop ecosystem |
title_sort |
developing a framework for horizontally scalable network flow analytics on the hadoop ecosystem |
publisher |
Archīum Ateneo |
publishDate |
2018 |
url |
https://archium.ateneo.edu/theses-dissertations/220 http://rizalls.lib.admu.edu.ph/#section=resource&resourceid=1624771969&currentIndex=0&view=fullDetailsDetailsTab |
_version_ |
1695734697991077888 |