Adapting Block-Sized Captures for Faster Network Flow Analysis on the Hadoop Ecosystem

With the rapid and continuous growth of annual network traffic comes the need to develop systems that can efficiently scale to meet the demands of analyzing all this traffic data. The Hadoop ecosystem provides an environment that is capable of addressing this need, because of its horizontal scalabil...

Bibliographic Details
Main Authors: Medalla, Alberto H, Saavedra, Miguel Zenon Nicanor L, Abu, Patricia Angela R, Yu, William Emmanuel S
Format: text
Published: Archīum Ateneo 2018
Subjects:
Online Access: https://archium.ateneo.edu/discs-faculty-pubs/188
https://ieeexplore.ieee.org/document/8780880
Institution: Ateneo De Manila University
id ph-ateneo-arc.discs-faculty-pubs-1187
record_format eprints
institution Ateneo De Manila University
building Ateneo De Manila University Library
continent Asia
country Philippines
content_provider Ateneo De Manila University Library
collection archium.Ateneo Institutional Repository
topic Hadoop
network analytics
big data
PCAP
low analytics
Computer Sciences
description With the rapid and continuous growth of annual network traffic comes the need to develop systems that can efficiently scale to meet the demands of analyzing all this traffic data. The Hadoop ecosystem provides an environment that is capable of addressing this need, because of its horizontal scalability and its data locality optimization feature. The latter feature improves parallel analysis of data by placing computing tasks within the same node that contains the block of data to be analyzed. However, this feature cannot be taken advantage of by those input formats that are not splittable within the Hadoop Distributed File System. The PCAP format used for capturing network data is one such file format. To address this issue, this paper proposes the inclusion of a minimal preprocessing step before PCAP files are fed into Hadoop and analyzed using the hcap framework, which is currently the fastest framework for analyzing PCAP data in Hadoop. This preprocessing step is designed to adapt the PCAP files into properly split blocks in order to take advantage of Hadoop's data locality optimization feature. Results show a significant improvement in query response time, with performance gains of 92%, 89%, 91%, and 87% for scan, aggregate, join, and aggregate-join queries, respectively, when compared to the original hcap framework.
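The description above does not spell out how the preprocessing step works, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes the classic libpcap file layout (a 24-byte global header followed by packet records, each with a 16-byte record header), and the function name split_pcap and its block_size parameter are hypothetical. The sketch cuts a capture only on packet boundaries and repeats the global header in every piece, so that each piece is a valid PCAP file no larger than the chosen block size.

```python
import struct
from typing import List

GLOBAL_HEADER_LEN = 24   # classic pcap global header size in bytes
RECORD_HEADER_LEN = 16   # ts_sec, ts_usec/nsec, incl_len, orig_len
PCAP_MAGICS = {0xA1B2C3D4, 0xA1B23C4D}  # microsecond / nanosecond variants


def split_pcap(data: bytes, block_size: int) -> List[bytes]:
    """Split a classic pcap capture into standalone pieces of at most
    block_size bytes, cutting only between packet records.

    Hypothetical helper for illustration; a single packet larger than
    block_size is still emitted, alone, in an oversized piece.
    """
    if len(data) < GLOBAL_HEADER_LEN:
        raise ValueError("input is shorter than a pcap global header")
    global_header = data[:GLOBAL_HEADER_LEN]

    # The magic number reveals the byte order used by the writer.
    if struct.unpack("<I", global_header[:4])[0] in PCAP_MAGICS:
        endian = "<"
    elif struct.unpack(">I", global_header[:4])[0] in PCAP_MAGICS:
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    pieces: List[bytes] = []
    current = bytearray(global_header)
    offset = GLOBAL_HEADER_LEN
    while offset + RECORD_HEADER_LEN <= len(data):
        # incl_len (captured length) is the third 32-bit field of the record header.
        incl_len = struct.unpack_from(endian + "I", data, offset + 8)[0]
        end = offset + RECORD_HEADER_LEN + incl_len
        if end > len(data):
            break  # ignore a truncated trailing record
        record = data[offset:end]
        # Start a new piece if this record would overflow the current one,
        # unless the current piece holds no packets yet.
        if len(current) > GLOBAL_HEADER_LEN and len(current) + len(record) > block_size:
            pieces.append(bytes(current))
            current = bytearray(global_header)
        current += record
        offset = end
    if len(current) > GLOBAL_HEADER_LEN:
        pieces.append(bytes(current))
    return pieces
```

In a setup like the one described, block_size would presumably be set to the HDFS block size so that each piece occupies a single block and can be processed on the node that stores it; how the pieces are then loaded into Hadoop and hcap is outside this sketch.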
format text
author Medalla, Alberto H
Saavedra, Miguel Zenon Nicanor L
Abu, Patricia Angela R
Yu, William Emmanuel S
title Adapting Block-Sized Captures for Faster Network Flow Analysis on the Hadoop Ecosystem
publisher Archīum Ateneo
publishDate 2018
url https://archium.ateneo.edu/discs-faculty-pubs/188
https://ieeexplore.ieee.org/document/8780880