STUDY OF FRAMEWORK FOR DISTRIBUTED STREAMING GRAPH DATA PROCESSING IN SPARK

Bibliographic Details
Main Author: ALBERT TRI ADRIAN (NIM : 13513076), LIE
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/22899
Institution: Institut Teknologi Bandung
Description
Summary: Distributed systems make it possible to process large volumes of data quickly, at a scale that a single machine cannot handle. Distributed data processing frameworks have begun to emerge, but they still cannot cover every case users face, especially datasets with a complex structure such as graph data, in which the nodes of a dataset depend on one another. Processing data optimally for a given case on a distributed system therefore requires knowing which aspects play a role. The case taken here is the processing of streaming graph data using an open-source framework, Apache Spark.

Based on this problem, a study was conducted to identify the aspects that contribute to the performance of streaming graph data processing on Apache Spark. The aspects found are the data partitioning technique, memory usage, the streaming medium used, and the number of nodes used. Each aspect was tested by building and running an application on top of Apache Spark that accepts multiple configurations without redeployment or recompilation.

Users can apply the aspects found in this study, together with the example application used for testing, to other cases, in terms of data and data processing, or when using a distributed data processing framework similar to Apache Spark.