DEVELOPMENT OF DYNAMIC RESOURCE SCHEDULING ON APACHE SPARK ON TOP OF KUBERNETES


Bibliographic Details
Main Author: Nugroho, Fajar
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/43641
Institution: Institut Teknologi Bandung
Description
Summary: Apache Spark is a framework for large-scale distributed data processing. Spark distributes work across several computers that form a cluster, and this distribution makes processing scalable. Kubernetes makes operating such distributed processing easier, and Apache Spark supports running on Kubernetes. However, the allocation of computing resources is still static. Static resource allocation has several limitations, including suboptimal system utilization and a number of executors that cannot adapt to the available resources. This work therefore develops dynamic allocation of Apache Spark's computational resources on top of Kubernetes, with the goal of increasing system utilization and improving Apache Spark's performance. Dynamic resource allocation could be implemented and produced quite varied performance. The experimental results show that running Apache Spark on top of Kubernetes with dynamic resource allocation performs worse than with static resource allocation in general. However, in cases where executor pods frequently fail, Apache Spark on top of Kubernetes with dynamic resource allocation performs better.
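As a point of reference for the setup described in the summary, the sketch below shows how dynamic resource allocation is enabled when submitting a Spark application to Kubernetes using the standard Spark 3.x configuration keys. This is an illustrative example, not the thesis's exact configuration; the API server address, container image, jar path, and executor bounds are placeholders or assumptions.

```shell
# Sketch: enabling dynamic allocation for Spark on Kubernetes (Spark 3.x).
# Kubernetes has no external shuffle service, so shuffle tracking is enabled
# to let Spark decide which idle executors are safe to remove.
# <kubernetes-api-server> and <your-spark-image> are placeholders.
spark-submit \
  --master k8s://https://<kubernetes-api-server>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  --conf spark.kubernetes.namespace=default \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar 1000
```

With these settings, Spark scales the number of executor pods up when tasks queue and removes idle executors after a timeout, instead of holding a fixed executor count for the whole job.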