Epsilon, a cluster scheduler for Kubernetes clusters

The adoption of shared computer clusters for executing high-performance computing workloads has allowed many organizations with limited financial capabilities to access computing power that otherwise might be too costly for them to build. However, to support these heterogeneous workloads, cluster sc...

Bibliographic Details
Main Author: Neo, Alex Jing Hui
Other Authors: Lee Bu Sung, Francis
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access: https://hdl.handle.net/10356/147629
Institution: Nanyang Technological University
id sg-ntu-dr.10356-147629
record_format dspace
spelling sg-ntu-dr.10356-147629 2021-04-08T13:08:13Z Epsilon, a cluster scheduler for Kubernetes clusters Neo, Alex Jing Hui Lee Bu Sung, Francis School of Computer Science and Engineering Singapore Advanced Research and Education Network Asi@Connect Project, TEIN EBSLEE@ntu.edu.sg Engineering::Computer science and engineering::Computer applications Engineering::Computer science and engineering::Information systems The adoption of shared computer clusters for executing high-performance computing workloads has allowed many organizations with limited financial capabilities to access computing power that otherwise might be too costly for them to build. However, to support these heterogeneous workloads, cluster schedulers are becoming increasingly complex to develop and maintain as features are added for different workload requirements. Kubernetes is a container orchestration platform designed to simplify the deployment of containers in a computer cluster. Kubernetes provides a monolithic cluster scheduler, which is responsible for allocating resources to containers. The author developed Epsilon as a cluster scheduler for Kubernetes using the microservices model as a foundation. Developed in partnership with AsiaConnect, Epsilon is intended to act as a scheduler and as a starting point for examining the viability of using microservices to create a scheduler that helps developers implement updates more quickly and supports heterogeneous workloads. Being a microservices-based scheduler, Epsilon is built from multiple microservices that have strict service boundaries and are kept simple in design, to prevent code complexity from growing with modifications or feature updates. Epsilon contains multiple microservices for different functionalities, with some microservices making up the core system and the remainder serving as supporting features.
The core microservices consist of the Coordinator, Scheduler, and Queue microservices, which are responsible for the monitoring of new pods, the scheduling of new pods, and communication between microservices, respectively. Support microservices include the Autoscaler and Retry microservices, which are responsible for automated scaling of scheduler services and rescheduling of failed pods, respectively. One of Epsilon's goals is to allow developers to commit changes quickly. Epsilon achieves this by splitting up the scheduler code into multiple smaller microservices. By spreading out the scheduler code in this way, developers can develop or update different components of the scheduler concurrently, reducing the time taken to commit the changes. Epsilon’s distributed nature also provides an opportunity to scale the scheduler microservices in or out, improving performance and resiliency because multiple identical copies of the scheduler microservice operate concurrently. Epsilon was deployed and tested on a 54-node Kubernetes cluster using Amazon Web Services EC2 instances. To improve the scheduler’s resiliency, Epsilon was deployed with 3 scheduler microservices. As multiple schedulers make scheduling decisions concurrently, randomization is used as a mitigation technique to reduce the occurrence of scheduler conflicts among the 3 scheduler microservices. The experiments conducted included various tests related to the scheduler’s performance, load balancing, and support for heterogeneous workloads. In one of the experiments, the scheduling performance of Epsilon was compared to that of the default Kubernetes scheduler. The experiment involved recording the time each scheduler took to schedule different numbers of pods. Bachelor of Engineering (Computer Engineering) 2021-04-08T13:06:49Z 2021-04-08T13:06:49Z 2021 Final Year Project (FYP) Neo, A. J. H. (2021). Epsilon, a cluster scheduler for Kubernetes clusters.
Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/147629 https://hdl.handle.net/10356/147629 en application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computer applications
Engineering::Computer science and engineering::Information systems
spellingShingle Engineering::Computer science and engineering::Computer applications
Engineering::Computer science and engineering::Information systems
Neo, Alex Jing Hui
Epsilon, a cluster scheduler for Kubernetes clusters
description The adoption of shared computer clusters for executing high-performance computing workloads has allowed many organizations with limited financial capabilities to access computing power that otherwise might be too costly for them to build. However, to support these heterogeneous workloads, cluster schedulers are becoming increasingly complex to develop and maintain as features are added for different workload requirements. Kubernetes is a container orchestration platform designed to simplify the deployment of containers in a computer cluster. Kubernetes provides a monolithic cluster scheduler, which is responsible for allocating resources to containers. The author developed Epsilon as a cluster scheduler for Kubernetes using the microservices model as a foundation. Developed in partnership with AsiaConnect, Epsilon is intended to act as a scheduler and as a starting point for examining the viability of using microservices to create a scheduler that helps developers implement updates more quickly and supports heterogeneous workloads. Being a microservices-based scheduler, Epsilon is built from multiple microservices that have strict service boundaries and are kept simple in design, to prevent code complexity from growing with modifications or feature updates. Epsilon contains multiple microservices for different functionalities, with some microservices making up the core system and the remainder serving as supporting features. The core microservices consist of the Coordinator, Scheduler, and Queue microservices, which are responsible for the monitoring of new pods, the scheduling of new pods, and communication between microservices, respectively. Support microservices include the Autoscaler and Retry microservices, which are responsible for automated scaling of scheduler services and rescheduling of failed pods, respectively. One of Epsilon's goals is to allow developers to commit changes quickly. Epsilon achieves this by splitting up the scheduler code into multiple smaller microservices.
By spreading out the scheduler code in this way, developers can develop or update different components of the scheduler concurrently, reducing the time taken to commit the changes. Epsilon’s distributed nature also provides an opportunity to scale the scheduler microservices in or out, improving performance and resiliency because multiple identical copies of the scheduler microservice operate concurrently. Epsilon was deployed and tested on a 54-node Kubernetes cluster using Amazon Web Services EC2 instances. To improve the scheduler’s resiliency, Epsilon was deployed with 3 scheduler microservices. As multiple schedulers make scheduling decisions concurrently, randomization is used as a mitigation technique to reduce the occurrence of scheduler conflicts among the 3 scheduler microservices. The experiments conducted included various tests related to the scheduler’s performance, load balancing, and support for heterogeneous workloads. In one of the experiments, the scheduling performance of Epsilon was compared to that of the default Kubernetes scheduler. The experiment involved recording the time each scheduler took to schedule different numbers of pods.
author2 Lee Bu Sung, Francis
author_facet Lee Bu Sung, Francis
Neo, Alex Jing Hui
format Final Year Project
author Neo, Alex Jing Hui
author_sort Neo, Alex Jing Hui
title Epsilon, a cluster scheduler for Kubernetes clusters
title_short Epsilon, a cluster scheduler for Kubernetes clusters
title_full Epsilon, a cluster scheduler for Kubernetes clusters
title_fullStr Epsilon, a cluster scheduler for Kubernetes clusters
title_full_unstemmed Epsilon, a cluster scheduler for Kubernetes clusters
title_sort epsilon, a cluster scheduler for kubernetes clusters
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/147629
_version_ 1696984372782039040
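
Illustrative note: the description states that randomization is used to reduce conflicts when 3 scheduler microservices make decisions concurrently, but this record does not say how the randomization is applied. The Python sketch below is one possible reading and is not taken from the project: it assumes each scheduler picks uniformly at random among the nodes that can fit a pod, and it treats any round in which two schedulers choose the same node as a conflict (a simplification). The names Node, feasible_nodes, pick_node, and count_conflicts, and the CPU figures, are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int  # free CPU in millicores (hypothetical unit for this sketch)


def feasible_nodes(nodes, cpu_request):
    """Return the nodes with enough free CPU to host the pod."""
    return [n for n in nodes if n.free_cpu >= cpu_request]


def pick_node(nodes, cpu_request, rng):
    """Pick one feasible node uniformly at random, or None if nothing fits."""
    candidates = feasible_nodes(nodes, cpu_request)
    if not candidates:
        return None
    return rng.choice(candidates)


def count_conflicts(nodes, cpu_request, schedulers, rounds, randomized, seed=0):
    """Count rounds in which at least two schedulers chose the same node.

    Assumes at least one node is feasible. Any same-node choice counts as a
    conflict, which is stricter than a real resource-overcommit check.
    """
    rng = random.Random(seed)
    conflicts = 0
    for _ in range(rounds):
        if randomized:
            # Each scheduler draws independently from the feasible set.
            choices = [pick_node(nodes, cpu_request, rng).name
                       for _ in range(schedulers)]
        else:
            # Baseline: every scheduler deterministically picks the node
            # with the most free CPU, so all of them collide.
            best = max(feasible_nodes(nodes, cpu_request),
                       key=lambda n: n.free_cpu)
            choices = [best.name] * schedulers
        if len(set(choices)) < schedulers:
            conflicts += 1
    return conflicts


if __name__ == "__main__":
    cluster = [Node(f"node-{i}", 4000) for i in range(54)]
    print("deterministic:", count_conflicts(cluster, 500, 3, 1000, False))
    print("randomized:   ", count_conflicts(cluster, 500, 3, 1000, True))
```

Under these assumptions, identical deterministic schedulers collide in every round, while independent random picks over 54 equally suitable nodes collide far less often. The sketch only illustrates why randomization can lower the conflict rate; it says nothing about the results reported in the project.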