AVAILABILITY OF STATELESS WORKLOAD ON KUBERNETES ON GOOGLE CLOUD PLATFORM WITH PREEMPTIBLE NODES

Bibliographic Details
Main Author: Christo Randiny, Joshua
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/55857
Institution: Institut Teknologi Bandung
Description
Summary: Containers are now one of the most common ways to deploy applications, especially in a microservice architecture. However, managing a large number of containers is not easy. Container orchestrators such as Kubernetes are designed to help manage them. To simplify Kubernetes setup and maintenance, many cloud providers offer managed Kubernetes services, such as Google Kubernetes Engine by Google. One way Google helps reduce cost is by offering preemptible virtual machines, which can be used as Kubernetes nodes (preemptible nodes). Preemptible nodes are much cheaper than regular nodes, but they come with drawbacks that can affect the availability of workloads running in the cluster if not handled properly. This paper implements two methods to improve the availability of the cluster: graceful shutdown and node scheduling. Graceful shutdown increases availability by using centrally controlled readiness probes to ensure that traffic is not directed to pods that are about to be shut down. Node scheduling increases availability by reducing the chance that multiple nodes shut down at the same time; it works by proactively shutting down nodes at random times before 24 hours have passed since node creation. In testing, the resulting system reduced failed requests by up to 65%. Testing also shows that node scheduling reduces availability by a small amount under normal conditions. However, this tradeoff is needed to reduce the chance of catastrophic failure when many nodes are shut down at the same time.
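
The node scheduling idea in the summary can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the 12-hour minimum lifetime, and the node names are hypothetical and chosen only for illustration. The sketch assigns each node a random shutdown time that falls before the 24-hour mark mentioned in the summary, so that nodes created together are unlikely to shut down together.

```python
import random
from datetime import datetime, timedelta, timezone

# 24-hour limit taken from the summary.
MAX_NODE_LIFETIME = timedelta(hours=24)
# Assumption for illustration only: never shut a node down in its first 12 hours.
MIN_NODE_LIFETIME = timedelta(hours=12)


def pick_shutdown_time(created_at, rng,
                       min_lifetime=MIN_NODE_LIFETIME,
                       max_lifetime=MAX_NODE_LIFETIME):
    """Return a random shutdown time in [created_at + min, created_at + max)."""
    span_seconds = int((max_lifetime - min_lifetime).total_seconds())
    # randrange excludes the upper bound, so the result is strictly before max_lifetime.
    offset = timedelta(seconds=rng.randrange(span_seconds))
    return created_at + min_lifetime + offset


def nodes_due_for_shutdown(shutdown_times, now):
    """Return the names of nodes whose planned shutdown time has been reached."""
    return sorted(name for name, t in shutdown_times.items() if t <= now)


if __name__ == "__main__":
    rng = random.Random()
    created = datetime.now(timezone.utc)
    # Hypothetical node names; all nodes are created at the same moment.
    plan = {f"node-{i}": pick_shutdown_time(created, rng) for i in range(5)}
    for name, when in sorted(plan.items(), key=lambda item: item[1]):
        print(name, when.isoformat())
```

Because each node draws its own offset, the planned shutdowns are spread across the window instead of clustering at the 24-hour limit. A real controller would also need to drain each node before deleting it, which this sketch leaves out.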