IMPLEMENTATION OF CLOUD NATIVE INFRASTRUCTURE DESIGN FOR AI/ML EXPERIMENTS USING KUBERNETES
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/82270
Institution: Institut Teknologi Bandung
Summary: This research focuses on the design and implementation of cloud-native infrastructure for AI/ML experiments using Kubernetes. The main goal is to build an infrastructure that supports AI/ML experiments with high flexibility and scalability. Kubernetes was chosen as the container orchestration platform for its ability to manage dynamic and complex workloads.
The implementation involves configuring a Kubernetes cluster of several nodes, including a master (control-plane) node and worker nodes equipped with GPUs and CPUs. Additional configuration includes a storage class for storage management, a load balancer for distributing network traffic, and a monitoring stack built on Prometheus and Grafana to track system performance. Kubeflow is integrated as the main framework for managing AI/ML experiments. This process ensures that the infrastructure can be operated and tuned to user needs.
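To illustrate the kind of configuration described above (a sketch, not the thesis's actual manifests), a worker workload can request a GPU and bind to a storage class roughly as follows; every name, image, and provisioner here is a hypothetical placeholder:

```yaml
# Hypothetical StorageClass for experiment data.
# The provisioner depends on the cluster's storage backend.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: experiment-storage            # assumed name, not from the thesis
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
# Hypothetical training Pod that requests one GPU on a worker node.
# Scheduling onto a GPU node requires the NVIDIA device plugin to be installed.
apiVersion: v1
kind: Pod
metadata:
  name: train-job                     # assumed name
spec:
  containers:
    - name: trainer
      image: tensorflow/tensorflow:latest-gpu   # assumed image
      resources:
        limits:
          nvidia.com/gpu: 1           # one GPU per Pod
```

In practice, Kubeflow generates and manages manifests of this shape on the user's behalf, which is consistent with its role here as the experiment-management layer.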
Testing was conducted to evaluate the performance and efficiency of the built infrastructure. Accessibility testing covered several usage scenarios across devices, including PCs, laptops, and phones. In addition, resource-usage testing was carried out under various scenarios, with multiple users accessing and running AI/ML workloads with different configurations.
Analysis of the test results shows that the built cloud-native infrastructure has several key advantages. The system not only supports dynamic scalability but also improves resource-usage efficiency. Container technology and Kubernetes orchestration allow resources to be added or removed in real time, which is crucial for computationally intensive AI/ML experiments. In addition, the monitoring platform enables continuous performance observation, making it easier to identify and resolve potential issues.
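The dynamic, real-time scaling described above is typically realized in Kubernetes with a HorizontalPodAutoscaler. The following is a minimal hypothetical sketch (not taken from the thesis; the names and thresholds are assumptions) that scales a serving Deployment by CPU utilization:

```yaml
# Hypothetical autoscaler: adds/removes replicas to hold average CPU near 80%.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa                 # assumed name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server            # assumed target workload
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

With Prometheus in place, custom metrics (e.g. request latency or queue depth) can also drive scaling via a metrics adapter, though CPU utilization is the simplest baseline.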
This research demonstrates that the design and implementation of cloud-native infrastructure using Kubernetes can significantly improve efficiency and effectiveness in managing AI/ML workloads. The infrastructure not only supports various computational needs but also provides the flexibility and scalability required for a dynamic research environment.