LOAD BALANCING ON DATABASE SHARDING IN MONGODB USING MACHINE LEARNING

Bibliographic Details
Main Author: ILMI - NIM : 13512048 , MUNTAHA
Format: Final Project
Language: Indonesia
Online Access: https://digilib.itb.ac.id/gdl/view/29400
Institution: Institut Teknologi Bandung
Description
Summary: MongoDB's default load balancing mechanism balances only the number of chunks per shard, so bottlenecks can still occur. Heat-based load balancing is an improvement meant to remedy this, but bottlenecks can still arise because the mechanism is oversensitive to overload conditions. When utilization spikes irregularly, its exception detection triggers overload-driven data migration multiple times. The resulting migrations are not only wasteful but excessive, and they put serious strain on the database system. This work uses machine learning to replace that flawed exception detection.

Machine learning is used to predict a shard's near-future condition: overloaded, underloaded, or running normally. Candidate features for the model are CPU, memory, and bandwidth utilization, which directly determine a shard's condition, together with the number of requests handled by that shard, separated by request type. The feature vectors are processed as time-series data. The methods used are RNN and LSTM.

Every combination of features is trained and tested in the experiment, using training and test sets obtained by benchmarking the database system. The results show that the read request count and the update request count are not relevant and should be discarded. They also show that the combination of bandwidth utilization and insert request count, with the LSTM method, yields the best model, with the best accuracy and the best F-measure for the overload class. That model is suitable to replace the default exception detection, with a significant increase of 5% in accuracy.
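
As an illustration only, the sketch below shows one way the time-series framing and the two reported metrics could be expressed. It is not the thesis implementation: the function names, the label strings, the window length, and the sample values are all assumptions made for this example. Each sample pairs a short history of feature vectors (here bandwidth utilization and insert request count) with the shard condition at the following step, and the evaluation helpers compute accuracy and the F-measure for a chosen class such as overload.

```python
from typing import List, Sequence, Tuple

# Hypothetical label names for the three shard conditions in the summary.
LABELS = ("underloaded", "normal", "overloaded")


def make_windows(
    features: Sequence[Sequence[float]],
    labels: Sequence[str],
    window: int,
) -> List[Tuple[List[List[float]], str]]:
    """Pair each run of `window` consecutive feature vectors with the
    label of the time step that immediately follows the run."""
    if window <= 0:
        raise ValueError("window must be positive")
    if len(features) != len(labels):
        raise ValueError("features and labels must have equal length")
    samples = []
    for end in range(window, len(features)):
        history = [list(row) for row in features[end - window:end]]
        samples.append((history, labels[end]))
    return samples


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F-measure (harmonic mean of precision and recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented values: [bandwidth utilization, insert request count].
    feats = [[0.2, 10], [0.4, 20], [0.9, 80], [0.95, 90]]
    conds = ["normal", "normal", "overloaded", "overloaded"]
    for history, target in make_windows(feats, conds, window=2):
        print(history, "->", target)
```

The windows produced this way would be the input sequences for a recurrent model such as an LSTM; the model itself is omitted here because the summary does not describe its architecture or hyperparameters.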