DEVELOPMENT OF JOIN OPERATIONS FOR DISTRIBUTED DATA IN CASSANDRA DATABASE MANAGEMENT SYSTEM

The previous work by Adyatma (2022) successfully developed a library for performing join operations in the Cassandra database. However, there were several challenges that persisted, such as the library performing operations within a single machine and only supporting join operations. This is...

Bibliographic Details
Main Author: Anugrah Putra, Widya
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/76529
Institution: Institut Teknologi Bandung
id id-itb.:76529
timestamp 2023-08-16T09:58:06Z
keywords join, selection, Cassandra, distributed
building Institut Teknologi Bandung Library
continent Asia
country Indonesia
content_provider Institut Teknologi Bandung
collection Digital ITB
description The previous work by Adyatma (2022) produced a library for performing join operations in the Cassandra database. Several limitations remained, however: the library ran its operations on a single machine and supported only joins, whereas in typical scenarios joins are combined with other operations such as selection. To address these issues, an analysis was conducted of how Cassandra communicates the state of its machines within the cluster, how data is stored in a distributed environment, and how data is retrieved in such a setting. Following this analysis, data retrieval from Cassandra was improved by using Cassandra's existing token ranges. Further analysis compared alternatives for selecting the machines that perform tasks: dedicated machines, load balancers, and multiple worker machines. The last alternative, multiple worker machines, was chosen for its efficiency. The selected solutions were then implemented as a library. Testing showed that the library provides the intended functionality and performs better than the work by Adyatma (2022). Compared with DataStax's Spark Cassandra Connector, the library outperformed it on small and medium-sized datasets but underperformed on large datasets. This discrepancy is attributed to memory usage that is not yet fully optimized, which causes significant overhead when handling large datasets.
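The two ideas named in the description, reading a table by token range and combining a join with a selection, can be illustrated with a minimal Python sketch. It is not the thesis's library or its API. It assumes the cluster uses Cassandra's Murmur3 partitioner, whose tokens are signed 64-bit integers, and it only builds CQL strings rather than contacting a cluster. The table name `shop.users`, the column `id`, and the helpers `split_token_ring`, `range_query`, and `hash_join` are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: Murmur3 partitioner, tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

Row = Dict[str, object]


def split_token_ring(n_workers: int) -> List[Tuple[int, int]]:
    """Split the whole token ring into n contiguous (start, end] ranges."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // n_workers for i in range(n_workers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def range_query(table: str, pk: str, start: int, end: int) -> str:
    """Build a CQL query that reads only the partitions in one token range."""
    # The first range includes its lower bound so no token is skipped.
    lower = ">=" if start == MIN_TOKEN else ">"
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({pk}) {lower} {start} AND token({pk}) <= {end}"
    )


def hash_join(
    left: Iterable[Row],
    right: Iterable[Row],
    key: str,
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """Inner hash join on `key`; `where` is a selection applied to left rows
    before the hash table is built, so filtered rows are never joined."""
    index: Dict[object, List[Row]] = {}
    for row in left:
        if where(row):
            index.setdefault(row[key], []).append(row)
    joined: List[Row] = []
    for r in right:
        for l in index.get(r[key], []):
            joined.append({**l, **r})
    return joined


if __name__ == "__main__":
    # Each worker would run one of these queries against its own range.
    for start, end in split_token_ring(4):
        print(range_query("shop.users", "id", start, end))
```

In this sketch each worker machine would be handed one range and run the corresponding query, then join its share of the rows locally. Applying the selection before building the hash table keeps rows that fail the predicate out of memory, which matters given the memory overhead the description reports for large datasets.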