DEVELOPMENT OF JOIN OPERATIONS FOR DISTRIBUTED DATA IN CASSANDRA DATABASE MANAGEMENT SYSTEM

The previous work by Adyatma (2022) successfully developed a library for performing join operations in the Cassandra database. However, there were several challenges that persisted, such as the library performing operations within a single machine and only supporting join operations. This is...

Bibliographic Details
Main Author:	Anugrah Putra, Widya
Format:	Final Project
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/76529
Institution:	Institut Teknologi Bandung

id	id-itb.:76529
spelling	id-itb.:76529 2023-08-16T09:58:06Z DEVELOPMENT OF JOIN OPERATIONS FOR DISTRIBUTED DATA IN CASSANDRA DATABASE MANAGEMENT SYSTEM Anugrah Putra, Widya Indonesia Final Project join, selection, Cassandra, distributed INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/76529 The previous work by Adyatma (2022) successfully developed a library for performing join operations in the Cassandra database. However, there were several challenges that persisted, such as the library performing operations within a single machine and only supporting join operations. This is in contrast to typical scenarios where join operations are combined with other operations like selection. To address these issues, an analysis was conducted on how Cassandra communicates the state of its machines within the cluster, how data is stored in a distributed environment, and how data is retrieved in such a setting. After the analysis, data retrieval from Cassandra was improved by utilizing the existing token ranges in Cassandra. Further analysis was performed on various solution alternatives for selecting machines to perform tasks. Options included using dedicated machines, employing load balancers, and utilizing multiple worker machines. The last alternative, utilizing multiple worker machines, was chosen for its efficiency. Subsequently, the selected solutions were implemented in the form of a library. Through testing, it was found that the developed library possessed the intended functionality and exhibited better performance compared to the work by Adyatma (2022). In comparison to DataStax's Spark Cassandra Connector, the developed library outperformed it on small and medium-sized datasets but underperformed on large datasets. This discrepancy is due to the memory usage not being fully optimized, leading to significant overhead when handling large datasets. text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	The previous work by Adyatma (2022) successfully developed a library for performing join operations in the Cassandra database. However, several challenges persisted: the library performed its operations within a single machine and supported only join operations. This contrasts with typical scenarios, where join operations are combined with other operations such as selection. To address these issues, an analysis was conducted on how Cassandra communicates the state of its machines within the cluster, how data is stored in a distributed environment, and how data is retrieved in such a setting. After the analysis, data retrieval from Cassandra was improved by utilizing the existing token ranges in Cassandra. Further analysis was performed on various solution alternatives for selecting machines to perform tasks. Options included using dedicated machines, employing load balancers, and utilizing multiple worker machines. The last alternative, utilizing multiple worker machines, was chosen for its efficiency. Subsequently, the selected solutions were implemented in the form of a library. Testing showed that the developed library provided the intended functionality and performed better than the work by Adyatma (2022). Compared with DataStax's Spark Cassandra Connector, the developed library outperformed it on small and medium-sized datasets but underperformed on large datasets. This discrepancy is due to memory usage not being fully optimized, which leads to significant overhead when handling large datasets.
format	Final Project
author	Anugrah Putra, Widya
spellingShingle	Anugrah Putra, Widya DEVELOPMENT OF JOIN OPERATIONS FOR DISTRIBUTED DATA IN CASSANDRA DATABASE MANAGEMENT SYSTEM
author_facet	Anugrah Putra, Widya
author_sort	Anugrah Putra, Widya
title	DEVELOPMENT OF JOIN OPERATIONS FOR DISTRIBUTED DATA IN CASSANDRA DATABASE MANAGEMENT SYSTEM
title_sort	development of join operations for distributed data in cassandra database management system
url	https://digilib.itb.ac.id/gdl/view/76529
_version_	1822008009331572736

Similar Items