THE DEVELOPMENT OF LIBRARY FOR JOIN OPERATION IN CASSANDRA DATABASE MANAGEMENT SYSTEM

Bibliographic Details
Main Author: Rafi Adyatma, Mohammad
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/69201
Institution: Institut Teknologi Bandung
Description
Summary: Cassandra is a column-family NoSQL database management system (DBMS) that stores data in rows, much as a relational database does. In relational databases, data is commonly processed with join operations, but Cassandra is not designed for joins. When an operation requires a join, the usual solution is to denormalize the tables involved. This approach is not feasible, however, once the database is already operational, so a flexible join operator is still required. This research therefore aims to develop a library for join operations in Cassandra. The work begins with a study of Cassandra's internal workings and of how data is retrieved from Cassandra, followed by the selection of the join algorithms to implement. Of the various candidate join algorithms, those with the best estimated performance were chosen: the hybrid hash join and the block nested loop join. A library implementing these algorithms was then developed from a set of functional requirements and from analysis models such as use case, class, and sequence diagrams. The resulting library supports both inner and outer joins, as well as equi-joins and non-equi joins. The technologies and technical matters involved in building the join library are also described. Testing of the library shows that all implemented join algorithms work effectively. The hybrid hash join algorithm works properly on both small and large amounts of data. The nested loop join algorithm, on the other hand, performs poorly, especially on large amounts of data.
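
The two algorithm families named in the summary can be illustrated with a minimal client-side sketch in Python. This is not the library described in the record: the function names, the row representation (plain dictionaries), and the sample tables are all assumptions made for illustration. The hash join shown is a simplified, fully in-memory variant that covers equi-joins with an optional left outer mode; a true hybrid hash join would additionally partition both inputs and spill partitions that do not fit in memory. The block nested loop join accepts an arbitrary predicate, which is what makes non-equi joins possible.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # assumption: each fetched row is a plain dict


def hash_join(left: List[Row], right: List[Row], left_key: str, right_key: str,
              right_columns: List[str], outer: bool = False) -> List[Row]:
    """Simplified in-memory hash join (equi-join only, no partitioning)."""
    # Build phase: index the right-hand rows by their join key.
    table: Dict[object, List[Row]] = {}
    for r in right:
        table.setdefault(r[right_key], []).append(r)

    # Probe phase: look up each left-hand row in the index.
    result: List[Row] = []
    for l in left:
        matches = table.get(l[left_key], [])
        for r in matches:
            result.append({**l, **r})
        if outer and not matches:
            # Left outer join: keep the unmatched row, pad right columns.
            result.append({**l, **{c: None for c in right_columns}})
    return result


def blocks(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def block_nested_loop_join(left: List[Row], right: List[Row],
                           predicate: Callable[[Row, Row], bool],
                           block_size: int = 2) -> List[Row]:
    """Inner join on an arbitrary predicate, scanning `right` once per block."""
    result: List[Row] = []
    for block in blocks(left, block_size):
        for r in right:          # one pass over the inner input per outer block
            for l in block:
                if predicate(l, r):
                    result.append({**l, **r})
    return result


# Hypothetical sample data standing in for rows read from two Cassandra tables.
users = [
    {"user_id": 1, "name": "Ana"},
    {"user_id": 2, "name": "Budi"},
    {"user_id": 3, "name": "Citra"},
]
orders = [
    {"order_id": 10, "buyer_id": 1, "total": 50},
    {"order_id": 11, "buyer_id": 1, "total": 20},
    {"order_id": 12, "buyer_id": 3, "total": 70},
]

order_columns = ["order_id", "buyer_id", "total"]
inner_rows = hash_join(users, orders, "user_id", "buyer_id", order_columns)
outer_rows = hash_join(users, orders, "user_id", "buyer_id", order_columns, outer=True)
non_equi_rows = block_nested_loop_join(
    users, orders, lambda u, o: u["user_id"] < o["buyer_id"]
)
```

The sketch also hints at why the two approaches scale differently: the hash join reads each input once, while the nested loop variant rescans the inner input for every block of the outer input, so its cost grows with the product of the input sizes.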