THE DEVELOPMENT OF LIBRARY FOR JOIN OPERATION IN CASSANDRA DATABASE MANAGEMENT SYSTEM
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/69201 |
Institution: | Institut Teknologi Bandung |
Summary: | Cassandra is a column-family NoSQL database management system (DBMS) that stores data by
rows, similar to a relational database. In a relational database, it is common to process data
using join operations. Cassandra, however, is not designed for join operations. When an
operation requires a join, the usual solution is to denormalize the tables
involved, but this approach is not feasible once the database is already operational. A flexible
join operator is still required in that case. This research therefore aims to develop a library for
join operations in Cassandra.
To determine how to perform join operations in Cassandra, we begin by studying
Cassandra's internal workings, how data is retrieved from Cassandra,
and which join algorithms to develop. Several join algorithms were
considered as alternatives, and those with the best estimated
performance were chosen: the hybrid hash join and the block nested loop join.
A library that implements these algorithms was then developed based on functional
requirements and on designs expressed as analysis models, including use case diagrams, class
diagrams, and sequence diagrams. The resulting library supports both
inner and outer joins, as well as equi-joins and non-equi joins. The
technology and technical matters related to the construction of the join library are also described.
Testing of the library shows that all implemented join algorithms work effectively. The hybrid
hash join algorithm works properly on both small and large amounts of data. The nested loop join
algorithm, on the other hand, performs poorly, especially when handling large amounts of data. |
---|
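To make the two algorithm families named in the summary more concrete, the following is a minimal client-side sketch, not the library described in this record. It assumes the rows of both tables have already been fetched from Cassandra into Python lists of dictionaries. The function names, parameters, and sample data are hypothetical. The hash join shown here is a plain in-memory hash join; the hybrid variant additionally partitions both inputs and spills partitions that do not fit in memory, which this sketch omits.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
Pair = Tuple[Row, Optional[Row]]


def blocks(rows: List[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def block_nested_loop_join(outer: List[Row], inner: List[Row],
                           predicate: Callable[[Row, Row], bool],
                           block_size: int = 2,
                           left_outer: bool = False) -> List[Pair]:
    """Join on an arbitrary predicate, so non-equi joins are possible."""
    result: List[Pair] = []
    for block in blocks(outer, block_size):
        matched = [False] * len(block)
        # The inner input is scanned once per outer block, not once per row.
        for inner_row in inner:
            for i, outer_row in enumerate(block):
                if predicate(outer_row, inner_row):
                    matched[i] = True
                    result.append((outer_row, inner_row))
        if left_outer:
            # Keep unmatched outer rows, paired with None.
            result.extend((row, None)
                          for row, hit in zip(block, matched) if not hit)
    return result


def hash_join(left: List[Row], right: List[Row],
              left_key: str, right_key: str,
              left_outer: bool = False) -> List[Pair]:
    """Equi-join only: build a hash table on `right`, probe with `left`."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        table[row[right_key]].append(row)
    result: List[Pair] = []
    for row in left:
        matches = table.get(row[left_key], [])
        if matches:
            result.extend((row, match) for match in matches)
        elif left_outer:
            result.append((row, None))
    return result


# Hypothetical sample data standing in for rows read from two tables.
users = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Budi"},
         {"id": 3, "name": "Citra"}]
orders = [{"user_id": 1, "total": 50}, {"user_id": 1, "total": 20},
          {"user_id": 3, "total": 70}]

equi = hash_join(users, orders, "id", "user_id", left_outer=True)
non_equi = block_nested_loop_join(
    users, orders, lambda u, o: u["id"] < o["user_id"])
```

The hash join touches each row a constant number of times but only handles equality conditions, while the block nested loop join accepts any predicate at the cost of rescanning the inner input for every outer block. That trade-off is consistent with the performance difference on large inputs reported in the summary.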