Optimizing B-tree search performance of big data sets / Mohsen Marjani
Main Author: | Mohsen Marjani |
---|---|
Format: | Thesis |
Published: | 2017 |
Subjects: | |
Online Access: | http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9744/ |
Institution: | Universiti Malaya |
Summary: | Many applications continuously produce large amounts of varied data every day, exceeding the capacity of conventional data storage tools. Such data typically comes in many different formats, which makes it very difficult to query using traditional indexing technologies. Indexing is used in data retrieval to improve the efficiency and accuracy of query results. However, current indexing techniques have low efficiency and poor real-time performance when queries involve big data. Moreover, current indexing techniques do not support all characteristics of big data and show weaknesses when they must index a variety of data arriving at high velocity and volume. B-tree indexing is one of the most popular techniques and is used by many database systems, including those that can handle big datasets. Every time a search runs against data indexed with a B-tree, the process traverses the left child nodes of a node to find lower values or the right child nodes to find larger values. Repeating search tasks for later queries with the same or overlapping conditions repeats the same traversal and consumes the same resources, including time and computation power, to retrieve the result. This study proposes an optimized B-tree search method to improve the execution time of search tasks and to optimize the performance of the B-tree search process. In this new method, every node has a new element storing a min-max summarization, which lets the search process check whether the value can be present inside the node's sub-tree before it starts traversing the sub-tree to find the value's location. In addition, during every search task, a history value is added to every traversed node to record the last search operation so that it can be used by the next search operation.
The experimental results show that the proposed search method decreases the execution time of search tasks and improves search performance several times over the standard B-tree search for the same query and the same dataset. Moreover, the history value improves the performance of later queries by up to 52%. This research contributes to optimizing data retrieval for big data sets and points researchers towards a novel approach to indexing and searching big data in order to improve query processing and search performance. |
---|
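
The min-max summarization described in the summary can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the `Node` class, the `lo` and `hi` fields, the `search` function, and the optional `visited` list are hypothetical names chosen for illustration, and the history value is not modeled because the summary does not specify how it is stored or used. The sketch only shows the idea that a sub-tree is skipped when the searched value falls outside the range recorded for it.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                                        # sorted keys in this node
    children: List["Node"] = field(default_factory=list)   # empty for a leaf
    lo: int = field(init=False)                            # smallest key in the subtree
    hi: int = field(init=False)                            # largest key in the subtree

    def __post_init__(self) -> None:
        # Assumed min-max summary: in a B-tree the extremes of a subtree
        # sit in its leftmost and rightmost children.
        self.lo = self.children[0].lo if self.children else self.keys[0]
        self.hi = self.children[-1].hi if self.children else self.keys[-1]


def search(node: Node, key: int, visited: Optional[List[Node]] = None) -> bool:
    """Return True if key is in the subtree rooted at node.

    A subtree whose [lo, hi] range excludes the key is never traversed.
    """
    if key < node.lo or key > node.hi:
        return False                       # pruned by the min-max summary
    if visited is not None:
        visited.append(node)               # record nodes that were actually read
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if not node.children:
        return False
    return search(node.children[i], key, visited)


# A small hand-built tree: root keys 5 and 15 separate three leaves.
root = Node(keys=[5, 15], children=[Node([1, 3]), Node([7, 9]), Node([20, 25])])

trace: List[Node] = []
found = search(root, 12, trace)            # 12 lies between 9 and 15
print(found, len(trace))
```

In this example the query for 12 reads only the root: the middle leaf covers the range 7 to 9, so its summary rules it out before it is opened. A query outside the root's range, such as 100, is rejected without reading any node.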