Optimizing B-tree search performance of big data sets / Mohsen Marjani

Many applications continuously produce large amounts of various data every day, which exceeds the limit of conventional data storage tools. Such data typically includes a large amount of data with different formats that becomes very difficult to query using traditional indexing technologies. Indexin...

Bibliographic Details
Main Author:	Mohsen, Marjani
Format:	Thesis
Published:	2017
Subjects:	QA75 Electronic computers. Computer science
Online Access:	http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9744/
Institution:	Universiti Malaya

id	my.um.stud.9744
record_format	eprints
spelling	my.um.stud.9744 2020-06-21T18:09:38Z Optimizing B-tree search performance of big data sets / Mohsen Marjani Mohsen, Marjani QA75 Electronic computers. Computer science 2017-06 Thesis NonPeerReviewed application/pdf http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf application/pdf http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf Mohsen, Marjani (2017) Optimizing B-tree search performance of big data sets / Mohsen Marjani. PhD thesis, University of Malaya. http://studentsrepo.um.edu.my/9744/
institution	Universiti Malaya
building	UM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaya
content_source	UM Student Repository
url_provider	http://studentsrepo.um.edu.my/
topic	QA75 Electronic computers. Computer science
description	Many applications continuously produce large amounts of varied data every day, exceeding the limits of conventional data storage tools. Such data typically arrives in many different formats, which makes it very difficult to query using traditional indexing technologies. Indexing is used in data retrieval to improve the efficiency and accuracy of query results. However, current indexing techniques have low efficiency and poor real-time performance in actual queries involving big data. They also do not support all the characteristics of big data, and they show weaknesses when they have to index a variety of data at high velocity and volume. The B-tree is one of the most popular indexing techniques and is used by many database systems, including those that can handle big datasets. Every time a search runs against data indexed with a B-tree, the process traverses the left child nodes of a node to find lower values or the right-side child nodes to find bigger values. Repeating search tasks for later queries with the same or overlapping conditions repeats the same traversal and consumes the same resources, including time and computation power, to retrieve the result. This study proposes an optimized B-tree search method to improve the execution time of search tasks and to optimize the performance of the B-tree search process. In this new method, every node has a new element storing a min-max summarization, which helps the search process check whether the value can be present inside the sub-tree of the node before it starts traversing the sub-tree to find the location of the value. In addition, during every search task, a history value is added to every traversed node to mark the history of the last search operation so that it can be used by the next search operation. The results of the experimental analysis show that the proposed search method decreases the execution time of search tasks and improves search performance several times over standard B-tree search for the same query and the same dataset. Moreover, the history value improves the performance of later queries by up to 52%. This research contributes to optimizing data retrieval for big data sets and points researchers towards a novel approach to indexing and searching big data in order to improve query processing and search performance.
format	Thesis
author	Mohsen, Marjani
title	Optimizing B-tree search performance of big data sets / Mohsen Marjani
publishDate	2017
url	http://studentsrepo.um.edu.my/9744/1/Mohsen_Marjani.pdf http://studentsrepo.um.edu.my/9744/2/Mohsen_Marjani_%E2%80%93_Thesis.pdf http://studentsrepo.um.edu.my/9744/
_version_	1738506295420387328
