A high-performance main-memory query engine on emerging many-core processors

Bibliographic Details
Main Author: Cheng, Xuntao
Other Authors: Boon Chirn Chye
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2018
Subjects: DRNTU::Engineering::Computer science and engineering
Online Access: http://hdl.handle.net/10356/74182
Institution: Nanyang Technological University
Collection: DR-NTU (NTU Library)

Description

Population ageing is an increasingly global phenomenon: the elderly population is growing faster than all younger age groups. One major impact of this trend is the elderly's increasing demand for medical, social, and economic care. However, as the old-age support ratio declines, meeting this demand is increasingly challenging because of the shortage of working-age people. We believe this issue can be addressed by taking advantage of state-of-the-art information technologies (IT). Many IT systems, such as Ambient Assisted Living (AAL) systems, have been proposed to assist elderly people in their daily lives; such systems are termed ageless computing technologies. A key component of ageless computing systems is the storage, processing, and management of data collected from elderly people, on which customized care can be based. For example, health care providers can offer more fine-grained and accurate services if they can track important biometrics and analyze them in a timely manner for each elderly individual.

In this thesis, we focus on a query engine that harnesses state-of-the-art computer architectures to support other data-driven ageless computing technologies with high performance. We focus on main-memory query engines because query response time is critical in our application of databases to ageless computing. A main-memory database keeps its data in main memory, from which processors can retrieve data at much higher bandwidth than disk-based databases can achieve. Furthermore, new types of memory, such as die-stacked DRAM, further increase the memory bandwidth available to main-memory systems. Meanwhile, emerging processor architectures also bring important opportunities for improving the performance of main-memory databases. For example, the massive thread-level parallelism of many-core architectures can help improve bandwidth utilization and computation speed. These emerging technologies have been driving new designs, implementations, and optimizations of main-memory database algorithms in recent years.

This thesis makes three contributions.

First, we experimentally revisit state-of-the-art hash join algorithms and software optimizations on a many-core processor with die-stacked high-bandwidth memory (HBM). We find that, although many findings from existing studies of database algorithms remain valid on the latest many-core architecture, there are major performance issues that existing optimizations cannot address, and significant room for further performance improvement remains. For example, the combination of a many-core processor with new types of memory forms a new NUMA architecture that invalidates some state-of-the-art NUMA-aware optimizations.

Second, based on these findings, we propose a novel algorithm for deploying hash tables, which are important data structures in main-memory databases, on the many-core architecture with die-stacked HBM. The algorithm exploits the die-stacked HBM and the main memory in parallel, and minimizes runtime workload imbalance by carefully placing both the hash table and the threads that access it. Applied to simple hash joins and partitioned hash joins, it achieves about a threefold and a 20% performance improvement, respectively, over state-of-the-art implementations.

Third, we propose a fine-grained query scheduling approach that decomposes database operators into fine-grained phases, each with characteristic hardware resource requirements, and executes these phases concurrently to improve the overall utilization of all resources. This work results in PhiDB, a main-memory query engine for Online Analytical Processing (OLAP). PhiDB achieves speedups of 1.18x to 3.24x over baseline approaches. We also demonstrate that PhiDB improves query response time when processing a dataset containing medical data of elderly people.
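
The abstract contrasts simple hash joins with partitioned hash joins. The following is a minimal, single-threaded sketch of the difference; it is not the thesis's implementation, which targets many-core hardware. The simple variant builds one shared hash table over the build relation and probes it. The partitioned (radix) variant first splits both inputs by the low-order bits of the join key, so that each partition pair can be joined with a small hash table that fits in cache. The function names and the `bits` parameter are illustrative assumptions.

```python
from collections import defaultdict

# Tuples are (join_key, payload); join keys are assumed to be integers.


def simple_hash_join(build, probe):
    """Simple (no-partitioning) hash join: build one hash table, then probe it."""
    table = defaultdict(list)
    for key, payload in build:
        table[key].append(payload)
    result = []
    for key, payload in probe:
        # .get() avoids inserting empty lists for probe keys with no match
        for build_payload in table.get(key, ()):
            result.append((key, build_payload, payload))
    return result


def radix_partition(relation, bits):
    """Scatter tuples into 2**bits partitions using the low-order bits of the key."""
    mask = (1 << bits) - 1
    partitions = [[] for _ in range(1 << bits)]
    for key, payload in relation:
        partitions[key & mask].append((key, payload))
    return partitions


def partitioned_hash_join(build, probe, bits=2):
    """Partitioned (radix) hash join: partition both inputs identically, then
    join each matching partition pair independently with a small hash table."""
    result = []
    pairs = zip(radix_partition(build, bits), radix_partition(probe, bits))
    for build_part, probe_part in pairs:
        result.extend(simple_hash_join(build_part, probe_part))
    return result


R = [(1, "a"), (2, "b"), (3, "c"), (2, "d")]
S = [(2, "x"), (3, "y"), (4, "z")]
print(simple_hash_join(R, S))  # → [(2, 'b', 'x'), (2, 'd', 'x'), (3, 'c', 'y')]
```

On a many-core processor, different threads would typically join the partition pairs in parallel. The Python version only shows the data flow.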
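
The abstract does not say how the hash table and its threads are placed across the die-stacked HBM and main memory. The sketch below is a purely hypothetical illustration of the balancing idea. It assumes uniformly accessed buckets and a bandwidth-bound workload, and it splits both buckets and threads between the two memories in proportion to their bandwidths, so that each memory serves its share in roughly the same time. The functions, the proportional rule, and the bandwidth figures in the example are all assumptions, not results from the thesis.

```python
def split_by_bandwidth(total, bw_hbm, bw_dram):
    """Split `total` units (hash-table buckets or threads) between HBM and DRAM
    in proportion to each memory's bandwidth (hypothetical rule).

    With uniformly accessed buckets, the time spent on a tier scales with
    units_on_tier / bandwidth_of_tier, so a proportional split roughly
    equalizes the time spent on each tier."""
    on_hbm = total * bw_hbm // (bw_hbm + bw_dram)
    return on_hbm, total - on_hbm


def plan_placement(num_buckets, num_threads, bw_hbm, bw_dram):
    """Hypothetical plan: spread buckets over both memories, then bind threads
    to each memory in the same proportion so every thread gets similar work."""
    buckets = split_by_bandwidth(num_buckets, bw_hbm, bw_dram)
    threads = split_by_bandwidth(num_threads, bw_hbm, bw_dram)
    return {"hbm": (buckets[0], threads[0]), "dram": (buckets[1], threads[1])}


# Bandwidths in arbitrary, assumed units (HBM taken as 4x faster than DRAM).
print(plan_placement(1024, 64, bw_hbm=400, bw_dram=100))
# → {'hbm': (819, 51), 'dram': (205, 13)}
```

A real deployment would also have to account for skewed key distributions and capacity limits of the HBM, which this proportional rule ignores.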
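
The fine-grained scheduling idea can be sketched under a deliberately simplified, hypothetical model. Each query is a sequence of phases labeled with one dominant resource (memory bandwidth or computation). A greedy scheduler co-runs a memory-bound phase of one query with a compute-bound phase of another, so that both resources stay busy. This is not PhiDB's scheduler; the two-resource labels, the phase names, and the pairing rule are assumptions.

```python
from collections import deque


def co_schedule(queries):
    """Greedy co-scheduling sketch (hypothetical model).

    `queries` maps a query id to its phases in dependency order; each phase is
    (name, dominant_resource) with dominant_resource "memory" or "compute".
    Each round runs at most one memory-bound and one compute-bound phase, taken
    from the heads of two different queries, so a query's own phases never
    overlap or run out of order.
    """
    pending = {q: deque(phases) for q, phases in queries.items()}
    rounds = []
    while any(pending.values()):
        batch, used = [], set()
        for resource in ("memory", "compute"):
            for q, phases in pending.items():
                if phases and q not in used and phases[0][1] == resource:
                    batch.append((q, phases.popleft()[0]))
                    used.add(q)
                    break
        if not batch:
            raise ValueError("unknown resource type at the head of a query")
        rounds.append(batch)
    return rounds


plan = co_schedule({
    "Q1": [("scan", "memory"), ("aggregate", "compute")],
    "Q2": [("hash", "compute"), ("probe", "memory")],
})
print(plan)
# → [[('Q1', 'scan'), ('Q2', 'hash')], [('Q2', 'probe'), ('Q1', 'aggregate')]]
```

A real scheduler would also have to weigh phase durations and the number of threads assigned to each phase. The greedy pairing only shows why mixing phases with different resource profiles can raise overall utilization.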

Citation
Cheng, X. (2018). A high-performance main-memory query engine on emerging many-core processors. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/74182
DOI: 10.32657/10356/74182
Contributors: Boon Chirn Chye; Lau Chiew Tong; He Bingsheng
Degree: Doctor of Philosophy (IGS)
Date available: 2018-05-04
Extent: 134 p. (application/pdf)