A high-performance main-memory query engine on emerging many-core processors
Main Author: | Cheng, Xuntao |
---|---|
Other Authors: | Boon Chirn Chye |
Format: | Thesis-Doctor of Philosophy |
Language: | English |
Published: | Nanyang Technological University, 2018 |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/74182 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-74182 |
---|---|
record_format | dspace |
institution | Nanyang Technological University |
building | NTU Library |
continent | Asia |
country | Singapore Singapore |
content_provider | NTU Library |
collection | DR-NTU |
language | English |
topic | DRNTU::Engineering::Computer science and engineering |
spellingShingle | DRNTU::Engineering::Computer science and engineering Cheng, Xuntao A high-performance main-memory query engine on emerging many-core processors |
description |
Population ageing is an increasingly global phenomenon. The elderly population is growing faster than all younger age groups. One major impact of global ageing is elderly people's increasing demand for medical, social, and economic care. However, as the old-age support ratio declines, the shortage of working-age people makes this demand increasingly hard to meet. We believe this issue can be addressed by taking advantage of state-of-the-art information technologies (IT). Many IT systems have been proposed to assist the lives of elderly people, such as Ambient Assisted Living (AAL) systems. These are termed ageless computing technologies. We recognize that a key component of ageless computing systems is the storage, processing, and management of data collected from elderly people, based on which customized care can be provided. For example, health care providers can offer more fine-grained and accurate services to the elderly if they can track important biometrics and analyze them in a timely manner for each individual. In this thesis, we mainly focus on a query engine that harnesses state-of-the-art computer architectures to support other data-driven ageless computing technologies with high performance.
We focus on main-memory query engines in this thesis because query response time is critical in our application of databases to ageless computing. A main-memory database stores its data in main memory, from which processors can retrieve data at a much higher bandwidth than disk-based databases allow. Furthermore, new types of memory such as die-stacked DRAM further increase the memory bandwidth that main-memory systems can harness. Meanwhile, emerging processor architectures also bring important opportunities for improving the performance of main-memory databases. For example, the massive thread-level parallelism enabled by many-core architectures can help improve bandwidth utilization and computation speed in main-memory databases. These emerging technologies have been driving new designs, implementations, and optimizations of main-memory database algorithms in recent years.
In this thesis, we make several contributions. Firstly, we experimentally revisit state-of-the-art hash join algorithms and software optimizations on a many-core processor with die-stacked HBM. We find that although many findings on database algorithms from existing studies remain valid on the latest many-core architecture, there are major performance issues that existing optimizations are insufficient to address, and there is still significant room for further performance improvement. For example, the many-core architecture with new types of memory forms a new NUMA architecture that invalidates some state-of-the-art NUMA-aware optimizations.
Secondly, based on the findings of the first study, we propose a novel deployment algorithm for hash tables, which are important data structures in main-memory databases, on the many-core architecture with die-stacked HBM. The proposed algorithm exploits both the die-stacked HBM and the main memory in parallel and minimizes workload imbalance at runtime by carefully placing both the hash table and the threads that access it. We apply the proposed algorithm to both simple hash joins and partitioned hash joins, achieving performance improvements of about three times and 20%, respectively, over the state-of-the-art implementations.
Thirdly, we propose a fine-grained query scheduling approach that decomposes database operators into fine-grained phases with characteristic hardware resource requirements and executes them concurrently to improve the overall utilization of all resources. This study results in PhiDB, a main-memory query engine for Online Analytical Processing (OLAP). PhiDB achieves 1.18x to 3.24x speedups over baseline approaches. We also demonstrate that PhiDB improves query response time when processing a dataset containing medical data of elderly people. |
author2 | Boon Chirn Chye |
author_facet | Boon Chirn Chye Cheng, Xuntao |
format | Thesis-Doctor of Philosophy |
author | Cheng, Xuntao |
author_sort | Cheng, Xuntao |
title | A high-performance main-memory query engine on emerging many-core processors |
title_short | A high-performance main-memory query engine on emerging many-core processors |
title_full | A high-performance main-memory query engine on emerging many-core processors |
title_fullStr | A high-performance main-memory query engine on emerging many-core processors |
title_full_unstemmed | A high-performance main-memory query engine on emerging many-core processors |
title_sort | high-performance main-memory query engine on emerging many-core processors |
publisher | Nanyang Technological University |
publishDate | 2018 |
url | http://hdl.handle.net/10356/74182 |
_version_ | 1683494462336008192 |
spelling |
sg-ntu-dr.10356-74182 2020-11-01T05:01:57Z A high-performance main-memory query engine on emerging many-core processors Cheng, Xuntao Boon Chirn Chye Lau Chiew Tong He Bingsheng DRNTU::Engineering::Computer science and engineering Doctor of Philosophy (IGS) 2018-05-04T06:12:19Z 2018-05-04T06:12:19Z 2018 Thesis-Doctor of Philosophy Cheng, X. (2018). A high-performance main-memory query engine on emerging many-core processors. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/74182 10.32657/10356/74182 en 134 p. application/pdf Nanyang Technological University |
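
The abstract in this record refers to simple hash joins and partitioned hash joins without describing them. As a reading aid, the sketch below shows the generic textbook form of both in Python. It is not the thesis's implementation and does not model HBM placement, NUMA effects, or threading. The function names, the `num_partitions` parameter, and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def simple_hash_join(build, probe):
    """Join two lists of (key, payload) rows using one shared hash table."""
    table = defaultdict(list)
    for key, payload in build:            # build phase: index the smaller input
        table[key].append(payload)
    out = []
    for key, payload in probe:            # probe phase: look up each probe row
        for match in table.get(key, ()):  # .get avoids creating empty entries
            out.append((key, match, payload))
    return out


def partitioned_hash_join(build, probe, num_partitions=4):
    """Split both inputs by key hash, then join each pair of partitions."""
    def partition(rows):
        parts = [[] for _ in range(num_partitions)]
        for key, payload in rows:
            parts[hash(key) % num_partitions].append((key, payload))
        return parts

    out = []
    # Rows with equal keys land in the same partition index on both sides,
    # so each partition pair can be joined independently with a small table.
    for build_part, probe_part in zip(partition(build), partition(probe)):
        out.extend(simple_hash_join(build_part, probe_part))
    return out


# Made-up sample rows: (patient_id, name) and (patient_id, heart_rate).
patients = [(1, "A"), (2, "B"), (3, "C")]
readings = [(1, 72), (3, 80), (1, 75), (4, 66)]

simple_result = simple_hash_join(patients, readings)
partitioned_result = partitioned_hash_join(patients, readings)
```

Both functions return the same set of matches. The partitioned variant may emit them in a different order, because it processes one partition pair at a time.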