In-cache query co-processing on coupled CPU-GPU architectures
Recently, there have been emerging processor designs in which the CPU and the GPU (Graphics Processing Unit) are integrated on a single chip and share the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a...
Saved in:
Main Authors: He, Jiong; Zhang, Shuhao; He, Bingsheng
Other Authors: School of Computer Engineering
Format: Article
Language: English
Published: 2016
Subjects: Memory architecture; Query processing
Online Access: https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709
Institution: Nanyang Technological University
id |
sg-ntu-dr.10356-81886 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-818862020-05-28T07:17:58Z In-cache query co-processing on coupled CPU-GPU architectures He, Jiong Zhang, Shuhao He, Bingsheng School of Computer Engineering Memory architecture Query processing Recently, there have been emerging processor designs in which the CPU and the GPU (Graphics Processing Unit) are integrated on a single chip and share the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can severely suffer from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve the query performance by up to 36% and 40% on the A8 and A10, respectively. MOE (Min. of Education, S’pore) Published version 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2014 Journal Article He, J., Zhang, S., & He, B. (2014). In-cache query co-processing on coupled CPU-GPU architectures. Proceedings of the VLDB Endowment, 8(4), 329-340. 
doi: 10.14778/2735496.2735497 21508097 https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709 10.14778/2735496.2735497 en Proceedings of the VLDB Endowment © 2014 VLDB Endowment. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. 12 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Memory architecture; Query processing |
spellingShingle |
Memory architecture; Query processing; He, Jiong; Zhang, Shuhao; He, Bingsheng; In-cache query co-processing on coupled CPU-GPU architectures |
description |
Recently, there have been emerging processor designs in which the CPU and the GPU (Graphics Processing Unit) are integrated on a single chip and share the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can severely suffer from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve the query performance by up to 36% and 40% on the A8 and A10, respectively. |
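The abstract's cost-model-guided workload distribution can be illustrated with a minimal sketch. This is not the paper's actual mechanism: the function name, the linear per-tuple cost model, and the throughput numbers below are all illustrative assumptions. The idea shown is simply that, given estimated CPU and GPU processing rates, the input is split so that both processors are expected to finish at the same time.

```python
# Hypothetical sketch of cost-model-guided workload splitting between a CPU
# and a GPU (illustrative only; not the mechanism from the paper).
#
# With linear cost models t_cpu = x / cpu_rate and t_gpu = (n - x) / gpu_rate,
# the makespan max(t_cpu, t_gpu) is minimized when both finish together:
#   x = n * cpu_rate / (cpu_rate + gpu_rate)

def split_workload(n_tuples: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Return (cpu_share, gpu_share) balancing estimated finishing times.

    cpu_rate / gpu_rate: assumed throughputs in tuples per millisecond.
    """
    cpu_share = round(n_tuples * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_share, n_tuples - cpu_share

# Example with made-up rates: a GPU three times faster than the CPU
# receives three quarters of the tuples.
cpu, gpu = split_workload(1_000_000, cpu_rate=200, gpu_rate=600)
print(cpu, gpu)  # 250000 750000
```

In the paper's setting the cost model would also need to account for prefetching and decompression stages sharing the LLC, so a real implementation would recalibrate the rates per query rather than fixing them as constants.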
author2 |
School of Computer Engineering |
author_facet |
School of Computer Engineering; He, Jiong; Zhang, Shuhao; He, Bingsheng |
format |
Article |
author |
He, Jiong; Zhang, Shuhao; He, Bingsheng |
author_sort |
He, Jiong |
title |
In-cache query co-processing on coupled CPU-GPU architectures |
title_short |
In-cache query co-processing on coupled CPU-GPU architectures |
title_full |
In-cache query co-processing on coupled CPU-GPU architectures |
title_fullStr |
In-cache query co-processing on coupled CPU-GPU architectures |
title_full_unstemmed |
In-cache query co-processing on coupled CPU-GPU architectures |
title_sort |
in-cache query co-processing on coupled cpu-gpu architectures |
publishDate |
2016 |
url |
https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709 |
_version_ |
1681057179435532288 |