In-cache query co-processing on coupled CPU-GPU architectures

In recent years, emerging processor designs have integrated the CPU and the GPU (Graphics Processing Unit) on a single chip, sharing the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can suffer severely from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main-memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing, and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and the A10. The experimental results show that 1) in-cache query co-processing can improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can further improve query performance by up to 36% and 40% on the A8 and A10, respectively.
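As a rough illustration of the cost-model-guided workload distribution the abstract describes, the sketch below splits a batch of tuples so that the CPU and GPU finish at the same time under a simple linear cost model. The function name, parameters, and the linear-rate assumption are illustrative only, not taken from the paper's implementation.

```python
def split_workload(total_tuples, cpu_rate, gpu_rate):
    """Assign each processor work proportional to its estimated
    throughput (tuples per unit time), so both finish together and
    the makespan max(cpu_time, gpu_time) is minimized under a
    linear cost model: time = tuples / rate."""
    cpu_share = cpu_rate / (cpu_rate + gpu_rate)
    cpu_tuples = round(total_tuples * cpu_share)
    return cpu_tuples, total_tuples - cpu_tuples

# Example: a GPU estimated to be 3x faster than the CPU
# receives three quarters of the tuples.
cpu_part, gpu_part = split_workload(1000, cpu_rate=1.0, gpu_rate=3.0)
```

In practice the rates would come from a calibrated cost model per stage (prefetching, decompression, execution) rather than a single constant.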

Bibliographic Details
Main Authors: He, Jiong, Zhang, Shuhao, He, Bingsheng
Other Authors: School of Computer Engineering
Format: Article
Language:English
Published: 2016
Subjects: Memory architecture; Query processing
Online Access:https://hdl.handle.net/10356/81886
http://hdl.handle.net/10220/39709
Institution: Nanyang Technological University
Citation: He, J., Zhang, S., & He, B. (2014). In-cache query co-processing on coupled CPU-GPU architectures. Proceedings of the VLDB Endowment, 8(4), 329-340.
DOI: 10.14778/2735496.2735497
ISSN: 2150-8097
Extent: 12 p. (application/pdf)
Funding: MOE (Min. of Education, S'pore); published version.
Rights: © 2014 VLDB Endowment. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact the copyright holder by emailing info@vldb.org.
Note: Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31 - September 4, 2015, Kohala Coast, Hawaii.
Building: NTU Library
Country: Singapore
Collection: DR-NTU