In-cache query co-processing on coupled CPU-GPU architectures
Recently, there have been emerging processor designs in which the CPU and the GPU (Graphics Processing Unit) are integrated on a single chip and share the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a...
Saved in:
Main Authors: He, Jiong; Zhang, Shuhao; He, Bingsheng
Other Authors: School of Computer Engineering
Format: Article
Language: English
Published: 2016
Subjects: Memory architecture; Query processing
Online Access: https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709
Institution: Nanyang Technological University
id |
sg-ntu-dr.10356-81886 |
record_format |
dspace |
spelling |
sg-ntu-dr.10356-818862020-05-28T07:17:58Z In-cache query co-processing on coupled CPU-GPU architectures He, Jiong Zhang, Shuhao He, Bingsheng School of Computer Engineering Memory architecture Query processing Recently, there have been emerging processor designs in which the CPU and the GPU (Graphics Processing Unit) are integrated on a single chip and share the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can severely suffer from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve the query performance by up to 36% and 40% on the A8 and A10, respectively. MOE (Min. of Education, S’pore) Published version 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2016-01-19T06:56:25Z 2019-12-06T14:42:21Z 2014 Journal Article He, J., Zhang, S., & He, B. (2014). In-cache query co-processing on coupled CPU-GPU architectures. Proceedings of the VLDB Endowment, 8(4), 329-340. 
doi: 10.14778/2735496.2735497 21508097 https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709 10.14778/2735496.2735497 en Proceedings of the VLDB Endowment © 2014 VLDB Endowment. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. 12 p. application/pdf |
institution |
Nanyang Technological University |
building |
NTU Library |
country |
Singapore |
collection |
DR-NTU |
language |
English |
topic |
Memory architecture; Query processing |
spellingShingle |
Memory architecture; Query processing; He, Jiong; Zhang, Shuhao; He, Bingsheng; In-cache query co-processing on coupled CPU-GPU architectures |
description |
Recently, there have been emerging processor designs in which the CPU and the GPU (Graphics Processing Unit) are integrated on a single chip and share the Last Level Cache (LLC). However, the main memory bandwidth of such coupled CPU-GPU architectures can be much lower than that of a discrete GPU. As a result, current GPU query co-processing paradigms can severely suffer from memory stalls. In this paper, we propose a novel in-cache query co-processing paradigm for main memory On-Line Analytical Processing (OLAP) databases on coupled CPU-GPU architectures. Specifically, we adapt CPU-assisted prefetching to minimize cache misses in GPU query co-processing and CPU-assisted decompression to improve query execution performance. Furthermore, we develop a cost-model-guided adaptation mechanism for distributing the workload of prefetching, decompression, and query execution between the CPU and the GPU. We implement a system prototype and evaluate it on two recent AMD APUs, the A8 and A10. The experimental results show that 1) in-cache query co-processing can effectively improve the performance of the state-of-the-art GPU co-processing paradigm by up to 30% and 33% on the A8 and A10, respectively, and 2) our workload distribution adaptation mechanism can significantly improve the query performance by up to 36% and 40% on the A8 and A10, respectively. |
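The abstract's cost-model-guided workload distribution can be illustrated with a minimal sketch. This is not the paper's actual mechanism: the function name, the linear per-tuple cost model, and the throughput numbers below are all illustrative assumptions. The idea shown is simply that, given estimated CPU and GPU processing rates, the input is split so that both processors are expected to finish at the same time.

```python
# Hypothetical sketch of cost-model-guided workload splitting between a CPU
# and a GPU (illustrative only; not the mechanism from the paper).
#
# With linear cost models t_cpu = x / cpu_rate and t_gpu = (n - x) / gpu_rate,
# the makespan max(t_cpu, t_gpu) is minimized when both finish together:
#   x = n * cpu_rate / (cpu_rate + gpu_rate)

def split_workload(n_tuples: int, cpu_rate: float, gpu_rate: float) -> tuple[int, int]:
    """Return (cpu_share, gpu_share) balancing estimated finishing times.

    cpu_rate / gpu_rate: assumed throughputs in tuples per millisecond.
    """
    cpu_share = round(n_tuples * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_share, n_tuples - cpu_share

# Example with made-up rates: a GPU three times faster than the CPU
# receives three quarters of the tuples.
cpu, gpu = split_workload(1_000_000, cpu_rate=200, gpu_rate=600)
print(cpu, gpu)  # 250000 750000
```

In the paper's setting the cost model would also need to account for prefetching and decompression stages sharing the LLC, so a real implementation would recalibrate the rates per query rather than fixing them as constants.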
author2 |
School of Computer Engineering |
author_facet |
School of Computer Engineering; He, Jiong; Zhang, Shuhao; He, Bingsheng |
format |
Article |
author |
He, Jiong; Zhang, Shuhao; He, Bingsheng |
author_sort |
He, Jiong |
title |
In-cache query co-processing on coupled CPU-GPU architectures |
title_short |
In-cache query co-processing on coupled CPU-GPU architectures |
title_full |
In-cache query co-processing on coupled CPU-GPU architectures |
title_fullStr |
In-cache query co-processing on coupled CPU-GPU architectures |
title_full_unstemmed |
In-cache query co-processing on coupled CPU-GPU architectures |
title_sort |
in-cache query co-processing on coupled cpu-gpu architectures |
publishDate |
2016 |
url |
https://hdl.handle.net/10356/81886 http://hdl.handle.net/10220/39709 |
_version_ |
1681057179435532288 |