Multiple kernels optimization in GPU

Bibliographic Details
Main Author: Sun, Yanan.
Other Authors: Zhang, Xiaohong
Format: Final Year Project
Language: English
Published: 2012
Online Access:http://hdl.handle.net/10356/49158
Institution: Nanyang Technological University
Description
Summary: This project was developed in the provided NVIDIA CUDA C/C++ environment; all equipment and software stacks were supplied by the Parallel and Distributed Computing Center. The objective of the project is to find a way to split GPU kernels and to schedule them based on their importance. General-purpose computing on GPUs (GPGPU) is a burgeoning technique for accelerating parallel programs; however, the non-preemptive nature of GPU kernels poses great challenges for applying GPU computing to real-time applications. In this project, we propose a solution for splitting GPU kernels from the host code and develop a model that schedules GPU kernels using the proposed splitting technique. The first part of the project focuses on splitting GPU kernels to make them preemptible. Our design splits a GPU kernel into smaller sub-kernels by reducing the grid of thread blocks used to execute it; the sub-kernel is then launched multiple times so that, taken together, the launches have the same functionality as the original kernel. To achieve this, additional parameters are introduced into the kernel invocation by modifying the PTX code. The second part of the project develops a model that schedules GPU kernels by their assigned priorities, using the kernel-splitting technique developed in the first part. Our design is a server-clients model: clients initiate kernel invocations and pass control to the server, which performs the actual launching of GPU kernels. Because the server takes charge of the execution of all kernels, it can schedule them according to priority.
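
To illustrate the kernel-splitting idea in the summary, the sketch below shows one way a CUDA kernel's grid can be divided into slices that are launched back to back. This is a minimal, source-level illustration assuming an extra blockOffset parameter added by hand; the project itself injects the equivalent parameter by modifying the PTX code, and the kernel and variable names here are illustrative, not taken from the report.

    // Sketch: split one large launch into several smaller ones.
    #include <algorithm>
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void vectorAddSub(const float *a, const float *b, float *c,
                                 int n, int blockOffset) {
        // Reconstruct the global block index this block would have had
        // in a single, unsplit launch.
        int globalBlock = blockIdx.x + blockOffset;
        int i = globalBlock * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20, threads = 256;
        const int totalBlocks = (n + threads - 1) / threads;
        const int blocksPerSlice = 64;   // grid size of each sub-kernel

        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        // Launch the sub-kernel repeatedly, each time covering one slice
        // of the original grid. Between slices the host regains control,
        // so a scheduler could run a more important kernel first.
        for (int off = 0; off < totalBlocks; off += blocksPerSlice) {
            int blocks = std::min(blocksPerSlice, totalBlocks - off);
            vectorAddSub<<<blocks, threads>>>(a, b, c, n, off);
            cudaDeviceSynchronize();     // slice boundary = preemption point
        }
        printf("c[0] = %f\n", c[0]);     // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

The server-clients scheduling model can be sketched in the same spirit. The host-side fragment below is hypothetical (the report's actual interfaces are not reproduced here): pending kernels wait in a priority queue, and the server repeatedly dispatches one slice of the highest-priority kernel, requeueing it until all of its blocks have run.

    // Sketch: server side of the server-clients model. KernelTask and the
    // queue layout are illustrative names, not the project's API.
    #include <queue>
    #include <vector>
    #include <cuda_runtime.h>

    struct KernelTask {
        int priority;      // larger value = more important
        int nextOffset;    // first block of the next slice to launch
        int totalBlocks;   // grid size of the original, unsplit kernel
        // kernel arguments would also live here
    };

    struct ByPriority {
        bool operator()(const KernelTask &x, const KernelTask &y) const {
            return x.priority < y.priority;   // max-heap on priority
        }
    };

    void serverLoop(std::priority_queue<KernelTask, std::vector<KernelTask>,
                                        ByPriority> &pending,
                    int blocksPerSlice) {
        while (!pending.empty()) {
            KernelTask t = pending.top();
            pending.pop();
            // Dispatch one slice of the most important pending kernel, e.g.
            //   someKernel<<<blocksPerSlice, threads>>>(..., t.nextOffset);
            cudaDeviceSynchronize();       // slice boundary: re-evaluate queue
            t.nextOffset += blocksPerSlice;
            if (t.nextOffset < t.totalBlocks)
                pending.push(t);           // not finished: requeue the rest
        }
    }

Because the server re-examines the queue after every slice, a newly arrived high-priority kernel waits at most one slice before it starts running, which is the preemption granularity the splitting technique provides.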