Understanding and profiling a linear algebra kernel on different computing platforms using OpenCL programming model

Bibliographic Details
Main Author: Mohanan, Neethu
Other Authors: Douglas Leslie Maskell
Format: Final Year Project
Language: English
Published: 2017
Online Access: http://hdl.handle.net/10356/70508
Institution: Nanyang Technological University
Description
Summary: The use of co-processors as accelerators for specific tasks is rising in parallel computing, which emphasizes the advantages of multi-core accelerators for parallelizing computations. Heterogeneous computers, which run one main program divided into multiple work-items, use attached co-processors to enhance performance through parallel execution. The performance of the kernels that run on these work-items varies with the type of processor. The OpenCL framework simplifies the use of these accelerators by supporting parallel programming and providing a cross-platform interface to them. The first part of the report investigates the performance of OpenCL kernels on multiple computing platforms. The first kernel studied performs matrix multiplication; the others, the linear algebra kernels atax and bicg, are part of the PolyBench benchmark. The OpenCL programming model is studied in order to profile different API calls and calculate execution time. The performance of the different accelerators is compared in GOPS (giga-operations per second). The latter part of the report focuses on the RISC-V ISA, an open-source architecture popular in industry. Through extensions, it supports designs ranging from simple processors to applications of high computational intensity. A previous implementation, PicoRV32, is examined with a view to implementing a new, clean, and extensible core. A simple RISC-V processor supporting the RV32IM instruction set is designed and implemented, with the aim of developing an accelerator engine built from many such cores.
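
For readers unfamiliar with the kernels named in the summary, the following is a minimal reference sketch in plain Python, not the report's OpenCL code. It assumes the usual PolyBench definitions (atax computes y = A^T (A x); bicg computes q = A p and s = A^T r). The helper names and the operation-count convention used for GOPS (one multiply plus one add per inner-loop step) are assumptions made here for illustration; the record does not state how the report counts operations.

```python
def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]


def atax(A, x):
    """atax as usually defined in PolyBench: y = A^T (A x)."""
    return matvec(transpose(A), matvec(A, x))


def bicg(A, p, r):
    """bicg as usually defined in PolyBench: q = A p and s = A^T r."""
    return matvec(A, p), matvec(transpose(A), r)


def matmul_ops(m, n, k):
    """Assumed operation count for an (m x k) by (k x n) product:
    one multiply and one add per inner-loop step."""
    return 2 * m * n * k


def gops(op_count, seconds):
    """Giga-operations per second for a measured execution time."""
    return op_count / seconds / 1e9


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    print(atax(A, [1, 1]))
    print(bicg(A, [1, 1], [1, 1]))
    # Hypothetical timing: a 1024 x 1024 x 1024 product measured at 0.5 s.
    print(gops(matmul_ops(1024, 1024, 1024), 0.5))
```

A throughput figure of this kind is only comparable across platforms when the same operation count is used for each, which is why the count is kept in a separate function from the timing.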