Understanding and profiling a linear algebra kernel on different computing platforms using OpenCL programming model
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/70508 |
Institution: | Nanyang Technological University |
Summary: | The use of co-processors as accelerators for specific tasks is rising in
the parallel computing world, which emphasizes the advantages of multi-core accelerators for parallelizing computations. Heterogeneous computers, which run one main
program that is divided into multiple work-items, utilize the co-processors attached to
them to enhance performance through parallel execution. The performance of the kernels that run on these work-items varies according to the type of processor. The OpenCL
framework simplifies the use of these accelerators by supporting parallel programming
and providing a cross-platform interface for using the accelerators.
The report initially investigates the performance of OpenCL kernels on multiple computing platforms. The first kernel studied performs matrix multiplication, while the
other linear algebra kernels, atax and bicg, are part of the PolyBench benchmark.
The OpenCL programming model is studied thoroughly in order to profile different APIs and
calculate execution time. The performance of the different accelerators is compared
in GOPS.
The latter part of the report focuses on the RISC-V ISA, an open source architecture popular in the industry. Through extensions, it supports everything from simple processors to computationally intensive applications. A previous implementation, PicoRV32, is
examined as a basis for a new, clean, and extensible core. A simple RISC-V processor supporting the RV32IM instruction set is designed and implemented
with the aim of developing an accelerator engine with many such cores. |
---|
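
As a rough illustration of the kernels named in the summary, the sketch below gives plain-Python reference versions of matrix multiplication and of the PolyBench atax (y = Aᵀ(Ax)) and bicg (s = Aᵀr, q = Ap) computations, plus a helper that converts an operation count and a measured time into GOPS. It is not the report's OpenCL code: the function names are hypothetical, and the operation count of 2·n·k·m for matrix multiplication (one multiply and one add per inner-loop step) is an assumed counting convention, since the summary does not state how operations were counted or timed.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    """Matrix-vector product A x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def matmul(a, b):
    """Matrix product A B for an n x k matrix A and a k x m matrix B."""
    b_t = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_t] for row in a]


def atax(a, x):
    """PolyBench atax: y = A^T (A x)."""
    return matvec(transpose(a), matvec(a, x))


def bicg(a, p, r):
    """PolyBench bicg: s = A^T r and q = A p."""
    return matvec(transpose(a), r), matvec(a, p)


def matmul_op_count(n, k, m):
    """Assumed convention: one multiply and one add per inner-loop step."""
    return 2 * n * k * m


def gops(op_count, seconds):
    """Giga-operations per second for a given operation count and run time."""
    return op_count / seconds / 1e9
```

Under this convention, a 1000 × 1000 by 1000 × 1000 multiplication that takes one second would be reported as `gops(matmul_op_count(1000, 1000, 1000), 1.0)`, that is, 2 GOPS. Reference versions like these are typically used to check the output of an accelerator kernel before its timing is trusted.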