Main Author: | |
---|---|
Format: | Dissertations |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/27803 |
Institution: | Institut Teknologi Bandung |
Summary: | <p align="justify"> In recent years, heterogeneous computing has become part of the mainstream of high-performance computing. A heterogeneous computer usually consists of a CPU equipped with one or more accelerators, such as GPUs, and is programmed using a framework such as OpenCL. To achieve the best performance on a heterogeneous computer, a workload, in this case an OpenCL kernel, should be mapped to the right processor, meaning that it should be executed by the processor that runs it best. The mapping mechanism should also take processor availability into account: if the best processor for a workload is not available, the mechanism should choose an alternative from the remaining processors, and that alternative should deliver the best performance among them. To meet these requirements, this research proposes a new method that maps workloads to sequences of processors. A sequence of processors is a list of processors sorted by their performance in executing a workload. Processor selection then uses the sequence to pick the best available processor to execute the workload. <br />
<br />
The workload mapping task is performed by the k-nearest neighbor (KNN) algorithm. To achieve acceptable accuracy, the features used with KNN are selected beforehand. Feature selection involves two models: the filter model and the wrapper model. To evaluate the proposed method, the two-processor scenario uses k-fold cross-validation and a partition of the dataset into training and testing sets. The four-processor scenario also uses k-fold cross-validation, together with validation on an additional testing dataset. According to the experiments and evaluations, the best accuracy of the workload mapping method is in the range of 93%-100% for the two-processor scenario and 83%-87% for the four-processor scenario. The experiments also show that the features contributing most in the two-processor scenario are vector integer operations and vector memory access, whereas in the four-processor scenario they are barriers and uncoalesced memory access. </p> |
---|
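
The idea in the summary, mapping a workload to a sequence of processors with KNN and then choosing the first available processor in that sequence, can be illustrated with a minimal sketch. This is not the dissertation's implementation: the two-element feature vectors, the training samples, the processor names, and the functions `predict_sequence` and `select_processor` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (kernel feature vector, processor sequence).
# Each sequence lists processors from best to worst for that kernel.
# The feature values are invented for illustration only.
TRAINING = [
    ((0.9, 0.1), ("GPU", "CPU")),
    ((0.8, 0.2), ("GPU", "CPU")),
    ((0.7, 0.3), ("GPU", "CPU")),
    ((0.1, 0.9), ("CPU", "GPU")),
    ((0.2, 0.8), ("CPU", "GPU")),
]


def predict_sequence(features, training=TRAINING, k=3):
    """Return the processor sequence chosen by majority vote
    among the k training kernels nearest to `features`."""
    nearest = sorted(training, key=lambda item: dist(item[0], features))[:k]
    votes = Counter(sequence for _, sequence in nearest)
    return votes.most_common(1)[0][0]


def select_processor(sequence, available):
    """Return the best-ranked processor in `sequence` that is
    currently available, or None if none of them is."""
    for processor in sequence:
        if processor in available:
            return processor
    return None


if __name__ == "__main__":
    sequence = predict_sequence((0.85, 0.15))
    # The GPU is assumed busy here, so the next processor in the sequence is used.
    print(sequence, select_processor(sequence, available={"CPU"}))
```

Treating each sequence as a single class label keeps the classifier unchanged when more processors are added; only the set of possible labels grows, which is one plausible reason accuracy is lower in a four-processor scenario than in a two-processor one.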