Block-based neural network mapping on graphics processor unit

Bibliographic Details
Main Author: Ong, Chin Tong
Format: Thesis
Language:English
Published: 2015
Subjects: TK Electrical engineering. Electronics Nuclear engineering
Online Access:http://eprints.utm.my/id/eprint/53959/1/OngChinTongMFKE2015.pdf
http://eprints.utm.my/id/eprint/53959/
http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:86538
Institution: Universiti Teknologi Malaysia
id my.utm.53959
record_format eprints
spelling my.utm.53959 2020-10-08T04:38:37Z 2015-06 Thesis NonPeerReviewed application/pdf en Ong, Chin Tong (2015) Block-based neural network mapping on graphics processor unit. Masters thesis, Universiti Teknologi Malaysia, Faculty of Electrical Engineering.
building UTM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Teknologi Malaysia
content_source UTM Institutional Repository
url_provider http://eprints.utm.my/
topic TK Electrical engineering. Electronics Nuclear engineering
description Block-based neural network (BbNN) was introduced to improve the training speed of artificial neural networks. Previous researchers have carried out various works to improve the training speed of BbNN systems. Multithreaded BbNN training on a field-programmable gate array (FPGA) is limited in training speed by the low performance of the Nios II software used for communication between the central processing unit (CPU) and the FPGA. This project aims to improve the training speed of multithreaded BbNN blocks by mapping the BbNN model onto Compute Unified Device Architecture (CUDA) cores. In this project, each BbNN block is mapped onto a CUDA core, with each core running a single thread. Functional verification of the BbNN core is based on the BbNN output accuracy value; an accuracy of nearly 100 percent is used to verify the CUDA-mapped BbNN. A performance trade-off analysis was carried out by comparing the accuracy values obtained from BbNN evolution on the GPU and CPU implementations. The results show that the CUDA-mapped BbNN is at best only as fast as the CPU-mapped implementation. Although the CUDA-mapped implementation trains multiple BbNN blocks in parallel, the large data transfers between the CPU and GPU outweigh the performance gained from that parallelism. Furthermore, a significant gain in training speed can be seen only if the order of complexity of the GPU execution is higher than the order of the CPU-GPU data transfer. The results of this project provide recommendations for future research on further improving the training speed of CUDA-based BbNN implementations.
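The abstract's closing argument, that parallel BbNN training on the GPU pays off only when the order of complexity of the GPU work exceeds that of the CPU-GPU data transfer, can be illustrated with a toy cost model. This is a minimal sketch under stated assumptions, not the thesis's method: the function names (`gpu_speedup`, `linear`, `quadratic`), the unit cost constants, the O(n) transfer term, and the 64 parallel threads are all hypothetical values chosen for illustration, not measurements from the thesis.

```python
"""Toy cost model (hypothetical constants, NOT measurements from the thesis)
showing how O(n) CPU-GPU data transfer can cancel out GPU parallelism."""

from typing import Callable


def gpu_speedup(
    n: int,
    work: Callable[[int], float],
    transfer_cost: float = 1.0,  # hypothetical time per data item copied host<->device
    cpu_cost: float = 1.0,       # hypothetical time per unit of work on the CPU
    gpu_cost: float = 1.0,       # hypothetical time per unit of work on one GPU thread
    threads: int = 64,           # hypothetical number of BbNN blocks trained in parallel
) -> float:
    """Return CPU time divided by GPU time for a problem of size n.

    CPU time = cpu_cost * work(n)
    GPU time = transfer_cost * n             (O(n) CPU-GPU data transfer)
             + gpu_cost * work(n) / threads  (assumes work divides perfectly across threads)
    """
    cpu_time = cpu_cost * work(n)
    gpu_time = transfer_cost * n + gpu_cost * work(n) / threads
    return cpu_time / gpu_time


def linear(n: int) -> float:
    # Work of the same order as the O(n) transfer.
    return float(n)


def quadratic(n: int) -> float:
    # Work of a higher order than the O(n) transfer.
    return float(n * n)


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(f"n={n:>6}  linear work: {gpu_speedup(n, linear):6.2f}x"
              f"  quadratic work: {gpu_speedup(n, quadratic):6.2f}x")
```

With these made-up constants, linear work gives a speedup just under 1x for every n, because the transfer term grows as fast as the computation; quadratic work climbs toward the 64-thread ceiling as n grows, because the parallel computation eventually dwarfs the transfer.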