Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL

The decline of Moore’s law has led to a fundamental shift in the design of microprocessor architectures. Devices with parallel processing architectures, such as GPUs, FPGAs and DSPs, initially used only for dedicated tasks, are now gaining popularity as accelerators for more general-purpose co...

Bibliographic Details
Main Author:	Nandi, Shuvam
Other Authors:	Douglas Leslie Maskell
Format:	Final Year Project
Language:	English
Published:	2017
Subjects:	DRNTU::Engineering::Computer science and engineering::Computer systems organization::Processor architectures DRNTU::Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
Online Access:	http://hdl.handle.net/10356/70507
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-70507
record_format	dspace
spelling	sg-ntu-dr.10356-70507 2023-03-03T20:40:18Z Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL Nandi, Shuvam Douglas Leslie Maskell School of Computer Science and Engineering DRNTU::Engineering::Computer science and engineering::Computer systems organization::Processor architectures DRNTU::Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems The decline of Moore’s law has led to a fundamental shift in the design of microprocessor architectures. Devices with parallel processing architectures, such as GPUs, FPGAs and DSPs, initially used only for dedicated tasks, are now gaining popularity as accelerators for more general-purpose computations. These devices achieve performance by massively parallelising tasks across multiple compute units. CUDA and OpenCL are two application programming interface (API) models used to program parallel devices. The long-term objective of this project is the design of a hypothetical network of multiple processors capable of running applications in parallel. OpenCL is used to facilitate performance comparison because it is a cross-compatible framework across multiple heterogeneous platforms. Initially, this report examines the performance of numerous computing devices. A simple matrix multiplication kernel was executed with different mappings of the kernel onto the devices. This was followed by profiling a complex application that recognises handwritten digits from the MNIST database. Performance in terms of GOPS was computed from the execution timings obtained and from an analysis of the number of computations performed in the application.
The second half of this project investigates free ISAs for implementing a processor as the core unit of the hypothetical engine. RISC-V is selected and studied because it provides several extensions to its base integer instruction set, thereby supporting computationally intensive tasks. An existing processor implementation is examined, followed by the development of a new implementation based on RV32IM. Bachelor of Engineering (Computer Engineering) 2017-04-26T03:21:20Z 2017-04-26T03:21:20Z 2017 Final Year Project (FYP) http://hdl.handle.net/10356/70507 en Nanyang Technological University 87 p. application/pdf
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Computer science and engineering::Computer systems organization::Processor architectures DRNTU::Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems
spellingShingle	DRNTU::Engineering::Computer science and engineering::Computer systems organization::Processor architectures DRNTU::Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems Nandi, Shuvam Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL
description	The decline of Moore’s law has led to a fundamental shift in the design of microprocessor architectures. Devices with parallel processing architectures, such as GPUs, FPGAs and DSPs, initially used only for dedicated tasks, are now gaining popularity as accelerators for more general-purpose computations. These devices achieve performance by massively parallelising tasks across multiple compute units. CUDA and OpenCL are two application programming interface (API) models used to program parallel devices. The long-term objective of this project is the design of a hypothetical network of multiple processors capable of running applications in parallel. OpenCL is used to facilitate performance comparison because it is a cross-compatible framework across multiple heterogeneous platforms. Initially, this report examines the performance of numerous computing devices. A simple matrix multiplication kernel was executed with different mappings of the kernel onto the devices. This was followed by profiling a complex application that recognises handwritten digits from the MNIST database. Performance in terms of GOPS was computed from the execution timings obtained and from an analysis of the number of computations performed in the application. The second half of this project investigates free ISAs for implementing a processor as the core unit of the hypothetical engine. RISC-V is selected and studied because it provides several extensions to its base integer instruction set, thereby supporting computationally intensive tasks. An existing processor implementation is examined, followed by the development of a new implementation based on RV32IM.
author2	Douglas Leslie Maskell
author_facet	Douglas Leslie Maskell Nandi, Shuvam
format	Final Year Project
author	Nandi, Shuvam
author_sort	Nandi, Shuvam
title	Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL
title_short	Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL
title_full	Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL
title_fullStr	Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL
title_full_unstemmed	Understanding and profiling a convolutional neural network application on different computing platforms using OpenCL
title_sort	understanding and profiling a convolutional neural network application on different computing platforms using opencl
publishDate	2017
url	http://hdl.handle.net/10356/70507
_version_	1759857510808813568
