Neural network implementation on a graphics processing unit using CUDA
Main Authors: | , , , |
---|---|
Format: | text |
Language: | English |
Published: | Animo Repository, 2010 |
Subjects: | |
Online Access: | https://animorepository.dlsu.edu.ph/etd_bachelors/11504 |
Institution: | De La Salle University |
Summary: | The parallel architecture of the Graphics Processing Unit (GPU) is now used not only for graphics but also for general-purpose computation. This emerging field is referred to as General-Purpose computing on Graphics Processing Units (GPGPU).
Neural networks are highly parallel algorithms that can be implemented on the GPU. Previous research used shading languages, which are designed mainly for graphics computations. NVIDIA, a leading GPU vendor, developed a technology called Compute Unified Device Architecture (CUDA), which is suited to GPGPU programming.
In this research, the proponents developed a GPU-based implementation of Kohonen's Self-Organizing Map (SOM) using CUDA. The proponents tested the network with Animal SOM data and Music Classification data on two GPUs, the NVIDIA GeForce 9400M and the NVIDIA GeForce 9800GT. With the Animal SOM data on the GeForce 9400M, the minimum speedup was 1.35 with a 64x64 network and the maximum was 3.11 with a 256x256 network. With the Music Classification data on the GeForce 9400M, the minimum speedup was 1.17 with a 64x64 network and the maximum was 1.64 with a 512x512 network. On the GeForce 9800GT, the minimum speedup was 1.32 with a 32x32 network and the maximum was 2.72 with a 256x256 network. |
---|
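
For readers unfamiliar with Kohonen's Self-Organizing Map, the following is a minimal CPU sketch of one training step, written in Python rather than CUDA. It is not the proponents' implementation, which this record does not describe. The function names (`make_map`, `find_bmu`, `train_step`), the Gaussian neighborhood, and the parameters `lr` and `sigma` are assumptions chosen for illustration. The comments mark the per-neuron loops, which are the parts a GPU version would typically run in parallel.

```python
import math
import random


def make_map(rows, cols, dim, seed=0):
    """Create a rows x cols grid of random weight vectors of length dim."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]


def find_bmu(weights, x):
    """Return (row, col) of the best-matching unit: the neuron closest to x."""
    best, best_dist = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Squared Euclidean distance. Each neuron's distance is independent
            # of the others, so a GPU version could compute one per thread and
            # then reduce to find the minimum.
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_dist:
                best, best_dist = (r, c), d
    return best


def train_step(weights, x, lr, sigma):
    """Update the map in place for one input x and return the BMU position."""
    br, bc = find_bmu(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            # Gaussian neighborhood centered on the BMU (an assumed choice).
            h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
            # Move this neuron's weights toward x, scaled by lr and h.
            for i in range(len(w)):
                w[i] += lr * h * (x[i] - w[i])
    return (br, bc)
```

The cost of both loops grows with the number of neurons, which is consistent with the larger networks in the summary benefiting more from parallel execution.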