A study of deep learning on many-core processors
Main Author: Chen, Peng
Other Authors: He Bingsheng
School: School of Computer Engineering
Format: Final Year Project (FYP)
Language: English
Published: 2016
Degree: Bachelor of Engineering (Computer Science)
Physical Description: 53 p. (application/pdf)
Subjects: DRNTU::Engineering::Computer science and engineering::Computer systems organization::Performance of systems; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; DRNTU::Engineering::Computer science and engineering::Computing methodologies::Pattern recognition
Online Access: http://hdl.handle.net/10356/66954
Institution: Nanyang Technological University
Description:
Deep learning has recently become a hot topic in many areas, from industry to academia, and a growing number of applications have attracted public attention. Several problems in deep learning remain challenging research topics. With big data, training time is one of the major concerns in designing a deep network, and parallel processing is a promising way to reduce it substantially.
This project investigates several strategies for training a deep network in parallel on the Apache Singa platform: synchronous training, asynchronous training, and GPU training. Although many factors determine the quality of a network design, training time is the primary concern of this project.
Training time can be reduced to a very large extent by GPU training, while multi-process training on a single machine achieves a small speedup. A more complete analysis could also consider scalability and measure performance on a cluster.
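The contrast between the synchronous and asynchronous modes studied in the project can be sketched in a few lines. The Python/NumPy example below is a hypothetical illustration of data-parallel SGD on a toy least-squares problem, not Apache Singa's actual API: in synchronous mode every step waits for all workers and applies one averaged update, while in asynchronous mode each worker writes its update to the shared parameters as soon as it is ready, possibly from a stale read.

```python
# Minimal sketch (hypothetical, not Apache Singa's API) contrasting the two
# parallel training modes: synchronous updates, where workers compute
# gradients on their data shards and a single averaged update is applied per
# step, and asynchronous updates, where each worker applies its gradient to
# the shared parameters as soon as it finishes.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 8))              # toy dataset
true_w = rng.normal(size=8)
y = X @ true_w + 0.01 * rng.normal(size=1024)

def gradient(w, xs, ys):
    """Least-squares gradient on one worker's data shard."""
    return 2.0 * xs.T @ (xs @ w - ys) / len(ys)

def train_sync(num_workers=4, steps=200, lr=0.05):
    """Synchronous mode: all workers finish, gradients are averaged,
    then one update is applied (a barrier per step)."""
    w = np.zeros(8)
    shards = list(zip(np.array_split(X, num_workers),
                      np.array_split(y, num_workers)))
    with ThreadPoolExecutor(num_workers) as pool:
        for _ in range(steps):
            grads = list(pool.map(lambda s: gradient(w, *s), shards))
            w -= lr * np.mean(grads, axis=0)
    return w

def train_async(num_workers=4, steps_per_worker=200, lr=0.05):
    """Asynchronous mode: each worker reads the shared parameters and
    applies its update immediately, so updates may use stale values."""
    w = np.zeros(8)
    shards = list(zip(np.array_split(X, num_workers),
                      np.array_split(y, num_workers)))

    def worker(shard):
        xs, ys = shard
        for _ in range(steps_per_worker):
            # No barrier: read a snapshot, write back in place (races allowed).
            w[:] -= lr * gradient(w.copy(), xs, ys)

    with ThreadPoolExecutor(num_workers) as pool:
        list(pool.map(worker, shards))
    return w

if __name__ == "__main__":
    print("sync  error:", np.linalg.norm(train_sync() - true_w))
    print("async error:", np.linalg.norm(train_async() - true_w))
```

Synchronous mode reproduces sequential SGD exactly at the cost of a barrier per step; asynchronous mode removes the barrier but tolerates stale gradients, which is the throughput/consistency trade-off the project measures.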