Running CNN efficiently on a FPGA
With increased demand for AI at the edge, there is a pressing need to adapt ever more computationally demanding deep learning models for deployment onto embedded devices. As accelerators for these networks, FPGAs have become a preferred choice for their energy efficiency and adaptability, but models also need to be pre-processed before effective FPGA-based hardware accelerators can be designed. In this project, the author investigates the performance of Block-Balanced Sparsity, a model compression approach that prunes parameter matrices in deep learning networks in a structured manner that allows for efficient FPGA accelerator implementations. By testing this approach across different pruning strategies, the author found that the fine-tuning strategy led to the highest model accuracy, gradual pruning allowed for the fastest model development, and learning rate rewinding provided the greatest ease of use.
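As a rough illustration of the compression scheme named in the abstract (a sketch under general assumptions, not the author's implementation): Block-Balanced Sparsity splits each parameter matrix into equal-sized blocks and prunes every block to the same sparsity level, so the non-zero pattern stays regular enough for a hardware accelerator to exploit. A minimal NumPy version might look like the following; the block size and sparsity values are arbitrary.

```python
# Illustrative sketch only (not code from the thesis): block-balanced pruning
# of a weight matrix. Each row is split into fixed-size blocks, and the same
# number of smallest-magnitude weights is zeroed inside every block, so every
# block keeps an identical non-zero count. Block size and sparsity level are
# assumed values chosen for the example.
import numpy as np


def block_balanced_prune(weights: np.ndarray, block_size: int, sparsity: float) -> np.ndarray:
    """Zero the smallest-magnitude entries within each block of `weights`."""
    rows, cols = weights.shape
    assert cols % block_size == 0, "columns must divide evenly into blocks"
    drop_per_block = int(round(sparsity * block_size))  # weights removed per block
    pruned = weights.copy()
    for r in range(rows):
        for c in range(0, cols, block_size):
            block = pruned[r, c:c + block_size]           # view into `pruned`
            smallest = np.argsort(np.abs(block))[:drop_per_block]
            block[smallest] = 0.0                         # prune in place
    return pruned


# Example: 50% sparsity with blocks of 4 leaves exactly 2 non-zeros per block.
w = np.random.randn(4, 8)
print(block_balanced_prune(w, block_size=4, sparsity=0.5))
```

Because every block keeps the same number of non-zeros, an accelerator can assign a fixed amount of compute per block and index the surviving weights with a short, fixed-width encoding, which is what makes this pruning pattern hardware-friendly.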
Saved in:
Main Author: | Yang, Shenghao |
---|---|
Other Authors: | Weichen Liu (School of Computer Science and Engineering) |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/156579 |
Degree: | Bachelor of Engineering (Computer Engineering) |
Cite as: | Yang, S. (2022). Running CNN efficiently on a FPGA. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/156579 |
Institution: | Nanyang Technological University |
Language: | English |