Running CNN efficiently on a FPGA
Saved in:

| Field | Value |
|---|---|
| Main Author | |
| Other Authors | |
| Format | Final Year Project |
| Language | English |
| Published | Nanyang Technological University, 2022 |
| Subjects | |
| Online Access | https://hdl.handle.net/10356/156579 |
| Institution | Nanyang Technological University |
Summary: With increased demand for AI at the edge, there is a pressing need to adapt ever more computationally demanding deep learning models for deployment on embedded devices. As accelerators for these networks, FPGAs have become a preferred choice for their energy efficiency and adaptability, but models also need to be pre-processed before effective FPGA-based hardware accelerators can be designed. In this project, the author investigates the performance of Block-Balanced Sparsity, a model compression approach that prunes parameter matrices in deep learning networks in a structured manner that allows for efficient FPGA accelerator implementations. By testing this approach across different pruning strategies, the author found that the fine-tuning strategy led to the highest model accuracy, gradual pruning allowed for the fastest model development, and learning rate rewinding provided the greatest ease of use.
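The block-balanced pruning idea mentioned in the summary can be sketched in a few lines: split each weight row into fixed-size blocks and keep only the largest-magnitude weights in every block, so all blocks end up with the same non-zero count (which is what makes the resulting sparsity pattern easy to map onto parallel FPGA compute lanes). This is a minimal illustration, not the project's actual implementation; the block size and per-block keep count below are arbitrary assumptions.

```python
import numpy as np

def block_balanced_prune(w, block_size=4, keep_per_block=2):
    """Prune a weight matrix with block-balanced sparsity:
    each row is split into contiguous blocks of `block_size`
    columns, and only the `keep_per_block` largest-magnitude
    weights in each block survive, so every block carries an
    identical number of non-zeros."""
    rows, cols = w.shape
    assert cols % block_size == 0, "columns must divide evenly into blocks"
    pruned = w.copy()
    for r in range(rows):
        for start in range(0, cols, block_size):
            block = pruned[r, start:start + block_size]  # view into pruned
            # zero out the smallest-magnitude entries in this block
            drop = np.argsort(np.abs(block))[:block_size - keep_per_block]
            block[drop] = 0.0
    return pruned

# toy 2x4 weight matrix; each 4-wide block keeps its 2 largest weights
w = np.arange(1.0, 9.0).reshape(2, 4)
print(block_balanced_prune(w))  # [[0. 0. 3. 4.] [0. 0. 7. 8.]]
```

Because every block retains exactly the same number of non-zero weights, an accelerator can assign one block per processing element without load imbalance, which is the property that distinguishes this scheme from unstructured magnitude pruning.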