Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model

In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. Also recently, due to its flexibility, faster development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractiv...

Bibliographic Details
Main Authors:	Ayat, Sayed Omid, Hani, M. Khalil, Ab. Rahman, Ab. Al-Hadi
Format:	Article
Published:	Turkiye Klinikleri Journal of Medical Sciences 2018
Subjects:	TK Electrical engineering. Electronics Nuclear engineering
Online Access:	http://eprints.utm.my/id/eprint/85533/ http://dx.doi.org/10.3906/ELK-1706-222
Institution:	Universiti Teknologi Malaysia

id	my.utm.85533
record_format	eprints
spelling	my.utm.855332020-06-30T08:50:18Z http://eprints.utm.my/id/eprint/85533/ Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model Ayat, Sayed Omid Hani, M. Khalil Ab. Rahman, Ab. Al-Hadi TK Electrical engineering. Electronics Nuclear engineering In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. Also recently, due to its flexibility, faster development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive solution to exploit the inherent parallelism in the feedforward process of the CNN. However, to meet the demands for high accuracy of today's practical recognition applications that typically have massive datasets, the sizes of CNNs have to be larger and deeper. Enlargement of the CNN aggravates the problem of off-chip memory bottleneck in the FPGA platform since there is not enough space to save large datasets on-chip. In this work, we propose a memory system architecture that best matches the off-chip memory traffic with the optimum throughput of the computation engine, while it operates at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate memory bandwidth utilization of the system at different operating frequencies since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. In order to find the optimal solution that has the best energy efficiency, we make a trade-off between energy efficiency and computational throughput. This solution saves 18% of energy utilization with the trade-off having less than 2% reduction in throughput performance. We also propose to use a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator. Experimental results show that our CNN accelerator can achieve a peak performance of 52.11 GFLOPS and energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, which outperforms most previous approaches. Turkiye Klinikleri Journal of Medical Sciences 2018 Article PeerReviewed Ayat, Sayed Omid and Hani, M. Khalil and Ab. Rahman, Ab. Al-Hadi (2018) Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model. Turkish Journal of Electrical Engineering and Computer Sciences, 26 (2). pp. 919-935. ISSN 1300-0632 http://dx.doi.org/10.3906/ELK-1706-222
institution	Universiti Teknologi Malaysia
building	UTM Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Teknologi Malaysia
content_source	UTM Institutional Repository
url_provider	http://eprints.utm.my/
topic	TK Electrical engineering. Electronics Nuclear engineering
spellingShingle	TK Electrical engineering. Electronics Nuclear engineering Ayat, Sayed Omid Hani, M. Khalil Ab. Rahman, Ab. Al-Hadi Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
description	In recent years, the convolutional neural network (CNN) has found wide acceptance in solving practical computer vision and image recognition problems. Also recently, due to its flexibility, faster development time, and energy efficiency, the field-programmable gate array (FPGA) has become an attractive solution to exploit the inherent parallelism in the feedforward process of the CNN. However, to meet the demands for high accuracy of today's practical recognition applications that typically have massive datasets, the sizes of CNNs have to be larger and deeper. Enlargement of the CNN aggravates the problem of off-chip memory bottleneck in the FPGA platform since there is not enough space to save large datasets on-chip. In this work, we propose a memory system architecture that best matches the off-chip memory traffic with the optimum throughput of the computation engine, while it operates at the maximum allowable frequency. With the help of an extended version of the Roofline model proposed in this work, we can estimate memory bandwidth utilization of the system at different operating frequencies since the proposed model considers operating frequency in addition to bandwidth utilization and throughput. In order to find the optimal solution that has the best energy efficiency, we make a trade-off between energy efficiency and computational throughput. This solution saves 18% of energy utilization with the trade-off having less than 2% reduction in throughput performance. We also propose to use a race-to-halt strategy to further improve the energy efficiency of the designed CNN accelerator. Experimental results show that our CNN accelerator can achieve a peak performance of 52.11 GFLOPS and energy efficiency of 10.02 GFLOPS/W on a ZYNQ ZC706 FPGA board running at 250 MHz, which outperforms most previous approaches.
format	Article
author	Ayat, Sayed Omid Hani, M. Khalil Ab. Rahman, Ab. Al-Hadi
author_facet	Ayat, Sayed Omid Hani, M. Khalil Ab. Rahman, Ab. Al-Hadi
author_sort	Ayat, Sayed Omid
title	Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
title_short	Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
title_full	Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
title_fullStr	Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
title_full_unstemmed	Optimizing FPGA-based CNN accelerator for energy efficiency with an extended Roofline model
title_sort	optimizing fpga-based cnn accelerator for energy efficiency with an extended roofline model
publisher	Turkiye Klinikleri Journal of Medical Sciences
publishDate	2018
url	http://eprints.utm.my/id/eprint/85533/ http://dx.doi.org/10.3906/ELK-1706-222
_version_	1672610547784220672
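
The abstract in this record describes a Roofline model extended with operating frequency. The record does not give the authors' equations, so the following is only a minimal sketch of the classical Roofline bound with one added assumption: that the compute roof scales linearly with clock frequency. The function names, the `ops_per_cycle` parameter, and the example bandwidth and intensity values are hypothetical and are not taken from the paper. The second helper simply divides the two figures reported in the abstract (52.11 GFLOPS and 10.02 GFLOPS/W) to obtain the implied power draw.

```python
def attainable_gflops(ops_per_cycle, freq_mhz, bandwidth_gbs, intensity_flop_per_byte):
    """Classical Roofline bound with a frequency-dependent compute roof.

    ASSUMPTION (not from the paper): peak compute = ops_per_cycle * frequency.
    """
    # Compute roof in GFLOPS: operations per cycle times cycles per second.
    compute_roof = ops_per_cycle * freq_mhz / 1000.0
    # Memory roof in GFLOPS: off-chip bandwidth times operational intensity.
    memory_roof = bandwidth_gbs * intensity_flop_per_byte
    # Attainable performance is limited by the lower of the two roofs.
    return min(compute_roof, memory_roof)


def implied_power_watts(gflops, gflops_per_watt):
    """Power implied by a throughput figure and an energy-efficiency figure."""
    return gflops / gflops_per_watt


if __name__ == "__main__":
    # Hypothetical design point: 200 ops/cycle, 4 GB/s off-chip bandwidth.
    for freq in (100, 150, 200, 250):
        perf = attainable_gflops(200, freq, 4.0, 10.0)
        print(f"{freq} MHz -> {perf:.1f} GFLOPS attainable")

    # Figures reported in the abstract for the ZC706 board at 250 MHz.
    print(f"Implied power: {implied_power_watts(52.11, 10.02):.2f} W")
```

In this sketch, raising the frequency stops helping once the compute roof exceeds the memory roof, which is the kind of frequency and bandwidth trade-off the abstract refers to.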


Similar Items