FPGA implementation of low-power real-time convolutional neural network inference

While artificial intelligence is applied in many areas of life, its computational intensity demands large amounts of computing resources. The data to be processed by these algorithms, however, are not generated in data centres or on desktop workstations. Instead, they originate from mobile devices and sensor networks, which are highly constrained in terms of hardware resources and power. To close this gap, this work presents an implementation of a convolutional neural network intended for deployment on low-power, low-cost FPGA devices. Such devices are typically used in IoT applications that involve the acquisition of large amounts of data, but their logic and memory resources are scarce. The implementation therefore optimizes the execution of the convolution operation for scalability: by adjusting only a few design parameters, deployment is possible on both low-power and high-performance devices. This is made possible by separating data storage from data processing. The implementation further features careful planning of data movement within the device to minimize power consumption and logic utilization, employing three different types of memory for data caching. Data values are stored with 8-bit resolution, which reduces classification accuracy by around 0.5%. The design was tested on an Altera Cyclone V device and achieved a performance of around 420 million operations per second at a clock frequency of 100 MHz. Relative to power, the design runs at around 0.35 GOPS/W, which is lower than previous implementations. In terms of absolute power consumption, however, it is superior, as the complete functionality can be enabled with only around 1 W.
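As a rough cross-check of the figures quoted in the abstract (a back-of-the-envelope sketch only, not taken from the thesis itself), the throughput, clock frequency, and efficiency numbers can be related as follows:

    # Sanity check of the reported performance figures.
    # All numeric values are those stated in the abstract above.
    throughput_gops = 0.42           # ~420 million operations per second
    clock_frequency_hz = 100e6       # 100 MHz system clock
    efficiency_gops_per_watt = 0.35  # reported energy efficiency

    # Average number of operations completed per clock cycle (~4.2)
    ops_per_cycle = throughput_gops * 1e9 / clock_frequency_hz

    # Power draw implied by throughput and efficiency (~1.2 W),
    # broadly in line with the stated total of around 1 W
    implied_power_watts = throughput_gops / efficiency_gops_per_watt

    print(f"Operations per clock cycle: {ops_per_cycle:.1f}")
    print(f"Implied power consumption:  {implied_power_watts:.2f} W")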

Bibliographic Details
Main Author: Gerlinghoff, Daniel
Other Authors: Zheng Yuanjin
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Electrical and electronic engineering::Integrated circuits
Online Access:https://hdl.handle.net/10356/137750
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-137750
School: School of Electrical and Electronic Engineering
Organisation: Silicon Laboratories
Contact: YJZHENG@ntu.edu.sg
Degree: Master of Science (Integrated Circuit Design)
Date Deposited: 2020-04-13
File Format: application/pdf
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)