Design of energy-efficient convolution neural network accelerator

Bibliographic Details
Main Author: Shao, Yuhan
Other Authors: Kim, Tae Hyoung
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2025
Subjects:
Online Access:https://hdl.handle.net/10356/182747
Institution: Nanyang Technological University
Description
Summary: The rapid growth of Internet of Things (IoT) devices has created a need for efficient, low-power computing solutions that can handle tasks like image and speech recognition. Convolutional Neural Networks (CNNs) are central to these intelligent tasks because they perform well across many machine learning applications. However, CNNs demand substantial computing power and energy, which makes them difficult to deploy on resource-constrained IoT devices. This dissertation addresses these issues by proposing a new low-power CNN hardware accelerator for IoT applications, implemented on an FPGA. The motivation for this work comes from the need to run advanced machine learning algorithms directly on edge devices, where power efficiency and speed are crucial. Traditional cloud-based processing suffers from latency, additional power consumption for data transfer, and privacy concerns, so there is a strong need for on-device, real-time processing that maintains high performance and energy efficiency.

The main goal of this dissertation is to design and build a hardware accelerator that significantly reduces the power consumption of CNNs without sacrificing accuracy or speed. This involves optimizing both the CNN architecture and the hardware. Techniques such as weight quantization, pruning, and specialized low-power circuits are explored, and the design exploits the FPGA's flexibility and parallel processing capabilities to create a compact and efficient accelerator. A thorough review of existing CNN accelerators and their limitations sets the foundation for the proposed design.

The dissertation introduces several contributions, including energy-efficient memory systems, parallel processing units, and custom dataflow architectures. Combining these features with FPGA hardware acceleration improves both power efficiency and computational performance. Exploiting FPGA flexibility, the design incorporates dynamic voltage-frequency scaling (DVFS) to lower power consumption to 1.2 W at 200 MHz, achieving energy-efficiency improvements of 3.5× over GPU-based solutions and 1.8× over ASIC alternatives. In addition, time-multiplexed DSP blocks reduce LUT usage by 38% without impacting throughput.

In summary, this dissertation offers a complete solution to the challenges of deploying CNNs on IoT devices. By focusing on power efficiency and performance with FPGA-based hardware acceleration, the proposed CNN accelerator provides a feasible approach for integrating advanced machine learning capabilities into the next generation of IoT devices, and its innovations contribute to low-power hardware design and lay groundwork for future research in energy-efficient computing.
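
As a concrete illustration of two of the compression techniques named in the summary, the Python sketch below shows post-training symmetric int8 weight quantization and magnitude-based pruning. It is a minimal sketch under assumed parameters (per-tensor scaling, 50% target sparsity, a hypothetical 3x3 convolution kernel tensor), not the implementation used in the dissertation, whose accelerator is realized in FPGA hardware.

import numpy as np

def quantize_weights_int8(weights):
    # Per-tensor symmetric quantization: map the largest |w| to 127.
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def prune_weights(weights, sparsity=0.5):
    # Magnitude-based pruning: zero out the smallest weights so that
    # roughly `sparsity` of the entries become zero.
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 3, 3, 3)).astype(np.float32)  # hypothetical conv-layer weights
q, scale = quantize_weights_int8(w)
w_pruned, mask = prune_weights(w, sparsity=0.5)
reconstruction_error = np.abs(w - q.astype(np.float32) * scale).max()
print(f"max int8 reconstruction error: {reconstruction_error:.4f}")
print(f"achieved sparsity: {1.0 - mask.mean():.2f}")

In a hardware realization, the int8 weights and the sparsity mask are what the accelerator's memory system and processing elements would consume; the floating-point arithmetic above only illustrates the transformation itself.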