Energy efficient circuits and architectural design for machine learning on edge

The number of Internet of Things (IoT) devices around the world is forecasted to be 50 billion by the year 2025. IoT devices are commonly referred to as edge devices, as they are able to connect to the Internet and operate at the edge of the network. IoT devices are also equipped with sensors to col...



Bibliographic Details
Main Author:	Chong, Yi Sheng
Other Authors:	Goh Wang Ling
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2023
Subjects:	Engineering::Electrical and electronic engineering::Integrated circuits Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Electrical and electronic engineering::Microelectronics
Online Access:	https://hdl.handle.net/10356/168616
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-168616
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering::Integrated circuits Engineering::Computer science and engineering::Hardware::Register-transfer-level implementation Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Electrical and electronic engineering::Microelectronics
description	The number of Internet of Things (IoT) devices around the world is forecast to reach 50 billion by 2025. IoT devices are commonly referred to as edge devices, as they connect to the Internet and operate at the edge of the network. They are also equipped with sensors that collect data and user input from the working environment. To process the data intelligently, neural network algorithms, a well-known machine learning technique, are employed for their high accuracy. Since neural network algorithms are compute and memory intensive, IoT devices suffer high latency due to their limited on-board computation power. Thus, this thesis explores custom circuits and architecture design to accelerate neural network computation on edge devices. The first focus of this thesis is to enable convolutional neural network (CNN) computation on edge devices for image processing. Recent CNN models adopt a new convolution layer, the depthwise separable convolution, to reduce model size and computation. In this layer, pointwise convolution becomes the major CNN workload, which is not well supported by existing CNN accelerators. Thus, a convolution (CONV) unit is proposed to handle CNN computation with dedicated support for pointwise convolution. To achieve high energy efficiency, the proposed CONV unit employs a weight-stationary dataflow with input data reuse and computation parallelism. Implemented in a 40-nm technology node, the CONV unit attains an energy efficiency of 3.13 TOPS/W, the third best among the state of the art for recent CNNs such as MobileNet, when operating at a nominal voltage of 0.85 V and a frequency of 100 MHz. In addition, this thesis explores speech processing on IoT devices. In particular, keyword spotting (KWS) detects keywords in the sound recorded by a microphone before activating the power-hungry speech recognition system. 
KWS needs to be always on to constantly detect keywords in the user's voice input. Thus, a low-power neural-network-based KWS hardware is proposed, not only to maximize the battery life of IoT devices but also to achieve high KWS accuracy. The proposed KWS engine is composed of a Mel frequency cepstral coefficients (MFCC) module and a long short-term memory (LSTM) accelerator. The MFCC module is optimized for low power through hardware-algorithm co-optimization. The LSTM accelerator, in turn, is designed to run a compact yet accurate KWS LSTM model. The LSTM model is optimized for small model size using a novel enhanced top-k row pruning, compression and quantization, which in turn reduce the on-chip memory and area of the LSTM accelerator. The proposed KWS engine is implemented in a 40-nm technology node. It reports a power consumption of 2.5 µW, which is 2.2 times lower than that of the state-of-the-art LSTM-based KWS, when operating at a voltage of 0.6 V and a frequency of 400 kHz. Furthermore, this thesis explores the emerging compute-in-memory (CIM) paradigm, which overcomes the memory bottleneck of the traditional von Neumann architecture by bringing computation close to the memory, thus increasing energy efficiency. CIM is an attractive candidate for accelerating neural network computation because it is naturally good at performing matrix-vector multiplications, which are the fundamental operation of neural networks. However, when a neural network is mapped to CIM hardware, computation errors arise, leading to an accuracy drop. The errors are due to the non-idealities and the stochastic programming response of the CIM memory cell. Therefore, this thesis proposes a chip-in-the-loop training scheme, which helps the network adapt to the non-idealities and regain accuracy. The proposed scheme considers only two-state resistive random access memory (RRAM) and a binarized neural network (BNN). 
The BNN attains high accuracy even though each network weight is only 1 bit, and its weights can be easily mapped to the RRAM-based CIM for computation. The proposed training scheme successfully adjusts the weights of a four-layer fully-connected network to regain the accuracy. In conclusion, the thesis investigates energy-efficient and low-power neural network hardware that works in the resource-constrained edge environment. The two proposed accelerators, the CONV unit and the KWS engine, have high potential to be integrated into edge devices as co-processors. Both can cater to the edge devices' need for long battery life and real-time response, given their high energy efficiency and low power consumption. In addition, this thesis tackles the accuracy drop caused by the non-idealities in CIM using the proposed network training scheme. Overcoming this challenge not only helps to harvest the high energy efficiency offered by CIM, but also allows CIM to deliver accurate responses when it is deployed for neural-network-based applications.
author2	Goh Wang Ling
author_facet	Goh Wang Ling Chong, Yi Sheng
format	Thesis-Doctor of Philosophy
author	Chong, Yi Sheng
author_sort	Chong, Yi Sheng
title	Energy efficient circuits and architectural design for machine learning on edge
publisher	Nanyang Technological University
publishDate	2023
url	https://hdl.handle.net/10356/168616
_version_	1772826260930560000
spelling	sg-ntu-dr.10356-168616 2023-07-04T01:52:12Z Energy efficient circuits and architectural design for machine learning on edge Chong, Yi Sheng Goh Wang Ling Ong Yew Soon Interdisciplinary Graduate School (IGS) Energy Research Institute @ NTU (ERI@N) EWLGOH@ntu.edu.sg, ASYSOng@ntu.edu.sg Doctor of Philosophy 2023-06-13T08:14:16Z 2023-06-13T08:14:16Z 2022 Thesis-Doctor of Philosophy Chong, Y. S. (2022). Energy efficient circuits and architectural design for machine learning on edge. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/168616 10.32657/10356/168616 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
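
Illustrative note: the abstract states that in depthwise separable convolution the pointwise convolution becomes the major CNN workload. The sketch below is not taken from the thesis; it only counts multiply-accumulate (MAC) operations with the standard textbook formulas to show why that holds. The layer shape (14x14 output, 256 input and output channels, 3x3 kernel, stride 1) and all function names are hypothetical values chosen for illustration.

```python
# Count multiply-accumulate (MAC) operations for one convolution layer.
# h, w are the OUTPUT feature-map height and width (stride 1 assumed).

def standard_conv_macs(h, w, c_in, c_out, k):
    """Standard k x k convolution: every output channel sees every input channel."""
    return h * w * c_in * c_out * k * k

def depthwise_macs(h, w, c_in, k):
    """Depthwise k x k convolution: one filter per input channel."""
    return h * w * c_in * k * k

def pointwise_macs(h, w, c_in, c_out):
    """Pointwise (1 x 1) convolution: mixes channels at each pixel."""
    return h * w * c_in * c_out

# Hypothetical layer shape, chosen only for illustration.
h = w = 14
c_in = c_out = 256
k = 3

std = standard_conv_macs(h, w, c_in, c_out, k)
dw = depthwise_macs(h, w, c_in, k)
pw = pointwise_macs(h, w, c_in, c_out)
separable = dw + pw

pointwise_share = pw / separable   # fraction of separable-layer MACs spent in 1 x 1 conv
reduction = std / separable        # how much cheaper the separable layer is

print(f"standard conv MACs : {std}")
print(f"depthwise MACs     : {dw}")
print(f"pointwise MACs     : {pw}")
print(f"pointwise share    : {pointwise_share:.3f}")
print(f"MAC reduction      : {reduction:.2f}x")
```

For this assumed shape the pointwise share equals c_out / (c_out + k*k) = 256/265, so roughly 97% of the separable layer's MACs fall in the 1x1 convolution, which is consistent with the abstract's motivation for dedicated pointwise support.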

