Edge processing in IoT using approximate and in-memory computing

With a large number of sensors being connected to the internet, the scalability of the Internet of Things (IoT) has started to hinge on edge computing: the ability to partly process raw data at the sensor, on the edge of the network, instead of transmitting all data to the cloud. However, sensor nodes...

Bibliographic Details
Main Author: Bose, Sumon Kumar
Other Authors: Arindam Basu
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access: https://hdl.handle.net/10356/152016
Institution: Nanyang Technological University
id sg-ntu-dr.10356-152016
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering::Integrated circuits
Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
description With a large number of sensors being connected to the internet, the scalability of the Internet of Things (IoT) has started to hinge on edge computing: the ability to partly process raw data at the sensor, on the edge of the network, instead of transmitting all data to the cloud. However, sensor nodes are typically highly power-constrained because of limited battery capacity, and they also require a long lifetime because replacing nodes is difficult in many applications. Hence, this thesis focuses on circuit and algorithmic techniques, in particular approximate computing, near- and in-memory computing (IMC), and dynamic voltage and frequency scaling (DVFS), to reduce the energy consumption of edge devices in the IoT. As a first example, we choose predictive maintenance (PdM), one of the most important IoT applications in Industry 4.0, in which machine learning is used to predict the failure of a machine before the actual event occurs. The main challenges in PdM are (a) the lack of enough data from failing machines to train binary classifiers, and (b) the paucity of power and bandwidth for transmitting sensor data to the cloud throughout the lifetime of the machine. In our work, we propose an anomaly detection scheme that can be trained using only healthy machine data. Our Anomaly Detection based Power Saving (ADEPOS) scheme aims to save energy by using approximate computing throughout the lifetime of the machine. At the beginning of the machine's life, when the probability of the machine being healthy is high, low-accuracy computations are used. As time progresses and anomalies are detected, the anomaly detector is switched to higher-accuracy modes. Computation accuracy may be reduced in many ways, such as reducing the number of neurons, reducing the bit width of data, or applying dynamic voltage and frequency scaling.
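The escalation policy described for ADEPOS can be illustrated with a minimal Python sketch. It is not the thesis implementation: the class name `AccuracyController`, the function `schedule`, the neuron counts, and the threshold are all hypothetical, the anomaly scores are supplied by the caller rather than computed by a trained detector, and only escalation is modeled (the abstract does not say whether the scheme ever steps back down).

```python
from dataclasses import dataclass


@dataclass
class AccuracyController:
    """Escalates an anomaly detector to costlier, more accurate modes."""

    # Hypothetical hidden-neuron counts per mode, cheapest first.
    neuron_levels: tuple = (4, 8, 16, 32)
    # Start in the cheapest mode, while the machine is likely healthy.
    mode: int = 0

    def neurons(self) -> int:
        """Neuron count of the mode currently in use."""
        return self.neuron_levels[self.mode]

    def update(self, score: float, threshold: float) -> None:
        """Move up one mode when the current mode flags an anomaly."""
        if score > threshold and self.mode < len(self.neuron_levels) - 1:
            self.mode += 1


def schedule(scores, threshold=0.8, neuron_levels=(4, 8, 16, 32)):
    """Return the neuron count used for each sample in a stream of scores."""
    ctrl = AccuracyController(neuron_levels=tuple(neuron_levels))
    used = []
    for score in scores:
        used.append(ctrl.neurons())      # compute with the current mode
        ctrl.update(score, threshold)    # then decide the mode for the next sample
    return used
```

Averaging the returned list over a machine's lifetime gives the mean number of active neurons, which is the kind of quantity the neuron-reduction figure in the abstract refers to.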
Tested on the NASA bearing dataset, ADEPOS demonstrates up to an 8.8x reduction in neurons on average over the lifetime of the bearings. This results in an 8.95x energy saving for a microprocessor implementation and a ~18.8x energy saving for an ASIC implementation, both in 65nm CMOS.
The second part of this research explores near- and in-memory computing (IMC) to reduce data movement between storage and processing elements for video processing in traffic surveillance. Generally, image frames from a camera undergo image denoising, region proposal, object classification, and object tracking steps for traffic surveillance and monitoring. However, realizing this data-intensive computing on a traditional von Neumann architecture incurs higher energy dissipation and longer execution time because of the enormous data movement between computing and storage units. Further, for stationary cameras, there is significant temporal redundancy, which can be exploited by event-driven or neuromorphic vision sensors (NVS) that report data only when there is activity in the scene. However, because of noise, NVS pixels report events even in the absence of actual activity. In this dissertation, 6T-SRAM in-memory-computing-based image denoising for event-based binary image (EBBI) frames from an NVS is presented. We propose a nonoverlap median filter (NOMF), an approximation of a traditional median filter for image denoising. The NOMF enables us to implement image denoising by leveraging the inherent read-disturb phenomenon of the 6T-SRAM. In addition, zero frames are easily detected with IMC techniques by tracking the bit-line voltage during the filtering operation, and this can be used to shut off the rest of the processor for ~2x energy benefits in urban traffic settings. Fabricated in 65nm CMOS, this chip produces denoised frames with an energy efficiency of 51.3 TOPS/W and a peak throughput of 134.4 GOPS at 70MHz.
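A software analogue of the nonoverlap median filter and the zero-frame check can be sketched in Python. The abstract does not define the NOMF in detail, so this is one plausible reading and an assumption: the frame is tiled into disjoint k x k blocks, and every pixel in a block is replaced by the block's binary median (the majority value). The function names `nomf` and `is_zero_frame` and the block size `k=3` are hypothetical, and the SRAM read-disturb mechanism itself is not modeled.

```python
def nomf(frame, k=3):
    """Majority-filter a binary frame over disjoint k x k blocks.

    frame is a list of equal-length rows of 0/1 values. Blocks at the
    right and bottom borders may be smaller than k x k.
    """
    h, w = len(frame), len(frame[0])
    out = [[0] * w for _ in range(h)]
    for r0 in range(0, h, k):
        for c0 in range(0, w, k):
            rows = range(r0, min(r0 + k, h))
            cols = range(c0, min(c0 + k, w))
            ones = sum(frame[r][c] for r in rows for c in cols)
            size = len(rows) * len(cols)
            # For binary data the median is 1 when ones are the strict majority.
            median = 1 if 2 * ones > size else 0
            for r in rows:
                for c in cols:
                    out[r][c] = median
    return out


def is_zero_frame(frame):
    """True when no pixel is set, so later processing stages can be skipped."""
    return not any(any(row) for row in frame)
```

Because the blocks do not overlap, each input pixel is read once per frame, whereas a sliding-window median filter reads each pixel once for every window that covers it. A frame containing only isolated noise pixels filters to all zeros, and `is_zero_frame` then signals that the remaining pipeline stages need not run.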
As a next step, we propose a 9T-SRAM near- and in-memory-computing-based region proposal network for EBBI frames to exploit spatial redundancy in the valid frames. The region proposal network finds the bounding box encapsulating an object, which reduces the computation of an object recognition deep neural network (DNN) by confining the computing region to the area surrounding the object instead of the whole image frame. The proposed 9T-SRAM cell enables a 1-D projection of objects onto the horizontal and vertical axes of an image. An iterative and selective search for the rising and falling edges of the 1-D projections yields the coordinates of a bounding box encapsulating an object. Simulated in 65nm CMOS, this chip produces up to 16 region proposals per frame and achieves ~682x energy savings compared to a digitally implemented connected component labeling (CCL) algorithm, with a throughput of 1.17 frames/usec at 200MHz.
In summary, we present a set of algorithms and hardware solutions for energy-efficient edge computing that use approximate and in-memory computing techniques. We have demonstrated the results in two applications: predictive maintenance and traffic monitoring.
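The projection-based region proposal described above can be sketched in software as follows. This is an illustrative Python approximation under stated assumptions, not the 9T-SRAM circuit: the function names `runs` and `propose_regions` are hypothetical, projections are computed with a logical OR, and the search makes a single pass (columns first, then rows within each column run). Only the limit of 16 proposals per frame comes from the abstract.

```python
def runs(profile):
    """Return inclusive (start, end) index pairs of the nonzero runs.

    Each start is a rising edge of the 1-D profile and each end is the
    last index before the matching falling edge.
    """
    found, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            found.append((start, i - 1))
            start = None
    if start is not None:
        found.append((start, len(profile) - 1))
    return found


def propose_regions(frame, max_boxes=16):
    """Return bounding boxes (x0, y0, x1, y1) for a binary frame."""
    h, w = len(frame), len(frame[0])
    # Project the whole frame onto the horizontal axis.
    col_proj = [int(any(frame[r][c] for r in range(h))) for c in range(w)]
    boxes = []
    for x0, x1 in runs(col_proj):
        # Selective step: project only the columns of this run onto the vertical axis.
        row_proj = [int(any(frame[r][c] for c in range(x0, x1 + 1)))
                    for r in range(h)]
        for y0, y1 in runs(row_proj):
            boxes.append((x0, y0, x1, y1))
            if len(boxes) == max_boxes:
                return boxes
    return boxes
```

With this single pass, two objects that share columns but sit in different rows each get a box spanning the full shared column range. A further projection restricted to each box would tighten it, which is presumably what the iterative part of the search addresses.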
author2 Arindam Basu
format Thesis-Doctor of Philosophy
author Bose, Sumon Kumar
title Edge processing in IoT using approximate and in-memory computing
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/152016
_version_ 1772826897200185344
spelling sg-ntu-dr.10356-152016 2023-07-04T17:40:32Z
Edge processing in IoT using approximate and in-memory computing
Bose, Sumon Kumar
Arindam Basu
School of Electrical and Electronic Engineering
Delta-NTU Corporate Laboratory
arindam.basu@ntu.edu.sg
Engineering::Electrical and electronic engineering::Integrated circuits
Engineering::Electrical and electronic engineering::Electronic systems::Signal processing
Doctor of Philosophy
2021-07-15T03:10:22Z
2021
Thesis-Doctor of Philosophy
Bose, S. K. (2021). Edge processing in IoT using approximate and in-memory computing. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/152016
10.32657/10356/152016
en
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
application/pdf
Nanyang Technological University