Low-power circuits for neuromorphic vision sensor based internet of video things

Bibliographic Details
Main Author:	Zhang, Xueyong
Other Authors:	Gwee Bah Hwee
Format:	Thesis-Doctor of Philosophy
Language:	English
Published:	Nanyang Technological University 2021
Subjects:	Engineering::Electrical and electronic engineering::Integrated circuits
Online Access:	https://hdl.handle.net/10356/154407
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-154407
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering::Integrated circuits
description	There has been tremendous growth in the number of sensors under the paradigm of the Internet of Things (IoT), spurred by the advent of 5G communication. Among such sensors, video cameras hold a special role due to their rich information content. However, the huge volume of video data requires a special paradigm, the internet of video things (IoVT), to process the data and handle the scalability of such networks. This paradigm leverages the success of deep learning by using machine learning accelerators at the sensor node to perform “edge” computing, thus reducing the wireless transmission bottleneck. However, von Neumann architectures incur high energy and latency costs in deep learning accelerator hardware because the data storage device is separated from the computing unit. The increasing demands on memory capacity and computational capability make it challenging to handle large volumes of data on resource-limited platforms such as portable products and remote sensory devices. In computer vision (CV) applications for traffic surveillance and monitoring, image frames from a camera undergo several processing steps such as image denoising, region proposal, object classification, and object tracking. Implementing this data-intensive computing on a traditional von Neumann architecture involves large energy dissipation and long execution time due to the enormous data movement between the computing unit and the storage block. Neuromorphic vision sensors (NVSs) hold promise for such applications due to their ability to reduce data at the source by optimal sampling. An NVS records only active event data and ignores the stationary background, significantly reducing redundant data. Recent work has shown that a hybrid frame-event approach, which processes event-based binary images (EBBI) created from NVS events, allows for efficient denoising and region proposal (RP) operations. However, no hardware implementation was reported.
To overcome this system performance bottleneck and implement integrated circuits for image and video processing in traffic surveillance, near-memory computing and in-memory computing have been proposed to reduce or even remove the data movement between the memory device and the processor unit. We first propose a novel Collocated Random Access Memory (CRAM) based analog in-memory computing (IMC) architecture to achieve parallel image denoising and image filling. Image denoising and filling are essential for an NVS because of its inherent random noise (spurious events), caused by thermal noise in transistors, shot noise, junction leakage current of the photodiode, etc. The proposed approach is tested with binary image frames from a Dynamic and Active-pixel Vision Sensor (DAVIS) setup and achieves around 10000X lower energy cost than a conventional non-IMC approach in the same process (65 nm CMOS). The fully parallel natural diffusion architecture reduces the processing time to as little as 20 ns and the average energy consumption to 170 pJ per frame, yielding high throughput and energy efficiency. The second part of this thesis explores low-power region proposal (RP) algorithms and hardware implementations. We propose an edge event driven RP (EEDRP) approach with programmable parameters for event-based binary images to exploit spatial redundancy in the valid frames. The proposed EEDRP network quickly finds the bounding box of each object in the image, reducing the computational complexity of the subsequent deep neural network (DNN) by confining computation to the proposed bounding boxes instead of the whole image frame. The EEDRP algorithm realizes near-memory computing by reading the image memory and processing locally. The whole memory array is scanned once, and RP computation happens only when an edge event (rising edge or falling edge) is detected.
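The single-pass, edge-triggered scan described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's EEDRP implementation: the function names `row_runs` and `propose_regions`, the `merge_dist` parameter, and the specific merge rule (coordinate distance below `merge_dist`, with touching pixels at distance 1) are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def row_runs(row: Sequence[int], merge_dist: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of active pixels in one binary row.

    Work is done only at edge events: a rising edge (0 -> 1) opens a span
    and a falling edge (1 -> 0) closes it. A span that starts less than
    `merge_dist` columns after the previous span's end is merged into it
    (assumed merge rule).
    """
    runs: List[Tuple[int, int]] = []
    prev, start = 0, 0
    # A trailing 0 forces a falling edge for spans touching the right border.
    for x, pixel in enumerate(list(row) + [0]):
        if pixel and not prev:          # rising edge
            start = x
        elif prev and not pixel:        # falling edge
            if runs and start - runs[-1][1] < merge_dist:
                runs[-1] = (runs[-1][0], x - 1)
            else:
                runs.append((start, x - 1))
        prev = pixel
    return runs


def propose_regions(image: Sequence[Sequence[int]], merge_dist: int = 2) -> List[Box]:
    """Scan the image once, top to bottom, and return bounding boxes.

    Each run is merged with every existing box that lies closer than
    `merge_dist` in both x and y; otherwise it starts a new box.
    """
    boxes: List[List[int]] = []
    for y, row in enumerate(image):
        for x0, x1 in row_runs(row, merge_dist):
            merged = [x0, y, x1, y]
            kept: List[List[int]] = []
            for b in boxes:
                dx = max(x0 - b[2], b[0] - x1, 0)  # 0 when the x ranges overlap
                dy = y - b[3]                      # rows below the box's last row
                if dx < merge_dist and dy < merge_dist:
                    merged = [min(merged[0], b[0]), min(merged[1], b[1]),
                              max(merged[2], b[2]), max(merged[3], b[3])]
                else:
                    kept.append(b)
            kept.append(merged)
            boxes = kept
    return [(b[0], b[1], b[2], b[3]) for b in boxes]


if __name__ == "__main__":
    frame = [
        [1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 1, 0, 1],
        [0, 0, 0, 0, 0, 1, 1, 1],
    ]
    print(propose_regions(frame, merge_dist=3))  # [(0, 0, 1, 1), (5, 1, 7, 2)]
```

In this sketch a larger `merge_dist` bridges wider holes and fragments at the cost of fusing nearby objects. Because each run is compared against boxes using only the run's own extent, boxes that become neighbors after growing are not re-merged; a hardware design would need its own policy for that case.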
EEDRP can handle images with noise and holes, and it also tolerates fragmented objects because it merges objects whose distance is less than a configured parameter. All parameters are programmable for different application scenarios. The EEDRP algorithm achieves higher accuracy, higher energy efficiency, and lower latency in region proposal than the traditional connected component labeling (CCL) algorithm. Simulated in 65 nm CMOS, this chip produces up to 15 region proposals per frame and achieves ⁓580X energy savings compared to the digitally implemented CCL algorithm, with a throughput of 2.6 frames/ms at 200 MHz. We also propose axes projection based RP (APBRP) to further reduce energy and time cost. It is ⁓1767X faster than the CCLRP implementation thanks to the parallel in-memory computing technique. According to measurement results, the in-memory computing based APBRP is ⁓2700X more energy efficient than the near-memory based EEDRP. The weighted F1 scores of EEDRP and APBRP are 2.55X and 1.7X better than those of the conventional HISTRP and CCLRP, respectively. The image regions must then be classified by neural networks; neuromorphic implementations that use analog or physical computing are promising and are known to be energy efficient compared with digital baselines. Neuro-inspired spiking neural networks (SNN) have also gained popularity due to the promise of sparse activation leading to lower energy dissipation. In recent years, time-based computational circuits for DNN/SNN have gained popularity due to the reduced supply voltage in scaled CMOS. An important building block in these designs is a digital delay cell. For example, it can be used to create an oscillator that converts analogue current to digital output (rate-based neuron), or it can serve as an integrate-and-fire neuron with bio-plausible refractory period and spike frequency adaptation features.
This thesis also explores an energy- and area-efficient fully differential CMOS current controlled ring oscillator (CCO) as a suitable and compact structure for neural network applications. The neuronal oscillator achieves higher frequency while consuming less area and energy because it uses fewer transistors than the conventional structure. By eliminating unnecessary transistors, the proposed structure consists of a simple dynamic positive feedback latch and differential pairs, saving 25% in area. The CCO can be tuned by both the input voltage and an external variable resistor. Measurement results show that our work achieves an 11% frequency improvement and a 13% energy efficiency improvement without degrading the jitter and phase noise characteristics. In summary, we present a set of algorithms and hardware solutions for energy-efficient neuromorphic circuits that use near/in-memory computing techniques and a time-based computing approach. We demonstrate the testing results and performance in the application of traffic monitoring.
author2	Gwee Bah Hwee
author_facet	Gwee Bah Hwee Zhang, Xueyong
format	Thesis-Doctor of Philosophy
author	Zhang, Xueyong
author_sort	Zhang, Xueyong
title	Low-power circuits for neuromorphic vision sensor based internet of video things
publisher	Nanyang Technological University
publishDate	2021
url	https://hdl.handle.net/10356/154407
_version_	1772826571441176576
spelling	sg-ntu-dr.10356-154407 2023-07-04T17:41:45Z Low-power circuits for neuromorphic vision sensor based internet of video things Zhang, Xueyong Gwee Bah Hwee School of Electrical and Electronic Engineering ebhgwee@ntu.edu.sg Engineering::Electrical and electronic engineering::Integrated circuits Doctor of Philosophy 2021-12-27T12:19:03Z 2021-12-27T12:19:03Z 2021 Thesis-Doctor of Philosophy Zhang, X. (2021). Low-power circuits for neuromorphic vision sensor based internet of video things. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/154407 10.32657/10356/154407 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University

Similar Items