Low-power circuits for neuromorphic vision sensor based internet of video things


Bibliographic Details
Main Author: Zhang, Xueyong
Other Authors: Gwee, Bah Hwee
Format: Thesis-Doctor of Philosophy
Language:English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/154407
Institution: Nanyang Technological University
Summary: There has been tremendous growth in the number of sensors under the Internet of Things (IoT) paradigm, spurred by the advent of 5G communication. Among such sensors, video cameras hold a special role due to their rich information content. However, the huge volume of video data calls for a dedicated paradigm, the Internet of Video Things (IoVT), to process it and keep such networks scalable. IoVT leverages the success of deep learning by placing machine-learning accelerators at the sensor node to perform "edge" computing, thereby relieving the wireless-transmission bottleneck. However, von Neumann architectures incur prohibitive energy and latency costs for deep-learning accelerator hardware because data storage and computation are physically separated. The growing demands on memory capacity and computational capability make it challenging to handle such data volumes on resource-limited platforms such as portable products and remote sensory devices.

In computer vision (CV) applications for traffic surveillance and monitoring, image frames from a camera undergo several processing steps such as image denoising, region proposal, object classification, and object tracking. Implementing this data-intensive pipeline on a traditional von Neumann architecture entails large energy dissipation and long execution times because of the enormous data movement between the computing unit and the storage block. Neuromorphic vision sensors (NVSs) hold promise for such applications because they reduce data at the source through optimal sampling: an NVS records only active event data and ignores the stationary background, cutting redundant data significantly. Recent work has shown that a hybrid frame-event approach, which processes event-based binary images (EBBI) created from NVS events, allows efficient denoising and region proposal (RP); however, no hardware implementation has been reported. To overcome this system-performance bottleneck and realize integrated circuits for image and video processing in traffic surveillance, near-memory computing and in-memory computing are proposed to reduce, or even remove, the data movement between the memory device and the processing unit.

We first propose a novel Collocated Random Access Memory (CRAM) based analog in-memory computing (IMC) architecture that performs image denoising and image filling in parallel. Denoising and filling are essential for an NVS because of its inherent random noise (spurious events) arising from transistor thermal noise, shot noise, and photodiode junction leakage, among other sources. The proposed approach is tested on binary image frames from a Dynamic and Active-pixel Vision Sensor (DAVIS) setup and achieves roughly 10,000X lower energy cost than a conventional non-IMC approach in the same 65 nm CMOS process. The fully parallel natural-diffusion architecture reduces the processing time to as little as 20 ns and the average energy to 170 pJ per frame, yielding high throughput and energy efficiency.
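A minimal software sketch of the kind of neighborhood-based denoise-and-fill operation that the analog CRAM array performs in parallel is shown below. The function name and the neighbor-count thresholds are illustrative assumptions for exposition, not the thesis's circuit parameters.

```python
import numpy as np

def denoise_and_fill(ebbi: np.ndarray, noise_thr: int = 1, fill_thr: int = 6) -> np.ndarray:
    """Neighborhood-based denoise/fill of an event-based binary image (EBBI).

    Software analogue of the parallel natural diffusion in the analog CRAM
    array: isolated active pixels (spurious events) are cleared, and holes
    surrounded by active pixels are filled. Thresholds are illustrative
    assumptions, not the circuit's actual parameters.
    """
    padded = np.pad(ebbi.astype(np.uint8), 1)
    h, w = ebbi.shape
    # Count active pixels in each 3x3 neighborhood, excluding the center.
    neigh = sum(
        padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
        for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
    )
    out = ebbi.copy()
    out[(ebbi == 1) & (neigh <= noise_thr)] = 0   # drop isolated noise events
    out[(ebbi == 0) & (neigh >= fill_thr)] = 1    # fill holes inside objects
    return out
```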
The second part of this thesis explores low-power region proposal (RP) algorithms and their hardware implementations. We propose an edge-event-driven RP (EEDRP) approach with programmable parameters that exploits the spatial redundancy in valid event-based binary frames. The EEDRP network quickly finds a bounding box for each object in the image, reducing the computational complexity of the subsequent deep neural network (DNN) by confining its computation to the proposed bounding boxes instead of the whole frame. The EEDRP algorithm realizes near-memory computing by reading the image memory and processing the data locally: the whole memory array is scanned once, and RP computation occurs only when an edge event (a rising or falling edge) is detected. EEDRP copes with noisy images and holes and tolerates fragmented objects, since it merges objects whose separation is below a configurable threshold; all parameters are programmable for different application scenarios. EEDRP achieves higher accuracy, higher energy efficiency, and lower latency than the traditional connected component labeling (CCL) algorithm. Simulated in 65 nm CMOS, the chip produces up to 15 region proposals per frame, achieves ~580X energy savings compared with a digitally implemented CCL algorithm, and sustains a throughput of 2.6 frames/ms at 200 MHz. We also propose an axes-projection-based RP (APBRP) to further reduce energy and time costs: thanks to its parallel in-memory computing technique, it is ~1767X faster than the CCL-based RP (CCLRP) implementation, and measurement results show that the in-memory APBRP is ~2700X more energy efficient than the near-memory EEDRP. The weighted F1 scores of EEDRP and APBRP are 2.55X and 1.7X better than those of the conventional HISTRP and CCLRP, respectively. (Illustrative software sketches of the EEDRP and APBRP scans follow this summary.)

The image content in these regions must next be classified by neural networks, and neuromorphic implementations that use analog or physical computing are promising here, as they are known to be more energy efficient than digital baselines. Neuro-inspired spiking neural networks (SNNs) have also gained popularity because their sparse activation promises lower energy dissipation. In recent years, time-based computational circuits for DNNs/SNNs have gained traction due to the reduced supply voltages of scaled CMOS. An important building block in these designs is the digital delay cell: it can, for example, form an oscillator that converts an analog current into a digital output (a rate-based neuron), or serve as an integrate-and-fire neuron with bio-plausible refractory period and spike-frequency-adaptation features (a behavioral sketch of such a neuron also appears after this summary). This thesis therefore also explores an energy- and area-efficient, fully differential CMOS current-controlled ring oscillator (CCO) as a compact structure for neural-network applications. The neuronal oscillator achieves a higher frequency while occupying less area and consuming less energy, because it uses fewer transistors than the conventional structure: by eliminating unnecessary transistors, the proposed design consists of a simple dynamic positive-feedback latch and differential pairs, saving 25% in area. The CCO can be tuned by both the input voltage and an external variable resistor. Measurement results show an 11% frequency improvement and 13% better energy efficiency without degrading the jitter and phase-noise characteristics.

In summary, we present a set of algorithms and hardware solutions for energy-efficient neuromorphic circuits that employ near-/in-memory computing techniques and time-based computing, and we demonstrate their measured performance in a traffic-monitoring application.
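Below is a minimal Python sketch of the single-pass, edge-event-driven scan described above: computation happens only at rising and falling edges of each row, and nearby runs are merged into one bounding box. The merge rule, parameter names, and defaults are illustrative assumptions rather than the chip's exact logic.

```python
def eedrp(ebbi, merge_dist=2, max_boxes=15):
    """Single-pass, edge-event-driven region proposal (illustrative sketch).

    Scans the binary frame row by row; work is done only at rising (0->1)
    and falling (1->0) edges, mimicking the near-memory EEDRP readout.
    A run of active pixels is merged into an existing box when the gap to
    it is at most merge_dist, which tolerates holes and fragmented objects.
    """
    boxes = []  # each box: [x_min, y_min, x_max, y_max]
    for y, row in enumerate(ebbi):
        runs, start, prev = [], None, 0
        for x, v in enumerate(row):
            if v and not prev:          # rising edge: run begins
                start = x
            elif prev and not v:        # falling edge: run ends
                runs.append((start, x - 1))
            prev = v
        if prev:                        # run reaching the row's right border
            runs.append((start, len(row) - 1))
        for x0, x1 in runs:
            for b in boxes:             # merge into a nearby existing box
                if (y - b[3] <= merge_dist + 1
                        and x0 <= b[2] + merge_dist
                        and x1 >= b[0] - merge_dist):
                    b[0], b[2], b[3] = min(b[0], x0), max(b[2], x1), y
                    break
            else:                       # otherwise open a new box
                if len(boxes) < max_boxes:
                    boxes.append([x0, y, x1, y])
    return [tuple(b) for b in boxes]
```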
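A corresponding sketch of the axes-projection idea behind APBRP, assuming the projections are simply the row and column sums of the binary frame; in the in-memory implementation these projections would be computed in parallel directly on the memory array. The helper names are assumptions for illustration.

```python
import numpy as np

def apbrp(ebbi: np.ndarray):
    """Axes-projection-based region proposal (illustrative sketch).

    Projects the binary frame onto both axes and proposes one box for every
    pair of active row and column intervals, keeping only boxes that
    actually contain active pixels.
    """
    def intervals(profile):
        """Return (start, end) index pairs of contiguous non-zero spans."""
        active = (profile > 0).astype(np.int8)
        edges = np.flatnonzero(np.diff(np.concatenate(([0], active, [0]))))
        return list(zip(edges[::2], edges[1::2] - 1))

    rows = intervals(ebbi.sum(axis=1))   # projection onto the y axis
    cols = intervals(ebbi.sum(axis=0))   # projection onto the x axis
    return [(x0, y0, x1, y1)
            for (y0, y1) in rows for (x0, x1) in cols
            if ebbi[y0:y1 + 1, x0:x1 + 1].any()]
```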
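Finally, a behavioral sketch of the integrate-and-fire operation that the delay-cell/CCO structure can implement, including the refractory period mentioned above. All constants are illustrative assumptions, not measured chip values.

```python
def integrate_and_fire(currents, dt=1e-9, c_mem=1e-12, v_th=0.5, t_ref=5e-9):
    """Behavioral model of a time-based integrate-and-fire neuron.

    The membrane voltage integrates the input current samples; crossing
    the threshold emits a spike, resets the membrane, and starts a
    refractory period during which input is ignored.
    """
    v, refractory, spikes = 0.0, 0.0, []
    for step, i_in in enumerate(currents):
        if refractory > 0.0:
            refractory -= dt               # still recovering; ignore input
            continue
        v += (i_in / c_mem) * dt           # integrate current onto the membrane
        if v >= v_th:                      # threshold crossing -> spike event
            spikes.append(step * dt)
            v = 0.0                        # reset membrane voltage
            refractory = t_ref             # enter refractory period
    return spikes
```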