Hardware acceleration of neural networks with CMOS and post-CMOS devices


Bibliographic Details
Main Author: Govind Narasimman
Other Authors: Arindam Basu
Format: Theses and Dissertations
Language: English
Published: 2017
Subjects:
Online Access:http://hdl.handle.net/10356/72479
Institution: Nanyang Technological University
Description
Summary: There is a pressing need for embedded machine learning in portable devices and smart sensors to power the next generation of the Internet of Things (IoT). Implementing neural networks involves a large number of arithmetic and memory operations. Realizing the arithmetic blocks with conventional digital circuits entails a trade-off between calculation accuracy and required area. On the other hand, existing analog building blocks for neural networks suffer from process-variation-related inaccuracies and large power consumption. Moreover, the efficiency of the computation-memory interface is degrading, since memory bandwidth grows slowly compared to computation throughput in CMOS technology. Although recent large-scale neuromorphic circuits have used localized random-access memory to reduce memory operations, such local memory will not scale with growing dataset sizes. Here, we explore novel CMOS and post-CMOS circuits that realize ultra-low-power neuromorphic circuits through co-design of algorithm and hardware. The solutions also overcome the widening bandwidth gap between memory operations and computation. First, a deep neural network with 2 convolution layers and 2 fully connected layers is chosen and tuned for hardware implementation. The network has few tunable parameters (~40,000) and trains 40 times faster than usual 4-layer deep neural networks. We propose a compact, single-transistor element, called a 'synapse', for realizing the connections inside the neural network. These synapses perform the required computations by virtue of the mismatch inherent in their fabrication. We use a current-mirror array with n input lines and m output lines to perform an n × m multiply-and-accumulate (MAC) operation. The resultant neuromorphic circuit can emulate a multi-layered artificial vision system.
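The n × m MAC performed by the current-mirror array can be captured in a minimal behavioral sketch. The Gaussian mismatch model, its spread `sigma_w`, and all variable names below are illustrative assumptions, not values or models from the thesis:

```python
import numpy as np

# Behavioral sketch of an n-input, m-output current-mirror MAC array.
# Each single-transistor 'synapse' contributes a weight set by its random
# fabrication mismatch; output-line currents sum the weighted inputs.
# The Gaussian mismatch model and sigma_w = 0.1 are assumptions for
# illustration only.
rng = np.random.default_rng(0)

n_inputs, m_outputs = 4, 3
sigma_w = 0.1  # assumed relative mismatch spread

# Mirror gains deviate from unity due to device-to-device mismatch.
W = 1.0 + sigma_w * rng.standard_normal((n_inputs, m_outputs))

x = np.array([1.0, 0.5, 0.2, 0.8])  # input-line currents (arbitrary units)

# One n x m multiply-and-accumulate pass: y_j = sum_i x_i * W_ij,
# realized physically by current summation on each output line.
y = x @ W
```

In hardware the summation is free (Kirchhoff's current law on each output line); only the mirrors themselves cost area, which is why a single-transistor synapse is attractive.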
A circuit fabricated in a 0.35 μm CMOS process is characterized, and a behavioral model is simulated for the deep neural network. Here, the learning is done offline. The inputs to the network may vary with environmental conditions, for which an adaptive neural network is needed. Hence, we propose a second solution in which the neuromorphic circuit can adapt its parameters in real time. With the advent of novel nanoscale devices whose physical properties are well matched to neural networks, enabling computation at energies much lower than CMOS, the research also focuses on the use of a post-CMOS spintronic device, a domain-wall magnet, to obtain a low-power spike-timing-dependent plasticity (STDP) synapse for online learning. The spin-mode signals are injected across a small potential (~50 mV) through multiple ferromagnetic and non-magnetic layers. Here we discuss the implementation of a spiking neural network with synapses that can be trained according to the STDP learning rule. A detailed study with the help of device-circuit co-simulation is performed. Possible use of this synapse in online, real-time learning spiking neural networks is also illustrated in this thesis.
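For reference, the pair-based STDP rule the synapse implements can be sketched as below. The exponential form and the parameters `a_plus`, `a_minus`, and `tau` are standard textbook choices, not device parameters from the thesis:

```python
import numpy as np

# Sketch of the pair-based STDP learning rule: the weight change depends on
# the timing difference between pre- and post-synaptic spikes. The exponential
# window and the constants below are conventional illustrative values, not
# those of the domain-wall-magnet synapse.
def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for delta_t = t_post - t_pre (milliseconds)."""
    if delta_t > 0:   # pre fires before post: potentiation
        return a_plus * np.exp(-delta_t / tau)
    else:             # post fires before pre: depression
        return -a_minus * np.exp(delta_t / tau)

# Example: apply the rule for two spike pairs to a synaptic weight.
w = 0.5
for dt in (5.0, -5.0):
    w += stdp_dw(dt)
```

Causal pairs (`delta_t > 0`) strengthen the synapse and anti-causal pairs weaken it, which is what lets the network adapt its parameters online without a separate training phase.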