Racetrack memory based logic design for in-memory computing


Bibliographic Details
Main Author: Luo, Tao
Other Authors: Douglas Leslie Maskell
Format: Theses and Dissertations
Language:English
Published: 2018
Subjects:
Online Access:http://hdl.handle.net/10356/73359
Institution: Nanyang Technological University
Description
Summary:In-memory computing has been demonstrated to be an efficient computing infrastructure in the big data era for many applications, such as graph processing and encryption. The area and power overhead of CMOS-based memory design is growing rapidly because of increasing data capacity and leakage power as technology nodes shrink. Thus, an emerging memory technology, racetrack memory, has been proposed to increase the data capacity and power efficiency of modern memory systems. Because the design requirements of conventional logic differ from those of emerging-memory-based logic for in-memory computing, well-developed conventional CMOS logic designs are less relevant to emerging-memory-based in-memory computing. Therefore, novel logic designs for racetrack memory are required. Traditional logic design with separate chips focuses on high speed, at the cost of large area and power consumption. Implementing efficient logic for in-memory computing is challenging because of its demanding area and power requirements. First, as the computing logic for in-memory computing is built into the memory, the available area budget is limited; otherwise the data density of the memory system would suffer. Second, because of the thermal constraints of the memory chip, the available energy budget for computing logic is limited: large energy consumption may cause malfunction or even permanent damage to the memory chip through high temperature. Finally, the adoption of emerging memory technologies makes logic design more challenging because of their unique characteristics, such as the sequential access mechanism of racetrack memory. This thesis addresses these challenges in racetrack memory based in-memory logic design as follows.
First, for general computing operations, we propose racetrack memory based half and full adders. The proposed magnetic full adder is implemented with pre-charged sense amplifiers (PCSA) and magnetic tunnel junctions (MTJ). By reusing parts of the logic design, the magnetic full adder significantly improves area and energy efficiency compared with a CMOS-based full adder and the state-of-the-art magnetic full adder. Second, based on the proposed magnetic full adder, we propose a pipelined Booth multiplier that exploits the inherent sequential access mechanism of racetrack memory, achieving high area and energy efficiency. To increase the throughput of the proposed Booth multiplier, we further parallelize the generation and addition of its partial products. Unlike the area- and energy-consuming adder-array architecture of conventional CMOS-based designs, the proposed multiplier uses a weight-based parallel architecture. To ensure high energy efficiency, we propose an optimization that transforms energy-demanding write operations into shift operations. With this optimization, the weight-based parallel multiplier achieves high throughput while maintaining high area and energy efficiency. Third, for specific applications, we propose an efficient racetrack memory based design to accelerate modular multiplication. Modular multiplication is widely used in various fields such as cryptography, number theory, group theory, ring theory, knot theory, abstract algebra, computer algebra, computer science, chemistry, and the visual and musical arts. To implement modular multiplication efficiently, a novel two-stage scalable modular multiplication algorithm is proposed that significantly reduces the delay. An efficient architecture based on racetrack memory is further developed to reduce the number of required adders.
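The Booth recoding that underlies such a multiplier can be illustrated with a short software model. The sketch below implements standard radix-2 Booth multiplication (scanning the multiplier for runs of ones and adding or subtracting shifted copies of the multiplicand); it is an illustrative model of the classical algorithm, not the thesis's racetrack-memory hardware design, and the function name and bit width are chosen here for illustration only:

```python
def booth_multiply(m, r, bits=8):
    """Multiply two signed integers via radix-2 Booth recoding.

    Scans the multiplier r two bits at a time (current bit and the
    bit below it). At the start of a run of 1s the shifted
    multiplicand is subtracted; at the end of a run it is added.
    This mirrors how a hardware Booth multiplier generates and
    accumulates partial products.
    """
    r_u = r & ((1 << bits) - 1)  # two's-complement view of the multiplier
    product, prev = 0, 0          # prev is the implicit bit below bit 0
    for i in range(bits):
        cur = (r_u >> i) & 1
        if cur == 0 and prev == 1:    # end of a run of 1s: add m * 2^i
            product += m << i
        elif cur == 1 and prev == 0:  # start of a run of 1s: subtract m * 2^i
            product -= m << i
        prev = cur
    return product

print(booth_multiply(7, 3))    # -> 21
print(booth_multiply(7, -3))   # -> -21
```

Note how a run of consecutive ones in the multiplier costs only one subtraction and one addition regardless of its length; in a racetrack-memory setting, the intermediate shifts map naturally onto the memory's sequential shift operations, which is the kind of property the thesis's design exploits.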
The racetrack memory based application-specific design for modular multiplication shows significant improvements in area, energy, and performance compared with the state-of-the-art CMOS-based implementation. Overall, this thesis contributes solutions to the challenges of racetrack memory based in-memory logic design and demonstrates significant improvements in area overhead and energy consumption compared with state-of-the-art CMOS-based logic designs.