ASIC implementation of a high speed and low power scalar product computation unit
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2009 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/16733 |
Institution: | Nanyang Technological University |
Summary: | This project involves the design, synthesis, and placement and routing of an improved 16-bit, 15-element unsigned inner product architecture. Improvements to the design were made in the carry-free addition stage, also known as the column compression or reduction stage, where counters are incorporated to perform preliminary partial product bit accumulation before summation using adders. This report discusses the entire application-specific integrated circuit (ASIC) implementation process, from RTL coding and functional simulation of the proposed architecture, through synthesis and timing verification, to placement and routing of the synthesized design.
The proposed inner product architecture can reduce the resultant height of the partial product tree by a factor of up to 4 compared with an inner product unit built using the conventional merged arithmetic approach. This drastic decrease in resultant height leads to a significant reduction in the total number of adders, and hence in total area. The design is estimated to save approximately 45.5% in area compared with the latest inner product architecture. The design was functionally verified using several different input test patterns.
The proposed design was then synthesized using STM90nm technology. The synthesized design has a latency of two clock cycles with a minimum clock period of 5.25 ns, giving a total delay of 10.5 ns. Because the design is pipelined, it delivers one result per clock cycle (every 5.25 ns). The design was then placed and routed. Preliminary timing analysis showed that the placed and routed design met the timing constraint and the design constraints, except for the maximum fanout requirement. The die size of the routed design is 1.56 mm², which includes the area of the IO pads and the special hard macro required for IO pad stability.
The project shows that the proposed inner product architecture achieves a remarkable area reduction and commendable speed performance. Completing the placement and routing process with detailed timing calculations and power analysis should ensure the reliability of the 16-bit, 15-element counter-based unsigned inner product processor chip. |
---|
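
The counter-based column compression described in the summary can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the architecture from the report: it assumes one population counter per column sized to that column's height, and it models the final adder with ordinary integer arithmetic. All function names (`partial_product_columns`, `compress_once`, `inner_product_by_counters`) are hypothetical.

```python
from typing import List, Tuple


def partial_product_columns(a: List[int], b: List[int], width: int = 16) -> List[List[int]]:
    """Merged arithmetic: put every AND bit a_i[j] & b_i[k] in column j + k."""
    cols: List[List[int]] = [[] for _ in range(2 * width - 1)]
    for x, y in zip(a, b):
        for j in range(width):
            for k in range(width):
                cols[j + k].append((x >> j) & (y >> k) & 1)
    return cols


def compress_once(cols: List[List[int]]) -> List[List[int]]:
    """One reduction stage: each column taller than 2 is replaced by the
    binary count of its ones, spread over the columns of matching weight."""
    out: List[List[int]] = [[] for _ in range(len(cols))]
    for c, bits in enumerate(cols):
        if len(bits) <= 2:
            out[c].extend(bits)  # already short enough for the final adder
            continue
        count = sum(bits)                # what a hardware counter would output
        nbits = len(bits).bit_length()   # output width for a count up to len(bits)
        for s in range(nbits):
            while len(out) <= c + s:     # grow the tree for high-order outputs
                out.append([])
            out[c + s].append((count >> s) & 1)
    return out


def column_value(cols: List[List[int]]) -> int:
    """Weighted sum of all bits; stands in for the final carry-propagate adder."""
    return sum(bit << c for c, bits in enumerate(cols) for bit in bits)


def inner_product_by_counters(a: List[int], b: List[int], width: int = 16) -> Tuple[int, List[int]]:
    """Return the inner product and the maximum column height after each stage."""
    cols = partial_product_columns(a, b, width)
    heights = [max(len(col) for col in cols)]
    while heights[-1] > 2:
        cols = compress_once(cols)
        heights.append(max(len(col) for col in cols))
    return column_value(cols), heights


if __name__ == "__main__":
    a = list(range(1, 16))                  # 15 illustrative 16-bit operands
    b = [4000 * i + 7 for i in range(15)]
    value, heights = inner_product_by_counters(a, b)
    print(value == sum(x * y for x, y in zip(a, b)), heights)
```

Each stage preserves the weighted sum of the bit array, so the result always equals the direct sum of products. The `heights` list shows how quickly the tree shrinks in this simplified model; it is not a reproduction of the figures reported above.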