An area and energy efficient inner-product processor for serial-link bus architecture

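A unique word-serial inner-product processor architecture is proposed to capitalize on the high-speed serial-link bus. To eliminate the input buffers and deserializers, partial products are generated immediately from the serial input data and accumulated by an array of small binary counters operating in parallel to form a reduced partial product matrix directly. The height of the resultant partial product matrix is reduced logarithmically, and hence the carry-save-adder tree needed to complete the inner-product computation is smaller and faster. The small binary counters act as active on-chip buffers to mitigate the workload of the partial product accumulator. Their ability to accumulate partial product bits faster than a combinatorial full adder leads to a simple two-stage architecture with high throughput and low latency. The architecture consumes 46% less silicon area, 24% less energy per inner-product computation and 70% less total interconnect length than its merged arithmetic counterpart in a 65 nm CMOS process. In addition, the architecture requires only 4 of the 7 available metal layers for signal and power routing. By emulating the on-chip serial-link bus architecture on both designs, it is demonstrated that the proposed design is most suited for high-speed on-chip serial-link bus architecture.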
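
The counter-based accumulation described in the abstract can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit: it assumes unsigned operands, uses one counter per column weight (the record does not say how the counter array is organized), and the names `inner_product_with_counters` and `column_heights` are hypothetical. It shows how counting partial product bits per column leaves a matrix whose height grows with the logarithm of the number of accumulated bits rather than with the number of bits itself.

```python
def inner_product_with_counters(a_words, b_words, width):
    """Behavioural model: accumulate partial product bits in per-column counters.

    Assumption: unsigned `width`-bit operands and one counter per column
    weight. The actual counter array organization is not given in the record.
    """
    if len(a_words) != len(b_words):
        raise ValueError("operand vectors must have the same length")
    limit = 1 << width
    counters = [0] * (2 * width - 1)  # column weights 0 .. 2*width-2
    for a, b in zip(a_words, b_words):
        if not (0 <= a < limit and 0 <= b < limit):
            raise ValueError("operand does not fit in the given width")
        for j in range(width):
            if not (a >> j) & 1:
                continue  # a zero bit of `a` contributes no partial product bits
            for k in range(width):
                # Partial product bit a[j] & b[k] has weight 2**(j + k).
                counters[j + k] += (b >> k) & 1
    # Final stage: add the counter values, each shifted by its column weight.
    result = sum(count << weight for weight, count in enumerate(counters))
    return result, counters


def column_heights(num_pairs, width):
    """Worst-case height of the tallest column, without and with counters."""
    # The middle column receives `width` partial product bits per operand pair.
    unreduced = num_pairs * width
    # A binary counter stores that many bits in bit_length() output bits.
    reduced = unreduced.bit_length()
    return unreduced, reduced


if __name__ == "__main__":
    value, counts = inner_product_with_counters([3, 5, 7], [2, 4, 6], width=3)
    print(value, counts)
    print(column_heights(num_pairs=3, width=3))
```

In this model the final sum over the counters stands in for the carry-save-adder tree; the fewer rows the counters leave behind, the smaller that tree needs to be.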

Bibliographic Details
Main Authors:	Meher, Manas Ranjan; Jong, Ching Chuen; Chang, Chip Hong
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2013
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/95905 http://hdl.handle.net/10220/11317
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-95905
record_format	dspace
type	Journal Article
citation	Meher, M. R., Jong, C. C., & Chang, C. H. (2012). An area and energy efficient inner-product processor for serial-link bus architecture. IEEE Transactions on Circuits and Systems I: Regular Papers, 59(12), 2945-2955.
issn	1549-8328
doi	10.1109/TCSI.2012.2220471
rights	© 2012 IEEE.
institution	Nanyang Technological University
building	NTU Library
country	Singapore
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
description	A unique word-serial inner-product processor architecture is proposed to capitalize on the high-speed serial-link bus. To eliminate the input buffers and deserializers, partial products are generated immediately from the serial input data and accumulated by an array of small binary counters operating in parallel to form a reduced partial product matrix directly. The height of the resultant partial product matrix is reduced logarithmically, and hence the carry-save-adder tree needed to complete the inner-product computation is smaller and faster. The small binary counters act as active on-chip buffers to mitigate the workload of the partial product accumulator. Their ability to accumulate partial product bits faster than a combinatorial full adder leads to a simple two-stage architecture with high throughput and low latency. The architecture consumes 46% less silicon area, 24% less energy per inner-product computation and 70% less total interconnect length than its merged arithmetic counterpart in a 65 nm CMOS process. In addition, the architecture requires only 4 of the 7 available metal layers for signal and power routing. By emulating the on-chip serial-link bus architecture on both designs, it is demonstrated that the proposed design is most suited for high-speed on-chip serial-link bus architecture.
author2	School of Electrical and Electronic Engineering
author_facet	School of Electrical and Electronic Engineering; Meher, Manas Ranjan; Jong, Ching Chuen; Chang, Chip Hong
format	Article
author	Meher, Manas Ranjan; Jong, Ching Chuen; Chang, Chip Hong
author_sort	Meher, Manas Ranjan
title	An area and energy efficient inner-product processor for serial-link bus architecture
title_sort	area and energy efficient inner-product processor for serial-link bus architecture
publishDate	2013
url	https://hdl.handle.net/10356/95905 http://hdl.handle.net/10220/11317
_version_	1681044670869667840

Similar Items