A memory-efficient scalable architecture for lifting-based discrete wavelet transform

In this brief, we propose a new parallel lifting-based 2-D DWT architecture with high memory efficiency and short critical path. The memory efficiency is achieved with a novel scanning method that enables tradeoff of external memory bandwidth and on-chip memory. Based on the data flow graph of the f...

Full description

Saved in:
Bibliographic Details
Main Authors: Hu, Yusong, Jong, Ching Chuen
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language:English
Published: 2013
Subjects:
Online Access:https://hdl.handle.net/10356/102401
http://hdl.handle.net/10220/16815
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:In this brief, we propose a new parallel lifting-based 2-D DWT architecture with high memory efficiency and short critical path. The memory efficiency is achieved with a novel scanning method that enables tradeoff of external memory bandwidth and on-chip memory. Based on the data flow graph of the flipped lifting algorithm, processing units (PUs) are developed for maximally utilizing the inherent parallelism. With S number of PUs, the throughput can be scaled while keeping the latency constant. Compared with the best existing architecture, the proposed architecture requires less memory. For an N × N image, the proposed architecture consumes a total of only 3N + 24S words of transposition memory, temporal memory, and pipeline registers. The synthesized results in a 90-nm CMOS process show that it achieves better area-delay products than the best existing design by 32.3%, 31.5%, and 27.0% when S = 2, 4, and 8, respectively, and by 26%, 26%, and 22% when the overhead for buffering the required overlapped pixels is taken into account.