High throughput VLSI architecture for gradient guided filter with approximated arithmetic operations

Guided image filtering has been widely applied to meet the increasing demand for high-performance filtering, especially in real-time image/video processing. The gradient guided filter improves filtering quality by reducing halo artifacts, owing to its edge-aware characteristics. However, the gradient...

Bibliographic Details
Main Author:	Wu, Lei
Other Authors:	Jong Ching Chuen
Format:	Theses and Dissertations
Language:	English
Published:	2018
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	http://hdl.handle.net/10356/74100
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-74100
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
spellingShingle	DRNTU::Engineering::Electrical and electronic engineering Wu, Lei High throughput VLSI architecture for gradient guided filter with approximated arithmetic operations
description	Guided image filtering has been widely applied to meet the increasing demand for high-performance filtering, especially in real-time image/video processing. The gradient guided filter improves filtering quality by reducing halo artifacts, owing to its edge-aware characteristics. However, the gradient guided filter algorithm has high computational complexity and its computation involves global pixels, both of which hinder its VLSI implementation for real-time Full-HD applications. This work addresses these issues and proposes a VLSI architecture for the gradient guided image filter. Several design techniques are developed and applied to achieve high computation speed and high throughput. A seamless dataflow is proposed for the complete system, which consists of three main processing stages: the preprocessing stage, the linear coefficient computation stage and the output stage. The preprocessing stage applies down-sampling with a large sampling rate to reduce computation cost in terms of circuit size, processing time and power consumption. The global parameter values are derived quickly while reasonably good global information is maintained, so that the quality of the filtering results is not sacrificed when these values are applied in the subsequent two stages. The linear coefficient computation stage contains the most complex computations, such as square root, division and the exponential function. Down-sampling is applied here with a sampling rate lower than in the preprocessing stage, so as to balance computation cost and filtering accuracy. In addition, the intensive arithmetic modules that dominate the critical path delay are redesigned using approximated operations of adequate accuracy. Specifically, novel non-iterative dividers are developed to replace the original dividers and reduce the delay in the critical paths.
With the proposed non-iterative division, the quotient is modeled as a normalized curved surface. The surface is partitioned into small regions, each approximated by a small plane for efficient hardware implementation. A curve fitting method and a mixed integer linear programming method are adopted and evaluated for local optimization of the approximation errors. In this way, the dividers are implemented with only simple arithmetic operations and a small look-up table. As a result, the operation is fast and the approximation errors are optimized while the accuracy requirement is satisfied. The other intensive computation, the exponential function, is simplified by piecewise linear approximation and implemented with only shifters and adder trees. The approximated computations thus improve computation performance and simplify the design of the complex arithmetic modules. The output stage employs parallel processing and operates at a frequency 16 times higher to restore the filtering results to the original full frame size. The linear coefficients obtained from the down-sampling stage are applied concurrently to all 16 pixels in the window for which the coefficients are computed. As a result, the original image size is restored, the quality of the filtering results is maintained, and high throughput is achieved at the same time. Based on STM 65 nm CMOS technology, the implementation results show that the proposed VLSI architecture for the gradient guided image filter supports Full-HD image filtering at a throughput above 150 frames per second, achieving high throughput, small size and low power consumption compared with the existing VLSI design for the original guided image filter. This thesis describes the proposed VLSI architecture, the details of its design, the designs of the non-iterative dividers and the design of the piecewise linear approximated exponential function.
Qualitative and quantitative analyses of the impact of the approximations on the filtering results, and performance comparisons between the proposed and existing designs, are included.
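The non-iterative division described in the abstract can be illustrated with a small software sketch. It is not the thesis design: the plane coefficients here come from a first-order Taylor expansion at each region centre rather than from the curve fitting or mixed integer linear programming named in the abstract, the region count K = 8 and all function names are hypothetical, and floating point stands in for the fixed-point hardware arithmetic. Operands are assumed positive and are normalized to [1, 2) so that the quotient surface x/y only has to be tabulated over a unit square.

```python
import math

K = 8  # regions per axis; a hypothetical choice, not taken from the thesis


def build_plane_table(k=K):
    """Return a k x k look-up table of plane coefficients (a, b, c).

    Each plane a*x + b*y + c is the first-order Taylor expansion of x / y
    at the centre (x0, y0) of one square region of [1, 2) x [1, 2).
    """
    table = []
    for i in range(k):
        row = []
        for j in range(k):
            x0 = 1.0 + (i + 0.5) / k
            y0 = 1.0 + (j + 0.5) / k
            row.append((1.0 / y0, -x0 / (y0 * y0), x0 / y0))
        table.append(row)
    return table


PLANES = build_plane_table()


def approx_quotient(mx, my, table=PLANES):
    """Approximate mx / my for normalized operands mx, my in [1, 2)."""
    k = len(table)
    # The leading fractional bits of each operand select the region.
    i = min(int((mx - 1.0) * k), k - 1)
    j = min(int((my - 1.0) * k), k - 1)
    a, b, c = table[i][j]
    # One table look-up, two multiplications and two additions; no iteration.
    return a * mx + b * my + c


def approx_divide(x, y, table=PLANES):
    """Approximate x / y for positive x and y via normalization."""
    fx, ex = math.frexp(x)  # x = fx * 2**ex with fx in [0.5, 1)
    fy, ey = math.frexp(y)
    q = approx_quotient(2.0 * fx, 2.0 * fy, table)
    # Doubling both mantissas cancels out, so the exponent is ex - ey.
    return math.ldexp(q, ex - ey)
```

With a region half-width h = 1/(2K), the Taylor remainder bounds the absolute error of `approx_quotient` by 3h², so a finer partition trades a larger look-up table for accuracy. An optimized fit per region, as the abstract describes, would be expected to reduce the error further for the same table size.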
author2	Jong Ching Chuen
author_facet	Jong Ching Chuen Wu, Lei
format	Theses and Dissertations
author	Wu, Lei
author_sort	Wu, Lei
title	High throughput VLSI architecture for gradient guided filter with approximated arithmetic operations
publishDate	2018
url	http://hdl.handle.net/10356/74100
_version_	1772827044927766528
spelling	sg-ntu-dr.10356-74100 2023-07-04T17:24:59Z High throughput VLSI architecture for gradient guided filter with approximated arithmetic operations Wu, Lei Jong Ching Chuen School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering Doctor of Philosophy (EEE) 2018-04-25T01:32:57Z 2018-04-25T01:32:57Z 2018 Thesis Wu, L. (2018). High throughput VLSI architecture for gradient guided filter with approximated arithmetic operations. Doctoral thesis, Nanyang Technological University, Singapore. http://hdl.handle.net/10356/74100 10.32657/10356/74100 en 128 p. application/pdf

Similar Items