A fault-tolerant design strategy utilizing approximate computing

Bibliographic Details
Main Authors: Balasubramanian, Padmanabhan; Maskell, Douglas L.
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Online Access: https://hdl.handle.net/10356/170489
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-170489
Record format: DSpace
Record timestamp: 2023-09-22T15:35:21Z
Conference: 2023 IEEE Region 10 Symposium (TENSYMP)
Lab: Hardware & Embedded Systems Lab (HESL)
Funding agency: Ministry of Education (MOE)
Funding: This research was partially funded by the Singapore Ministry of Education (MOE), Academic Research Fund, under grant numbers Tier-1 RG48/21 and Tier-1 RG127/22.
Version: Submitted/Accepted version
Date deposited: 2023-09-19T01:39:00Z
Type: Conference Paper
Citation: Balasubramanian, P. & Maskell, D. L. (2023). A fault-tolerant design strategy utilizing approximate computing. 2023 IEEE Region 10 Symposium (TENSYMP). https://dx.doi.org/10.1109/TENSYMP55890.2023.10223663
ISBN: 978-1-6654-8258-5
DOI: 10.1109/TENSYMP55890.2023.10223663
Rights: © 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TENSYMP55890.2023.10223663.
File format: application/pdf
Building: NTU Library
Location: Singapore, Asia
Content provider: NTU Library
Collection: DR-NTU
Topics: Engineering::Electrical and electronic engineering::Integrated circuits; Engineering::Computer science and engineering::Hardware; Fault Tolerance; Redundancy; Approximate Computing; Arithmetic Circuits
Abstract: This paper presents a novel fault-tolerant design (i.e., redundancy) strategy based on approximate computing, which we call FAC. Conventionally, triple modular redundancy (TMR) has been widely used to guarantee 100% tolerance to any single fault or failure of a processing unit, where the processing unit may be a circuit or a system. However, TMR incurs more than 200% overhead in area and power compared to a single processing unit. To reduce these design-metric overheads, alternative redundancy approaches have been proposed in the literature, but they guarantee only partial or moderate fault tolerance. Nevertheless, among these alternatives, majority voter-based reduced precision redundancy (MVRPR) may be useful for naturally error-resilient applications such as digital signal processing, which is commonly used in space systems. The proposed FAC is ideally suited to error-resilient applications; however, unlike MVRPR, which guarantees only moderate fault tolerance, FAC guarantees 100% tolerance to any single fault or failure of a processing unit, as TMR does. We comparatively evaluated TMR, MVRPR, and FAC on a digital image processing application, and the results demonstrate the usefulness of FAC. Further, in a physical implementation using a 28-nm CMOS technology, FAC achieves reductions of 15.3% in delay, 19.5% in area, and 24.7% in power compared to TMR, and reductions of 18% in delay, 5.4% in area, and 11.2% in power compared to MVRPR.
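The TMR baseline in the abstract tolerates any single faulty replica because a 2-of-3 majority voter outvotes it. The following is a minimal Python sketch of a generic bitwise TMR voter, assuming the voter itself is fault-free and that each processing unit produces an integer output word. It also computes TMR's area overhead relative to one unit, which works out to 200% plus the voter's own area, consistent with the "more than 200%" figure. The function names and the example areas (100 and 5 units) are hypothetical illustrations; the sketch does not model MVRPR or FAC, whose internals are not described in this record.

```python
from itertools import product


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority vote: each output bit agrees with at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def tmr_masks_any_single_failure(width: int = 8) -> bool:
    """Exhaustively check that one arbitrarily corrupted replica is always outvoted.

    Hypothetical model: a fault-free voter and `width`-bit output words.
    """
    words = range(1 << width)
    for correct, faulty_replica, corrupted in product(words, range(3), words):
        outputs = [correct, correct, correct]
        outputs[faulty_replica] = corrupted  # whole-unit failure: any wrong word
        if majority3(*outputs) != correct:
            return False
    return True


def tmr_area_overhead_pct(unit_area: float, voter_area: float) -> float:
    """TMR area overhead versus a single unit, in percent.

    (3 * unit + voter - unit) / unit, written as (2 * unit + voter) / unit.
    """
    return 100.0 * (2 * unit_area + voter_area) / unit_area


if __name__ == "__main__":
    print(tmr_masks_any_single_failure())      # True
    print(tmr_area_overhead_pct(100.0, 5.0))   # 205.0 (hypothetical areas)
    print(majority3(0b1111, 0b0000, 0b0000))   # 0: two faulty replicas outvote the correct one
```

The exhaustive check replaces one replica's output with every possible word, so it covers failure of an entire unit rather than only single bit flips. The last line shows why TMR guarantees tolerance to a single fault only: two identically faulty replicas outvote the correct one.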