Constraint-aware configurable system-on-chip design for embedded computing
Saved in:

| Main Author: | Alok Prakash |
|---|---|
| Other Authors: | Thambipillai Srikanthan |
| Format: | Theses and Dissertations |
| Language: | English |
| Published: | 2014 |
| Subjects: | DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Processor architectures |
| Online Access: | https://hdl.handle.net/10356/61678 |
| Institution: | Nanyang Technological University |
| id | sg-ntu-dr.10356-61678 |
|---|---|
| record_format | dspace |
| institution | Nanyang Technological University |
| building | NTU Library |
| continent | Asia |
| country | Singapore |
| content_provider | NTU Library |
| collection | DR-NTU |
| language | English |
| topic | DRNTU::Engineering::Computer science and engineering::Computer systems organization::Special-purpose and application-based systems; DRNTU::Engineering::Computer science and engineering::Computer systems organization::Processor architectures |
description
Field Programmable Gate Arrays (FPGAs) are rapidly becoming a popular alternative to ASICs as they continue to increase in capacity, functionality and performance. At the same time, FPGA developers face the challenge of meeting increasingly aggressive design constraints, such as power, delay and area, while also satisfying the shorter Time-to-Market (TTM) and lower Non-Recurring Engineering (NRE) cost requirements of embedded systems development. In this research, efficient techniques have been proposed for processor sub-setting and customization, as well as for the rapid generation of application-specific hardware accelerators, in order to meet the design constraints of configurable System-on-Chip (SoC) platforms.

A processor-agnostic technique has been devised for sub-setting soft-core processors by relying on the front-end application output generated by the LLVM compiler. The proposed approach yields a systematic method for the application-aware sub-setting of micro-architecture subsystems of a soft-core processor, such as hardware multipliers and floating-point units. Evaluations on widely used benchmarks show that the proposed method can reliably subset soft-core processors at high speed without compromising compute performance.
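As an illustration of how compiler front-end output could drive such sub-setting (a minimal sketch under assumptions, not the thesis toolchain), the Python script below scans an LLVM IR (`.ll`) file for opcodes that imply optional subsystems such as a hardware multiplier or a floating-point unit. The opcode-to-subsystem mapping and the crude opcode extraction are assumptions made for illustration only.

```python
# Minimal sketch (assumed, not the thesis toolchain): decide which optional
# micro-architecture blocks of a soft-core processor an application actually
# needs by scanning LLVM IR produced by the compiler front end.
import re
import sys

# Hypothetical mapping from LLVM IR opcodes to optional hardware subsystems.
SUBSYSTEM_OPCODES = {
    "hardware_multiplier": {"mul"},
    "hardware_divider": {"sdiv", "udiv", "srem", "urem"},
    "floating_point_unit": {"fadd", "fsub", "fmul", "fdiv", "frem", "fcmp"},
}

def required_subsystems(ir_path):
    """Return the subsystems whose opcodes appear in the given .ll file."""
    opcode_re = re.compile(r"=\s*(\w+)")  # crude: opcode follows '=' in an instruction
    seen = set()
    with open(ir_path) as f:
        for line in f:
            m = opcode_re.search(line)
            if m:
                seen.add(m.group(1))
    return {name for name, ops in SUBSYSTEM_OPCODES.items() if ops & seen}

if __name__ == "__main__":
    # Usage: python subset.py app.ll   (app.ll produced with `clang -S -emit-llvm`)
    for subsystem in sorted(required_subsystems(sys.argv[1])):
        print("keep:", subsystem)
```

Subsystems that never appear in the report could then be dropped from the soft-core configuration, which is the essence of application-aware sub-setting.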
Next, a technique for the architecture-aware enumeration of custom instructions has been proposed to identify area-efficient custom instructions by employing FPGA resource-aware pruning of the search space. Experimental results on applications from widely used benchmark suites confirm that deploying the custom instructions identified in this way can improve compute performance by up to 65%. Instruction-level parallelism (ILP) has also been exploited to further improve compute performance by identifying profitable coarse-grained custom instructions; custom instructions generated with the proposed method have been shown to accelerate computations by up to 39% compared with a base-processor-only implementation.

Unlike traditional custom-instruction generation methods, which cannot incorporate memory-dependent basic blocks, a novel technique for accelerating memory-dependent basic blocks has been proposed. A detailed data-dependency analysis based on the pre-defined memory allocation of an application has been developed to guarantee the identification of profitable basic blocks for hardware acceleration. The profitability of a code segment is determined using a mathematical model of the overheads associated with placing data in the local and main memory subsystems. The proposed approach eliminates the need for Direct Memory Access (DMA) transfers or cache-coherence protocols.

A scalable technique for the automatic selection of profitable basic blocks for hardware acceleration has been devised to overcome the time complexity of the search space. It relies on a heuristic that significantly reduces the search space, resulting in a high-speed technique for recommending profitable, mutually data-dependent basic blocks for hardware acceleration. It has been shown that, while the runtime of an exhaustive selection approach grows exponentially, the proposed heuristic search scales almost linearly with the number of candidate hardware blocks. Moreover, extensive tests confirm that the compute performance of the hardware accelerators identified by the heuristic closely matches that obtained with exhaustive search, deviating by at most 5 to 10%. Notably, the proposed heuristic search algorithm is independent of the memory subsystem configuration.

Finally, the design methodologies presented in this thesis can be integrated into a systematic framework for the automatic generation of area-time-efficient configurable SoCs by leveraging processor sub-setting, architecture-aware custom instruction generation and data-dependency-aware hardware acceleration.
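The abstract refers to a mathematical model that weighs the overheads of data placement in the local and main memory subsystems against the benefit of hardware execution. The thesis' exact formulation is not reproduced in this record; one plausible form, stated purely as an illustration, is:

```latex
% Illustrative cost model only (assumed form): a basic block B is worth
% moving to hardware when its software time exceeds its hardware time
% plus the data-access overheads implied by where its data is placed.
\mathrm{Gain}(B) \;=\; T_{\mathrm{sw}}(B) \;-\; \Bigl( T_{\mathrm{hw}}(B)
    \;+\; \sum_{d \,\in\, D(B)} T_{\mathrm{acc}}\bigl(d,\ \mathrm{loc}(d)\bigr) \Bigr)
```

Here B is a candidate basic block, T_sw and T_hw its software and hardware execution times, D(B) the data it touches, and T_acc(d, loc(d)) the overhead of accessing datum d given its placement in the local or main memory subsystem; a block would only be recommended for acceleration when Gain(B) > 0.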
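Likewise, the heuristic selection of profitable, mutually data-dependent basic blocks is only summarized in the abstract. The Python sketch below shows one greedy scheme of that general flavor; the function names, scoring terms and area budget are assumptions for illustration, not the algorithm from the thesis. In this naive form the cost grows roughly quadratically with the number of candidates, which already avoids the exponential subset enumeration of an exhaustive search.

```python
# Illustrative greedy sketch (assumed, not the thesis algorithm): pick
# profitable basic blocks one at a time, preferring candidates that share
# data with blocks already selected so data movement is amortized.
def select_blocks(candidates, gain, shared_data_bonus, area, area_budget):
    """candidates: candidate block ids; gain(b): standalone profit of b;
    shared_data_bonus(b, chosen): extra profit from data reuse with the
    already-chosen set; area(b): FPGA resources consumed by b."""
    chosen, used_area = [], 0
    remaining = list(candidates)
    while remaining:
        # Score each remaining block against the current selection.
        best = max(remaining,
                   key=lambda b: gain(b) + shared_data_bonus(b, chosen))
        score = gain(best) + shared_data_bonus(best, chosen)
        if score <= 0 or used_area + area(best) > area_budget:
            break  # no remaining block is profitable within the area budget
        chosen.append(best)
        used_area += area(best)
        remaining.remove(best)
    return chosen

if __name__ == "__main__":
    # Toy example with made-up per-block numbers.
    gains = {"b0": 120, "b1": 80, "b2": 15}
    areas = {"b0": 300, "b1": 250, "b2": 100}
    picked = select_blocks(
        ["b0", "b1", "b2"],
        gain=lambda b: gains[b],
        shared_data_bonus=lambda b, chosen: 10 * len(chosen),  # toy reuse bonus
        area=lambda b: areas[b],
        area_budget=600,
    )
    print(picked)  # ['b0', 'b1'] fit within the 600-unit area budget
```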
| author2 | Thambipillai Srikanthan |
|---|---|
| format | Theses and Dissertations |
| author | Alok Prakash |
| title | Constraint-aware configurable system-on-chip design for embedded computing |
| publishDate | 2014 |
| url | https://hdl.handle.net/10356/61678 |
| _version_ | 1759855684919230464 |
spelling

| Record id | sg-ntu-dr.10356-61678 (last updated 2023-03-04T00:42:09Z) |
|---|---|
| School / Centre | School of Computer Engineering; Centre for High Performance Embedded Systems |
| Degree | DOCTOR OF PHILOSOPHY (SCE) |
| Deposited | 2014-08-12T01:23:26Z |
| Issued | 2014 |
| Type | Thesis |
| Citation | Alok Prakash. (2014). Constraint-aware configurable system-on-chip design for embedded computing. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/61678 |
| DOI | 10.32657/10356/61678 |
| Language | en |
| Extent | 236 p., application/pdf |