Modelling, exploration and optimization of hardware accelerators for deep learning applications

Current applications that require processing of large amounts of data, such as healthcare, transportation, media, banking, telecom, internet-of-things, and security, demand new computing systems with extreme performance and energy efficiency. Several advancements in general-purpose computing...


Bibliographic Details
Main Author: Dutt, Arko
Other Authors: Mohamed M. Sabry Aly
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164987
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-164987
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Hardware::Integrated circuits
Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
spellingShingle Engineering::Computer science and engineering::Hardware::Integrated circuits
Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
Dutt, Arko
Modelling, exploration and optimization of hardware accelerators for deep learning applications
description Current applications that require processing of large amounts of data, such as healthcare, transportation, media, banking, telecom, internet-of-things, and security, demand new computing systems with extreme performance and energy efficiency. Several advancements in general-purpose computing (like the General-Purpose Graphics Processing Unit) and new custom hardware (like the Tensor Processing Unit) have been proposed to meet these performance needs. These computing systems are still bottlenecked by poor power efficiency caused by excessive data transfers. Although many new computing architectures targeting efficient processing of application workloads are emerging in academia and industry, deciding on the most efficient hardware solution takes a considerable amount of time. Moreover, computer architects cannot guarantee which hardware is best in terms of maximum performance with the least energy consumption. An efficient computing system requires co-optimization at the device, circuit, architecture, and system levels. A toolchain or unified tool that can instantaneously and accurately simulate the hardware costs of an application workload can therefore accelerate the design and optimization of efficient computing systems well before their actual realization. In this dissertation, we present two mechanisms to accelerate the estimation or simulation of the cost metrics of hardware accelerators targeting deep learning workloads, such as energy consumption, performance, and energy-delay product (EDP). Deep learning has gained popularity due to its improved prediction accuracy, and it finds widespread use in large-scale data processing applications. First, we used deep neural networks to speed up the estimation of execution time, energy consumption, and area of neural network accelerators by 10^6× over baseline cycle-accurate simulations.
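This first mechanism is, at its core, a learned surrogate: a small neural network is trained on (configuration, cost) pairs produced by a slow cycle-accurate simulator, and then replaces it with a fast forward pass. The sketch below is a minimal illustration of that idea only; the toy `slow_simulator`, the chosen features, and the network shape are all invented for this example and are not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def slow_simulator(cfg):
    # Stand-in for a cycle-accurate simulator: cfg = (PEs, buffer KB, bandwidth).
    # The cost formulas are arbitrary placeholders.
    pes, buf, bw = cfg
    latency = 500.0 / np.sqrt(pes) + 100.0 / np.sqrt(bw)
    energy = 0.5 * pes + 0.1 * buf + 20.0 / np.sqrt(bw)
    return np.array([latency, energy])

# One-time (costly) training-set collection: config -> (latency, energy).
X = rng.uniform([8, 16, 1], [256, 512, 32], size=(500, 3))
Y = np.array([slow_simulator(c) for c in X])
Xn = (X - X.mean(0)) / X.std(0)   # normalize inputs
Yn = (Y - Y.mean(0)) / Y.std(0)   # normalize targets

# One-hidden-layer MLP regressor, trained with full-batch gradient descent.
W1 = rng.normal(0, 0.5, (3, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 2)); b2 = np.zeros(2)
lr = 0.05
for _ in range(2000):
    H = np.tanh(Xn @ W1 + b1)          # hidden activations
    P = H @ W2 + b2                    # predicted (latency, energy)
    G = 2.0 * (P - Yn) / len(Xn)       # dMSE/dP
    gW2, gb2 = H.T @ G, G.sum(0)
    GH = (G @ W2.T) * (1.0 - H**2)     # backprop through tanh
    gW1, gb1 = Xn.T @ GH, GH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Once trained, the network estimates costs with one matrix pass instead of
# a full simulation run.
mse = float(np.mean((np.tanh(Xn @ W1 + b1) @ W2 + b2 - Yn) ** 2))
print(f"surrogate fit, normalized MSE: {mse:.4f}")
```

The speedup comes from amortization: the simulator is invoked only to build the training set, after which every new configuration costs a single forward pass.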
We call this mechanism EAST-DNN (Expediting Architectural SimulaTions using Deep Neural Networks), which achieves high accuracy against the baseline. A major disadvantage of EAST-DNN is the time and cost needed to collect data for training the deep neural network, so an efficient technique that avoids costly training-data collection is essential. Second, we formulated closed-form analytical representations to further accelerate the estimation of the hardware costs of deep learning accelerators without any DNN training overhead, while achieving accuracy comparable to or better than EAST-DNN. We call this approach Pearl, an acronym representing our approach toward optimization of DNN accelerators using closed-form analytical representations. In addition to high accuracy and speedup compared to the state of the art, Pearl provides the flexibility to explore a wide range of parameters for a particular accelerator architecture template, and it can be extended to model several other architecture templates and dataflow mappings for efficient deep learning acceleration. The Pearl formulation is, in general, independent of device technology and accelerator architecture. Third, we used this faster and more accurate simulator based on the analytical models to explore the design space and formulate optimization problems, as an application of Pearl in the search for efficient (and emerging) deep learning systems. We present several case studies that use the analytical models for DNN-accelerator optimization under user-defined constraints, including area-constrained and energy-delay Pareto optimization of DNN accelerators. These cases show how memory and compute resources can be chosen efficiently through accelerated Pearl-based simulations. The large volume of memory accesses imposed by large DNN workloads increases overall energy consumption; through these exploration methods, we obtained a case in which an emerging memory improves the energy consumption of a DNN-accelerator configuration.
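The analytical approach and its use in constrained optimization can be illustrated with a toy closed-form model: hardware costs are computed directly from a handful of architectural parameters, so many configurations can be screened under an area budget and reduced to an energy-delay Pareto frontier. All formulas, constants, and parameter ranges below are invented for this sketch and are not the thesis's actual Pearl representations.

```python
def analytic_costs(pes, buf_kb, macs=1e9, bytes_moved=4e8):
    # Closed-form (toy) cost model: no simulation loop, just arithmetic.
    delay = macs / (pes * 1e9) + bytes_moved / (buf_kb * 1e6)   # seconds
    energy = macs * 1e-12 * pes**0.1 + bytes_moved * 2e-12      # joules
    area = pes * 0.01 + buf_kb * 0.002                          # mm^2 (assumed)
    return delay, energy, area

def pareto(points):
    # Keep the (delay, energy, ...) points not weakly dominated by another.
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

AREA_BUDGET = 6.0   # mm^2, a user-defined constraint
candidates = []
for pes in (64, 128, 256, 512):
    for buf in (64, 256, 1024):
        d, e, a = analytic_costs(pes, buf)
        if a <= AREA_BUDGET:            # area-constrained pruning
            candidates.append((d, e, pes, buf))

frontier = sorted(pareto(candidates))   # energy-delay Pareto set
for d, e, pes, buf in frontier:
    print(f"PEs={pes:3d} buf={buf:4d}KB delay={d:.3e}s "
          f"energy={e:.3e}J EDP={d * e:.3e}")
```

Because each evaluation is a few arithmetic operations, an exhaustive sweep over thousands of configurations remains cheap, which is what makes this style of design-space exploration practical.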
Fourth, we built an emerging memory from resistive random-access memory (RRAM) and thin-film transistor (TFT) devices, acting as off-chip main memory in an emerging multi-tier monolithic 3D system design. We used Pearl-based simulations to quantify the system-level benefits of an emerging computing system composed of these newer devices, with dataflow accelerators supported by Pearl, targeting deep learning inference. The material presented in this thesis paves the way for ultra-scale exploration and optimization of domain-specific accelerators for deep learning inference in a short time. Using these approaches, the quality of any new hardware accelerator (new device technology, architecture, memory, or dataflow mapping) can be quickly evaluated and optimized for a specific deep learning application.
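The system-level benefit of an emerging off-chip memory can be seen with simple bookkeeping: multiply the data traffic a workload imposes by each memory's per-bit access energy. The per-bit energies and traffic figure below are placeholder assumptions for illustration, not measured values from the RRAM/TFT design in the thesis.

```python
def offchip_energy_j(total_bytes, pj_per_bit):
    # Energy of moving `total_bytes` across the main-memory interface.
    return total_bytes * 8 * pj_per_bit * 1e-12

traffic_bytes = 2e9          # assumed data moved for one inference pass
dram_pj_per_bit = 20.0       # assumed baseline DRAM access energy
rram_pj_per_bit = 5.0        # assumed energy for a stacked RRAM/TFT memory

e_dram = offchip_energy_j(traffic_bytes, dram_pj_per_bit)
e_rram = offchip_energy_j(traffic_bytes, rram_pj_per_bit)
print(f"DRAM: {e_dram:.2f} J, RRAM: {e_rram:.2f} J, "
      f"saving: {1 - e_rram / e_dram:.0%}")
```

Under these assumed numbers the lower per-bit access energy translates directly into a proportional cut in memory energy, which is the effect the Pearl-based exploration quantifies at full system scale.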
author2 Mohamed M. Sabry Aly
author_facet Mohamed M. Sabry Aly
Dutt, Arko
format Thesis-Doctor of Philosophy
author Dutt, Arko
author_sort Dutt, Arko
title Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_short Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_full Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_fullStr Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_full_unstemmed Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_sort modelling, exploration and optimization of hardware accelerators for deep learning applications
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164987
_version_ 1764208162579152896
spelling sg-ntu-dr.10356-164987 2023-04-04T02:58:00Z Modelling, exploration and optimization of hardware accelerators for deep learning applications Dutt, Arko Mohamed M. Sabry Aly School of Computer Science and Engineering Hardware & Embedded Systems Lab (HESL) msabry@ntu.edu.sg Engineering::Computer science and engineering::Hardware::Integrated circuits Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling Doctor of Philosophy 2023-03-07T07:09:40Z 2023-03-07T07:09:40Z 2023 Thesis-Doctor of Philosophy Dutt, A. (2023). Modelling, exploration and optimization of hardware accelerators for deep learning applications. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164987 https://hdl.handle.net/10356/164987 10.32657/10356/164987 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University