Modelling, exploration and optimization of hardware accelerators for deep learning applications
Current applications that require processing of large amounts of data, such as in healthcare, transportation, media, banking, telecom, internet-of-things, and security, demand new computing systems with extreme performance and energy efficiency. Several advancements in general-purpose computing...
Saved in:
Main Author: Dutt, Arko
Other Authors: Mohamed M. Sabry Aly
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering::Hardware::Integrated circuits; Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
Online Access: https://hdl.handle.net/10356/164987
Institution: Nanyang Technological University
Language: English
id: sg-ntu-dr.10356-164987
record_format: dspace
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering::Hardware::Integrated circuits; Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
description:
Current applications that require processing of large amounts of data, such as in healthcare, transportation, media, banking, telecom, internet-of-things, and security, demand new computing systems with extreme performance and energy efficiency. Several advancements in general-purpose computing (such as the general-purpose graphics processing unit) and new custom hardware (such as the Tensor Processing Unit) have been proposed to meet these performance needs. These computing systems are still bottlenecked by poor power efficiency caused by excessive data transfers. Although many new computing architectures are emerging in academia and industry to target efficient processing of application workloads, it takes a considerable amount of time to decide on the most efficient hardware solution. Moreover, computer architects cannot guarantee which hardware achieves the best performance at the lowest energy consumption. An efficient computing system requires co-optimization at the device, circuit, architecture, and system levels. A toolchain or unified tool that can instantaneously and accurately simulate the hardware costs of an application workload can therefore accelerate the design and optimization of efficient computing systems well before their actual realization.
In this dissertation, we introduce two mechanisms to accelerate the estimation of the cost metrics of hardware accelerators targeting deep learning workloads, such as energy consumption, performance, and energy-delay product (EDP). Deep learning has achieved popularity due to its improved prediction accuracy, and it finds widespread use in large-scale data processing applications. First, we used deep neural networks to speed up the estimation of execution time, energy consumption, and area of neural network accelerators by 10^6× over baseline cycle-accurate simulations. We call this mechanism EAST-DNN (Expediting Architectural SimulaTions using Deep Neural Networks); it achieves high accuracy against the baseline. A major disadvantage of EAST-DNN is the time and cost needed to collect data for training the deep neural network, so an efficient technique that avoids costly training-data collection is essential. Second, we formulated closed-form analytical representations to further accelerate the estimation of the hardware costs of deep learning accelerators without any DNN training overhead, while achieving accuracy comparable to or better than EAST-DNN. We call this approach Pearl, named for the optimization of DNN accelerators using closed-form analytical representations. In addition to high accuracy and speedup over the state of the art, Pearl provides the flexibility to explore many parameters of a particular accelerator architecture template, and it can be extended to model several other architecture templates and dataflow mappings for efficient deep learning acceleration. The Pearl formulation is, in general, independent of device technology and accelerator architecture. Third, as an application of Pearl in the search for efficient (and emerging) deep learning systems, we used this fast and accurate analytical-model-based simulator to explore and formulate optimization problems.
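The style of closed-form cost estimation described above can be illustrated with a toy sketch. The model below is illustrative only (it is not Pearl's actual formulation, and all parameter values are hypothetical): a roofline-style latency bound and a simple energy sum, combined into an energy-delay product.

```python
# Illustrative closed-form cost model for a DNN accelerator -- NOT the
# thesis's actual equations, just a sketch of analytical estimation.

def estimate_costs(macs, num_pes, freq_hz, energy_per_mac_j,
                   dram_bytes, energy_per_byte_j, bw_bytes_per_s):
    """Estimate latency, energy, and EDP from simple analytical terms."""
    # Compute-bound latency: MAC operations spread across processing elements.
    t_compute = macs / (num_pes * freq_hz)
    # Memory-bound latency: off-chip traffic over available bandwidth.
    t_memory = dram_bytes / bw_bytes_per_s
    latency = max(t_compute, t_memory)        # roofline-style bound
    energy = macs * energy_per_mac_j + dram_bytes * energy_per_byte_j
    return latency, energy, latency * energy  # EDP = delay x energy

# Hypothetical accelerator running one convolution layer.
lat, en, edp = estimate_costs(
    macs=1e9, num_pes=256, freq_hz=1e9, energy_per_mac_j=1e-12,
    dram_bytes=8e6, energy_per_byte_j=20e-12, bw_bytes_per_s=16e9)
```

Because each cost is a closed-form expression, evaluating one configuration takes microseconds, which is what makes large design-space sweeps feasible compared with cycle-accurate simulation.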
We present several case studies that use the analytical models for DNN-accelerator optimization under user-defined constraints. Area-constrained and energy-delay Pareto optimization of DNN accelerators are presented as case studies; they show the efficient choice of memory and compute resources through accelerated Pearl-based simulations. The massive memory accesses imposed by large DNN workloads increase overall energy consumption, and these exploration methods identify a case where emerging memory improves the energy consumption of a DNN-accelerator configuration. Fourth, we built emerging memory from resistive random-access memory (RRAM) and thin-film transistor (TFT) devices, acting as off-chip main memory for an emerging multi-tier monolithic 3D system design. We used Pearl-based simulations to quantify the system-level benefits of an emerging computing system composed of newer devices, with dataflow accelerators supported by Pearl, targeting deep learning inference.
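The energy-delay Pareto optimization mentioned above can be sketched generically. The configurations below are hypothetical (delay, energy) pairs, not results from the thesis; the filter keeps only configurations that no other configuration beats on both metrics.

```python
# Minimal energy-delay Pareto filter over candidate accelerator
# configurations (hypothetical numbers, illustrating the case-study
# style of design-space exploration).

def pareto_front(points):
    """Keep (delay, energy) points not dominated by any other point."""
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p
                        for q in points)
        if not dominated:
            front.append(p)
    return sorted(front)

# Hypothetical (delay, energy) costs of five accelerator configurations.
configs = [(1.0, 5.0), (2.0, 3.0), (1.5, 4.5), (3.0, 2.5), (2.5, 3.5)]
best = pareto_front(configs)
```

In a real flow, each point would come from a fast analytical simulation of one configuration, so the front over millions of candidates can be computed in minutes rather than the months a cycle-accurate sweep would take.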
The material presented in this thesis paves the way for ultra-scale exploration and optimization of domain-specific accelerators for deep learning inference in a short time. Using these approaches, the quality of any new hardware accelerator (new device technology, architecture, memory, or dataflow mapping) can be quickly evaluated and optimized for a specific deep learning application.
author2: Mohamed M. Sabry Aly
format: Thesis-Doctor of Philosophy
author: Dutt, Arko
title: Modelling, exploration and optimization of hardware accelerators for deep learning applications
publisher: Nanyang Technological University
publishDate: 2023
url: https://hdl.handle.net/10356/164987
_version_: 1764208162579152896
spelling: sg-ntu-dr.10356-164987; last indexed 2023-04-04T02:58:00Z
Affiliation: School of Computer Science and Engineering; Hardware & Embedded Systems Lab (HESL); msabry@ntu.edu.sg
Citation: Dutt, A. (2023). Modelling, exploration and optimization of hardware accelerators for deep learning applications. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164987
DOI: 10.32657/10356/164987
Degree: Doctor of Philosophy
Dates: accessioned/available 2023-03-07T07:09:40Z; issued 2023
License: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Format: application/pdf
Publisher: Nanyang Technological University