Modelling, exploration and optimization of hardware accelerators for deep learning applications

Current applications that require processing of large amounts of data, such as healthcare, transportation, media, banking, telecom, internet-of-things, and security, demand new computing systems with extreme performance and energy efficiency. Several advancements in general-purpose computing...


Bibliographic Details
Main Author: Dutt, Arko
Other Authors: Mohamed M. Sabry Aly
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/164987
Institution: Nanyang Technological University
Language: English
id sg-ntu-dr.10356-164987
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Hardware::Integrated circuits
Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
spellingShingle Engineering::Computer science and engineering::Hardware::Integrated circuits
Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling
Dutt, Arko
Modelling, exploration and optimization of hardware accelerators for deep learning applications
description Current applications that require processing of large amounts of data, such as healthcare, transportation, media, banking, telecom, internet-of-things, and security, demand new computing systems with extreme performance and energy efficiency. Several advancements in general-purpose computing (like the General-Purpose Graphics Processing Unit) and new custom hardware (like the Tensor Processing Unit) have been proposed to meet these performance needs. These computing systems are still bottlenecked by poor power efficiency caused by excessive data transfers. Although many new computing architectures targeting efficient processing of application workloads are emerging in academia and industry, deciding on the most efficient hardware solution takes a considerable amount of time. Moreover, computer architects cannot guarantee which hardware is best in terms of maximum performance with the least energy consumption. An efficient computing system requires co-optimization at the device, circuit, architecture, and system levels. A toolchain or unified tool that can instantaneously and accurately simulate the hardware costs of an application workload can therefore accelerate the design and optimization of efficient computing systems well before their actual realization. In this dissertation, we present two mechanisms to accelerate the estimation or simulation of the cost metrics of hardware accelerators targeting deep learning workloads, such as energy consumption, performance, and energy-delay product (EDP). Deep learning has gained popularity due to its improved prediction accuracy, and it finds widespread use in large-scale data processing applications. First, we used deep neural networks to speed up the estimation of execution time, energy consumption, and area of neural network accelerators by 10^6× over baseline cycle-accurate simulations.
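This first mechanism is, at its core, a learned surrogate: a small neural network is trained on (configuration, cost) pairs produced by a slow cycle-accurate simulator, and then replaces it with a fast forward pass. The sketch below is a minimal illustration of that idea only; the toy `slow_simulator`, the chosen features, and the network shape are all invented for this example and are not the thesis's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def slow_simulator(cfg):
    # Stand-in for a cycle-accurate simulator: cfg = (PEs, buffer KB, bandwidth).
    # The cost formulas are arbitrary placeholders.
    pes, buf, bw = cfg
    latency = 500.0 / np.sqrt(pes) + 100.0 / np.sqrt(bw)
    energy = 0.5 * pes + 0.1 * buf + 20.0 / np.sqrt(bw)
    return np.array([latency, energy])

# One-time (costly) training-set collection: config -> (latency, energy).
X = rng.uniform([8, 16, 1], [256, 512, 32], size=(500, 3))
Y = np.array([slow_simulator(c) for c in X])
Xn = (X - X.mean(0)) / X.std(0)   # normalize inputs
Yn = (Y - Y.mean(0)) / Y.std(0)   # normalize targets

# One-hidden-layer MLP regressor, trained with full-batch gradient descent.
W1 = rng.normal(0, 0.5, (3, 32)); b1 = np.zeros(32)
W2 = rng.normal(0, 0.5, (32, 2)); b2 = np.zeros(2)
lr = 0.05
for _ in range(2000):
    H = np.tanh(Xn @ W1 + b1)          # hidden activations
    P = H @ W2 + b2                    # predicted (latency, energy)
    G = 2.0 * (P - Yn) / len(Xn)       # dMSE/dP
    gW2, gb2 = H.T @ G, G.sum(0)
    GH = (G @ W2.T) * (1.0 - H**2)     # backprop through tanh
    gW1, gb1 = Xn.T @ GH, GH.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Once trained, the network estimates costs with one matrix pass instead of
# a full simulation run.
mse = float(np.mean((np.tanh(Xn @ W1 + b1) @ W2 + b2 - Yn) ** 2))
print(f"surrogate fit, normalized MSE: {mse:.4f}")
```

The speedup comes from amortization: the simulator is invoked only to build the training set, after which every new configuration costs a single forward pass.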
We call this mechanism EAST-DNN (Expediting Architectural SimulaTions using Deep Neural Networks), which achieves high accuracy against the baseline. A major disadvantage of EAST-DNN is the time and cost needed to collect data for training the deep neural network, so an efficient technique that avoids costly training-data collection is essential. Second, we formulated closed-form analytical representations to further accelerate the estimation of the hardware costs of deep learning accelerators without any DNN training overhead, while achieving accuracy comparable to or better than EAST-DNN. We call this approach Pearl, an acronym representing our approach toward optimization of DNN accelerators using closed-form analytical representations. In addition to high accuracy and speedup compared to the state of the art, Pearl provides the flexibility to explore a wide range of parameters for a particular accelerator architecture template, and it can be extended to model several other architecture templates and dataflow mappings for efficient deep learning acceleration. The Pearl formulation is, in general, independent of device technology and accelerator architecture. Third, we used this faster and more accurate simulator based on the analytical models to explore the design space and formulate optimization problems, as an application of Pearl in the search for efficient (and emerging) deep learning systems. We present several case studies that use the analytical models for DNN-accelerator optimization under user-defined constraints, including area-constrained and energy-delay Pareto optimization of DNN accelerators. These cases show how memory and compute resources can be chosen efficiently through accelerated Pearl-based simulations. The large volume of memory accesses imposed by large DNN workloads increases overall energy consumption; through these exploration methods, we obtained a case in which an emerging memory improves the energy consumption of a DNN-accelerator configuration.
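The analytical approach and its use in constrained optimization can be illustrated with a toy closed-form model: hardware costs are computed directly from a handful of architectural parameters, so many configurations can be screened under an area budget and reduced to an energy-delay Pareto frontier. All formulas, constants, and parameter ranges below are invented for this sketch and are not the thesis's actual Pearl representations.

```python
def analytic_costs(pes, buf_kb, macs=1e9, bytes_moved=4e8):
    # Closed-form (toy) cost model: no simulation loop, just arithmetic.
    delay = macs / (pes * 1e9) + bytes_moved / (buf_kb * 1e6)   # seconds
    energy = macs * 1e-12 * pes**0.1 + bytes_moved * 2e-12      # joules
    area = pes * 0.01 + buf_kb * 0.002                          # mm^2 (assumed)
    return delay, energy, area

def pareto(points):
    # Keep the (delay, energy, ...) points not weakly dominated by another.
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

AREA_BUDGET = 6.0   # mm^2, a user-defined constraint
candidates = []
for pes in (64, 128, 256, 512):
    for buf in (64, 256, 1024):
        d, e, a = analytic_costs(pes, buf)
        if a <= AREA_BUDGET:            # area-constrained pruning
            candidates.append((d, e, pes, buf))

frontier = sorted(pareto(candidates))   # energy-delay Pareto set
for d, e, pes, buf in frontier:
    print(f"PEs={pes:3d} buf={buf:4d}KB delay={d:.3e}s "
          f"energy={e:.3e}J EDP={d * e:.3e}")
```

Because each evaluation is a few arithmetic operations, an exhaustive sweep over thousands of configurations remains cheap, which is what makes this style of design-space exploration practical.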
Fourth, we built an emerging memory from resistive random-access memory (RRAM) and thin-film transistor (TFT) devices, acting as off-chip main memory in an emerging multi-tier monolithic 3D system design. We used Pearl-based simulations to quantify the system-level benefits of an emerging computing system composed of these newer devices, with dataflow accelerators supported by Pearl, targeting deep learning inference. The material presented in this thesis paves the way for ultra-scale exploration and optimization of domain-specific accelerators for deep learning inference in a short time. Using these approaches, the quality of any new hardware accelerator (new device technology, architecture, memory, or dataflow mapping) can be quickly evaluated and optimized for a specific deep learning application.
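The system-level benefit of an emerging off-chip memory can be seen with simple bookkeeping: multiply the data traffic a workload imposes by each memory's per-bit access energy. The per-bit energies and traffic figure below are placeholder assumptions for illustration, not measured values from the RRAM/TFT design in the thesis.

```python
def offchip_energy_j(total_bytes, pj_per_bit):
    # Energy of moving `total_bytes` across the main-memory interface.
    return total_bytes * 8 * pj_per_bit * 1e-12

traffic_bytes = 2e9          # assumed data moved for one inference pass
dram_pj_per_bit = 20.0       # assumed baseline DRAM access energy
rram_pj_per_bit = 5.0        # assumed energy for a stacked RRAM/TFT memory

e_dram = offchip_energy_j(traffic_bytes, dram_pj_per_bit)
e_rram = offchip_energy_j(traffic_bytes, rram_pj_per_bit)
print(f"DRAM: {e_dram:.2f} J, RRAM: {e_rram:.2f} J, "
      f"saving: {1 - e_rram / e_dram:.0%}")
```

Under these assumed numbers the lower per-bit access energy translates directly into a proportional cut in memory energy, which is the effect the Pearl-based exploration quantifies at full system scale.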
author2 Mohamed M. Sabry Aly
author_facet Mohamed M. Sabry Aly
Dutt, Arko
format Thesis-Doctor of Philosophy
author Dutt, Arko
author_sort Dutt, Arko
title Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_short Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_full Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_fullStr Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_full_unstemmed Modelling, exploration and optimization of hardware accelerators for deep learning applications
title_sort modelling, exploration and optimization of hardware accelerators for deep learning applications
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/164987
_version_ 1764208162579152896
spelling sg-ntu-dr.10356-164987 2023-04-04T02:58:00Z Modelling, exploration and optimization of hardware accelerators for deep learning applications Dutt, Arko Mohamed M. Sabry Aly School of Computer Science and Engineering Hardware & Embedded Systems Lab (HESL) msabry@ntu.edu.sg Engineering::Computer science and engineering::Hardware::Integrated circuits Engineering::Computer science and engineering::Computing methodologies::Simulation and modeling Doctor of Philosophy 2023-03-07T07:09:40Z 2023-03-07T07:09:40Z 2023 Thesis-Doctor of Philosophy Dutt, A. (2023). Modelling, exploration and optimization of hardware accelerators for deep learning applications. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/164987 https://hdl.handle.net/10356/164987 10.32657/10356/164987 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University