Enabling efficient edge intelligence: a hardware-software codesign approach

Deep Neural Networks (DNNs) have made significant advancements in various domains, including computer vision (CV), natural language processing (NLP), etc. With the Internet of Things (IoT) and edge computing rapidly developing, edge intelligence, which brings computing services closer to the user, h...

Bibliographic Details
Main Author: Huai, Shuo
Other Authors: Weichen Liu
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access: https://hdl.handle.net/10356/172499
Institution: Nanyang Technological University
id sg-ntu-dr.10356-172499
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Hardware::Performance and reliability
spellingShingle Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Computer science and engineering::Hardware::Performance and reliability
Huai, Shuo
Enabling efficient edge intelligence: a hardware-software codesign approach
description Deep Neural Networks (DNNs) have advanced significantly in domains such as computer vision (CV) and natural language processing (NLP). With the rapid development of the Internet of Things (IoT) and edge computing, edge intelligence, which brings computing services closer to the user, has become a key technique for mitigating the latency and privacy concerns of accessing cloud servers. To enhance the efficiency (i.e., high accuracy and high throughput) of edge intelligence, this thesis optimizes it from three fundamental and indispensable aspects: algorithms, training data, and computing hardware. In terms of algorithms, DNN architectures must be suitably sized and optimized: model pruning and quantization reduce the computational and memory demands of models, while model scaling can improve accuracy. In terms of training data, a large and diverse training dataset enhances the generalization ability of the model and thus its accuracy, and collaborative training across participants increases the data volume and diversity available to each one. In terms of computing hardware, efficient edge accelerators enhance inference throughput, and In-Memory Processing (IMP) can leverage specialized hardware units designed for DNN algorithms, minimizing latency and optimizing power consumption. Nevertheless, current approaches to these aspects face several challenges. First, many applications have strict latency constraints, so model inference time must be optimized in latency-critical edge systems, yet traditional DNN algorithm optimizations do not directly adjust inference time. Second, devices with different capabilities require heterogeneous models during collaborative training, especially under latency constraints, which makes high accuracy harder to achieve. Third, IMP accelerators face challenges such as limited resources and non-ideal behaviors, and current solutions introduce significant hardware overhead, resulting in excessive power usage and chip area. To address these limitations, we first propose a hardware-aware DNN learning method that optimizes models for high accuracy while meeting the system latency constraint in a single training process. We design a compact learning scheme, which compresses redundant models to meet a hard latency requirement on targeted devices by dynamically zeroizing and recovering Batch Normalization (BN) layers. Building on it, we also design a scaling scheme that improves the accuracy of simple models under the latency constraint. For efficiency, we further design a hardware-customized latency predictor to guide this learning process. After optimizing the model algorithm, we propose a hardware-aware collaborative training framework based on Federated Learning (FL), which expands the training dataset for higher accuracy and learns heterogeneous models that meet the latency constraints of multiple edge systems simultaneously. We use our high-accuracy dynamic zeroizing-recovering method to adjust each local model under its latency constraint. A proto-corrected aggregation scheme is further designed to aggregate all heterogeneous local models, satisfying the latency constraints of different systems in one training process while maintaining high accuracy. However, scenarios that demand extremely low power consumption and high throughput need emerging accelerators to further optimize edge intelligence, and the IMP architecture is promising for DNN inference. To meet resource constraints and minimize power consumption in IMP devices, we use filter-group pruning and crossbar pruning to reduce crossbar usage without extra hardware units for data alignment. In addition, we adopt a non-ideality adaptation and self-compensation scheme that relieves the impact of non-idealities by exploiting the features of crossbars without large hardware overhead. Finally, we integrate them into one training process for co-optimization, which improves the accuracy of the final model. In summary, we achieve efficient edge intelligence by optimizing DNN algorithms, training data, and computing devices, encompassing both software and hardware aspects. This unlocks the potential of edge intelligence, ensuring data privacy, achieving high accuracy, and sustaining high throughput across various applications. In the future, we will continue focusing on hardware-software co-design for edge intelligence. First, we intend to develop a dynamic reconfiguration architecture that can seamlessly switch IMP cells between memory and computing functions, allocating memory and computing resources optimally to enhance DNN inference efficiency. Second, we will design IMP accelerators that support various algorithms such as Transformers, co-optimizing algorithms, data, and IMP devices to comprehensively advance the capabilities and applications of edge intelligence. Third, we will propose a hybrid CNN-Transformer Neural Architecture Search (NAS) framework for the IMP architecture to achieve hardware-friendly, accurate, robust, low-latency, and low-power IMP-based edge intelligence.
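The description mentions compressing a model to a hard latency requirement by dynamically zeroizing and recovering BN layers, guided by a latency predictor. The following is only a minimal sketch of that general idea, not the thesis's actual method. It assumes a hypothetical linear latency model (a fixed base cost plus a per-channel cost for each layer) and ranks channels by the magnitude of their BN scale factor. The function names `predicted_latency_ms` and `zeroize_to_budget`, the cost values, and the sample scale factors are all invented for illustration.

```python
from typing import List


def predicted_latency_ms(active: List[int],
                         cost_per_channel_ms: List[float],
                         base_ms: float = 1.0) -> float:
    """Hypothetical linear latency predictor (an assumption, not the thesis's).

    active[i] is the number of non-zeroized channels in layer i.
    """
    return base_ms + sum(n * c for n, c in zip(active, cost_per_channel_ms))


def zeroize_to_budget(gammas: List[List[float]],
                      cost_per_channel_ms: List[float],
                      budget_ms: float) -> List[List[bool]]:
    """Return per-layer channel masks (True = active, False = zeroized).

    Channels with the smallest |BN scale| are zeroized first until the
    predicted latency fits the budget. Masks are rebuilt from scratch on
    every call, so a channel zeroized earlier is recovered automatically
    if its scale factor has grown in the meantime.
    """
    masks = [[True] * len(layer) for layer in gammas]
    # Candidate channels, least important (smallest |gamma|) first.
    candidates = sorted(
        (abs(g), li, ci)
        for li, layer in enumerate(gammas)
        for ci, g in enumerate(layer)
    )
    for _, li, ci in candidates:
        active = [sum(m) for m in masks]
        if predicted_latency_ms(active, cost_per_channel_ms) <= budget_ms:
            break
        if active[li] > 1:  # always keep at least one channel per layer
            masks[li][ci] = False
    # If the budget is unreachable, the masks are a best effort only.
    return masks


if __name__ == "__main__":
    costs = [1.0, 2.0]  # illustrative per-channel costs in ms
    step1 = zeroize_to_budget([[0.9, 0.05, 0.4], [0.1, 0.8]], costs, 5.0)
    print(step1)
    # Later in training the scale factors have changed, so a previously
    # zeroized channel may be recovered and a different one zeroized.
    step2 = zeroize_to_budget([[0.9, 0.5, 0.02], [0.1, 0.8]], costs, 5.0)
    print(step2)
```

In a real training loop the mask would be recomputed periodically while the zeroized channels keep receiving updates, and the linear predictor would be replaced by one fitted to measurements on the target device.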
author2 Weichen Liu
author_facet Weichen Liu
Huai, Shuo
format Thesis-Doctor of Philosophy
author Huai, Shuo
author_sort Huai, Shuo
title Enabling efficient edge intelligence: a hardware-software codesign approach
title_sort enabling efficient edge intelligence: a hardware-software codesign approach
publisher Nanyang Technological University
publishDate 2023
url https://hdl.handle.net/10356/172499
_version_ 1787590739885031424
spelling sg-ntu-dr.10356-172499 2024-01-04T06:32:51Z Enabling efficient edge intelligence: a hardware-software codesign approach Huai, Shuo Weichen Liu School of Computer Science and Engineering liu@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence Engineering::Computer science and engineering::Hardware::Performance and reliability Doctor of Philosophy 2023-12-13T07:25:18Z 2023-12-13T07:25:18Z 2023 Thesis-Doctor of Philosophy Huai, S. (2023). Enabling efficient edge intelligence: a hardware-software codesign approach. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172499 10.32657/10356/172499 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University