Enabling efficient edge intelligence: a hardware-software codesign approach
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Online Access: https://hdl.handle.net/10356/172499
Institution: Nanyang Technological University
Summary:

Deep Neural Networks (DNNs) have made significant advancements in various domains, such as computer vision (CV) and natural language processing (NLP). With the rapid development of the Internet of Things (IoT) and edge computing, edge intelligence, which brings computing services closer to the user, has become a key technique for mitigating the latency and privacy concerns of accessing cloud servers. To enhance the efficiency of edge intelligence (i.e., high accuracy and high throughput), this thesis optimizes it from three fundamental and indispensable aspects: algorithms, training data, and computing hardware.
In terms of algorithms, it is crucial to use suitably sized and well-optimized DNN architectures. Model pruning and quantization can reduce the computational and memory demands of models, while model scaling can improve accuracy. In terms of training data, a large and diverse training dataset can enhance the generalization ability of the model and thus its accuracy; collaborative training across participants increases the data volume and diversity available to each participant. In terms of computing hardware, efficient edge accelerators can increase inference throughput, and In-Memory Processing (IMP) leverages hardware units specialized for DNN workloads to minimize latency and power consumption.
Nevertheless, current approaches to these aspects face several challenges. First, many applications have strict latency constraints, making it necessary to optimize model inference time in latency-critical edge systems, yet traditional DNN algorithm optimizations do not directly adjust inference time. Second, to fit devices with different capabilities, collaborative training must learn heterogeneous models for different devices, especially under latency constraints, which makes it difficult to achieve high accuracy. Third, IMP accelerators face challenges such as limited resources and non-ideal behaviors, and current solutions introduce significant hardware overhead, causing excessive power consumption and chip area.
To address these limitations, we first propose a hardware-aware DNN learning method that optimizes models for high accuracy while meeting the system latency constraint in a single training process. We design a compact learning scheme, which compresses redundant models to meet a hard latency requirement on targeted devices by dynamically zeroizing and recovering Batch Normalization (BN) layers. Building on this, we also design a scaling scheme that improves the accuracy of simple models under the latency constraint. For efficiency, we further design a hardware-customized latency predictor to guide the learning process.
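For illustration, the following is a minimal sketch of the zeroizing-recovering idea behind the compact learning scheme: BN scale factors gate channels, the least important channels are zeroized until a latency predictor reports that the budget is met, and zeroized channels can later be recovered. Here `latency_of` is a hypothetical stand-in for the thesis's hardware-customized predictor, and the importance metric (|gamma|) and stopping rule are simplifying assumptions, not the thesis's exact formulation.

```python
# Minimal sketch of dynamic BN zeroizing-recovering under a latency budget.
# `latency_of(model) -> ms` is a hypothetical latency predictor.
import torch
import torch.nn as nn

def zeroize_bn_channels(model: nn.Module, latency_of, budget_ms: float):
    """Zeroize the least important BN channels until predicted latency meets the budget."""
    candidates = []
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            for c, g in enumerate(m.weight.detach().abs()):
                candidates.append((m, c, g.item()))
    candidates.sort(key=lambda t: t[2])                # least important channels first

    zeroized = []
    for bn, c, _ in candidates:
        if latency_of(model) <= budget_ms:             # hardware-aware stopping criterion
            break
        with torch.no_grad():
            zeroized.append((bn, c, bn.weight[c].item(), bn.bias[c].item()))
            bn.weight[c] = 0.0                         # gate the channel off
            bn.bias[c] = 0.0
    return zeroized                                    # kept so channels can be recovered

def recover_bn_channels(zeroized):
    """Restore previously zeroized channels (the 'recovering' half of the scheme)."""
    with torch.no_grad():
        for bn, c, w, b in zeroized:
            bn.weight[c] = w
            bn.bias[c] = b
```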
After optimizing the model algorithm, we propose a hardware-aware collaborative training framework based on Federated Learning (FL), which expands the training dataset for higher accuracy. Moreover, it can learn heterogeneous models that meet the latency constraints of multiple edge systems simultaneously. We use our high-accuracy dynamic zeroizing-recovering method to adjust each local model under its latency constraint, and further design a proto-corrected aggregation scheme to aggregate all heterogeneous local models, satisfying the latency constraints of different systems within one training process while maintaining high accuracy.
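As a rough illustration of server-side aggregation over heterogeneous local models, the sketch below averages each global parameter over the clients that contain it, assuming every local tensor is a leading slice of the corresponding global tensor (as produced by channel pruning). The prototype-based correction of the proto-corrected aggregation scheme is not shown, and all function and argument names are hypothetical.

```python
# Rough sketch: server-side averaging of heterogeneous (width-pruned) local models.
# Assumes each local tensor is the leading slice of the corresponding global tensor.
import torch

def aggregate_heterogeneous(global_state, client_states, client_sizes):
    """Weighted-average every global parameter over the clients that contain it."""
    new_state = {}
    for name, g_param in global_state.items():
        if not torch.is_floating_point(g_param):
            new_state[name] = g_param                       # e.g. BN counters: keep as-is
            continue
        acc = torch.zeros_like(g_param)
        weight = torch.zeros_like(g_param)
        for state, n_samples in zip(client_states, client_sizes):
            if name not in state:
                continue
            local = state[name]
            idx = tuple(slice(0, s) for s in local.shape)   # leading slice per dimension
            acc[idx] += n_samples * local
            weight[idx] += n_samples
        # Positions covered by no client keep the previous global value.
        new_state[name] = torch.where(weight > 0, acc / weight.clamp(min=1), g_param)
    return new_state
```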
However, scenarios that demand extremely low power consumption and high throughput require emerging accelerators to further optimize edge intelligence, and the IMP architecture is promising for DNN inference. To meet resource constraints and minimize power consumption on IMP devices, we use filter-group pruning and crossbar pruning to reduce crossbar usage without extra hardware units for data alignment. In addition, we adopt a non-ideality adaptation and self-compensation scheme that mitigates the impact of non-idealities by exploiting the characteristics of crossbars without large hardware overhead. Finally, we integrate both into one training process for co-optimization, which improves the accuracy of the final model.
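To illustrate why crossbar-aligned pruning avoids extra data-alignment hardware, the sketch below zeroes whole filter groups whose width matches the crossbar columns, so the surviving weights still map onto complete crossbars. The crossbar dimensions, the L1-norm importance score, and the budget rule are illustrative assumptions rather than the thesis's exact method.

```python
# Minimal sketch of crossbar-aligned filter-group pruning for an unrolled
# (in_features x out_features) weight matrix; crossbar size is an assumption.
import numpy as np

XBAR_ROWS, XBAR_COLS = 128, 128                           # assumed crossbar dimensions

def prune_to_crossbar_budget(weight: np.ndarray, max_crossbars: int) -> np.ndarray:
    """Zero whole filter groups so the surviving weights fit in `max_crossbars`
    crossbars without any realignment of the remaining columns."""
    in_f, out_f = weight.shape
    row_tiles = -(-in_f // XBAR_ROWS)                     # crossbars stacked along inputs
    n_groups = -(-out_f // XBAR_COLS)                     # filter groups along outputs
    # Importance of each filter group = L1 norm of its columns.
    scores = [np.abs(weight[:, g * XBAR_COLS:(g + 1) * XBAR_COLS]).sum()
              for g in range(n_groups)]
    keep = max(1, max_crossbars // row_tiles)             # groups we can afford to keep
    keep_ids = set(np.argsort(scores)[-keep:])
    pruned = weight.copy()
    for g in range(n_groups):
        if g not in keep_ids:
            pruned[:, g * XBAR_COLS:(g + 1) * XBAR_COLS] = 0.0   # drop the whole group
    return pruned
```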
In summary, we achieve efficient edge intelligence by optimizing DNN algorithms, training data, and computing devices, encompassing both software and hardware aspects. This unlocks the potential of edge intelligence, ensuring data privacy, achieving high accuracy, and sustaining high throughput across various applications. In the future, we will continue focusing on hardware-software co-design for edge intelligence. First, we intend to develop a dynamic reconfiguration architecture that can seamlessly switch IMP cells between memory and computing functions, optimally allocating memory and computing resources to enhance DNN inference efficiency. Second, we will design IMP accelerators that support various algorithms such as Transformers, co-optimizing algorithms, data, and IMP devices to comprehensively advance the capabilities and applications of edge intelligence. Third, we will propose a hybrid CNN-Transformer Neural Architecture Search (NAS) framework for the IMP architecture to achieve hardware-friendly, highly accurate, robust, low-latency, and low-power IMP-based edge intelligence.