Hardware-aware neural architecture search and compression towards embedded intelligence
Saved in: DR-NTU (Nanyang Technological University Library)
Main Author: Luo, Xiangzhong
Other Authors: Weichen Liu
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2023
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/172506
Institution: Nanyang Technological University
id: sg-ntu-dr.10356-172506
record_format: dspace
institution: Nanyang Technological University
building: NTU Library
continent: Asia
country: Singapore
content_provider: NTU Library
collection: DR-NTU
language: English
topic: Engineering::Computer science and engineering
description:
With the increasing availability of large-scale datasets and powerful computing paradigms, convolutional neural networks (CNNs) have empowered a wide range of intelligent embedded vision tasks, spanning from image classification to downstream vision tasks such as on-device object recognition, detection, and tracking. In the past few years, convolutional networks have grown deeper and wider in order to maintain superior accuracy on the target task. This rule of thumb, despite its efficacy, leads to an exponential growth in the number of floating-point operations (FLOPs) and parameters. For example, ResNet50, one of the most representative convolutional networks, requires over 4 billion FLOPs and contains 25 million parameters. The resulting prohibitive network complexity further widens the computational gap between computation-intensive CNNs and resource-constrained embedded platforms, making it challenging to develop hardware-friendly network solutions that fit the limited computational resources available in real-world embedded scenarios and thereby enable embedded intelligence.
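The complexity figures quoted above can be checked with a few lines of Python. The sketch below assumes torch and torchvision are installed and uses torchvision's reference ResNet50 only to count parameters; the operation count for the first convolutional layer is worked out analytically from the standard formula.

```python
from torchvision.models import resnet50

# No pretrained weights are needed just to count parameters.
model = resnet50()
num_params = sum(p.numel() for p in model.parameters())
print(f"ResNet50 parameters: {num_params / 1e6:.1f}M")   # ~25.6M

# Cost of a convolution: out_h * out_w * c_out * c_in * k_h * k_w multiply-accumulates.
# First layer of ResNet50: 7x7 conv, 3 -> 64 channels, stride 2, 224x224 input -> 112x112 output.
first_conv_macs = 112 * 112 * 64 * 3 * 7 * 7
print(f"First conv layer: {first_conv_macs / 1e6:.0f}M multiply-accumulates")  # ~118M
# Summed over all layers, the network totals roughly 4 billion such operations.
```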
This thesis focuses on alleviating the above computational gap from the perspective of hardware-aware neural architecture search (NAS) and compression.
First, we introduce SurgeNAS for efficient architecture search. Specifically, SurgeNAS returns to one-level optimization for accurate and consistent gradient estimation, and features an effective identity mapping scheme to avoid search collapse. In addition, we introduce an efficient ordered differentiable sampling approach that reduces memory consumption to the single-path level while maintaining strict search fairness. An efficient graph neural network (GNN) based latency predictor is further proposed and integrated into the search engine to avoid tedious on-device latency measurements during the search. Finally, we introduce the paradigm of Comfort Zone, which allows us to scale up the searched architecture candidates to achieve better accuracy on the target task without degrading inference efficiency on the target hardware.
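To illustrate the latency-predictor idea, the following is a minimal sketch of a graph-based latency regressor in plain PyTorch. It is a hypothetical simplification rather than the SurgeNAS predictor itself: the encoding (one-hot operator types plus an adjacency matrix), the layer sizes, and the pooling are illustrative choices.

```python
import torch
import torch.nn as nn

class GraphLatencyPredictor(nn.Module):
    """Regresses on-device latency from a DAG encoding of an architecture candidate."""

    def __init__(self, num_op_types: int, hidden: int = 64, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(num_op_types, hidden)
        self.gcn_layers = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(num_layers))
        self.head = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # node_feats: (batch, nodes, num_op_types) one-hot operator types
        # adj:        (batch, nodes, nodes) adjacency matrices of the candidate DAGs
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        adj_norm = adj / deg                              # average incoming messages per node
        h = self.embed(node_feats)
        for layer in self.gcn_layers:
            h = torch.relu(layer(adj_norm @ h)) + h       # message passing with a residual
        return self.head(h.mean(dim=1)).squeeze(-1)       # pool nodes, predict a scalar latency

# The predictor would be trained by regression (e.g. MSE) against latencies measured once,
# offline, on the target device, so the search itself never has to touch the device.
predictor = GraphLatencyPredictor(num_op_types=8)
feats = torch.randn(4, 10, 8)                             # 4 candidates, 10 nodes each (toy shapes)
adj = torch.randint(0, 2, (4, 10, 10)).float()
print(predictor(feats, adj).shape)                        # torch.Size([4])
```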
Furthermore, we introduce LightNAS for flexible architecture search. The motivation behind LightNAS is that previous NAS methods, including SurgeNAS, focus only on reducing the explicit search cost (the time for a single search) while ignoring the large implicit search cost (the time spent on manual hyper-parameter tuning to derive the required architecture candidate). In practice, previous methods must tune hyper-parameters manually to find an architecture candidate that satisfies the specified latency constraint, which empirically involves about 10 trial-and-error runs and thus increases the total search cost by 10 times. In contrast, LightNAS requires only a single search for any specified latency constraint (i.e., you only search once). In addition, we introduce an efficient yet reliable proxy, batchwise training estimation (BTE), which can be seamlessly integrated into LightNAS to enable channel-level exploration at low computational cost. This further boosts the attainable accuracy on the target task without degrading efficiency on the target hardware.
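To make the "search once" behaviour concrete, the sketch below shows one common way a latency constraint can be enforced without manually tuning the accuracy-latency trade-off weight: the penalty coefficient is updated automatically, in a dual-ascent style, until the predicted latency meets the specified target. This is an illustrative mechanism under stated assumptions, not necessarily the exact LightNAS formulation; the operator latencies, quality scores, and learning rates are made up for the example.

```python
import torch

def search_step(arch_params, task_loss, predicted_latency, target_latency,
                lam, lam_lr=0.01, arch_lr=0.003):
    """One update of the architecture parameters and of the latency multiplier lam."""
    # Penalize the relative violation of the latency target; a negative violation
    # (candidate already fast enough) pulls the multiplier back toward zero.
    violation = (predicted_latency - target_latency) / target_latency
    loss = task_loss + lam * violation
    grads = torch.autograd.grad(loss, arch_params)
    with torch.no_grad():
        for p, g in zip(arch_params, grads):
            p -= arch_lr * g
    # Dual-ascent-style update of the penalty coefficient, kept non-negative.
    return max(0.0, lam + lam_lr * violation.item())

# Toy usage: alpha parameterizes a softmax over five candidate operators with
# made-up latencies and made-up per-operator "accuracy" scores.
alpha = torch.randn(5, requires_grad=True)
op_latency = torch.tensor([1.0, 2.0, 4.0, 8.0, 16.0])    # ms, illustrative
op_quality = torch.tensor([0.5, 0.6, 0.7, 0.8, 0.9])     # illustrative
lam = 0.0
for _ in range(200):
    probs = torch.softmax(alpha, dim=0)
    task_loss = -(probs * op_quality).sum()               # stand-in for the training loss
    latency = (probs * op_latency).sum()                  # stand-in for the latency predictor
    lam = search_step([alpha], task_loss, latency, target_latency=4.0, lam=lam)
# Probability mass should concentrate on operators that respect the 4 ms target.
print(torch.softmax(alpha, dim=0), lam)
```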
Finally, we introduce Domino for efficient network compression, in which we revisit the trade-off between accuracy and efficiency from a fresh perspective: linearity versus non-linearity. Specifically, Domino trades the less important network non-linearity for better network efficiency. To this end, Domino leverages two efficient performance predictors, a vanilla latency predictor and a meta-accuracy predictor, to identify the less important non-linear building blocks, which are then grafted with their linear counterparts. The resulting grafted network is further trained on the target task to achieve decent accuracy. Finally, we reparameterize each grafted linear building block, which consists of multiple consecutive linear layers (convolutional, batch normalization (BN), and grafted linear activation layers), into one single convolutional layer to aggressively boost efficiency on the target hardware, and, more importantly, without sacrificing accuracy on the target task, since reparameterizing purely linear layers leaves the network's output unchanged.
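The reparameterization step can be illustrated for the simplest grafted block, a convolution followed by batch normalization with the activation already replaced by identity: the two layers collapse into a single convolution with adjusted weights and bias, and the outputs match exactly. The code below is a minimal sketch of this principle in PyTorch, not the full Domino procedure, which handles more general grafted blocks.

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BN running statistics and affine parameters into the preceding convolution."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    # Per-output-channel scale applied by BN at inference time.
    scale = bn.weight.data / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = conv.bias.data if conv.bias is not None else torch.zeros(conv.out_channels)
    fused.bias.data = bn.bias.data + (conv_bias - bn.running_mean) * scale
    return fused

# Equivalence check on random inputs (eval mode so BN uses its running statistics).
conv = nn.Conv2d(16, 32, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(32)
bn.weight.data.uniform_(0.5, 1.5)
bn.bias.data.uniform_(-0.5, 0.5)
bn.running_mean.uniform_(-1.0, 1.0)
bn.running_var.uniform_(0.5, 2.0)
block = nn.Sequential(conv, bn).eval()
fused = fuse_conv_bn(conv, bn).eval()
x = torch.randn(2, 16, 24, 24)
print(torch.allclose(block(x), fused(x), atol=1e-5))  # True: identical outputs from one layer
```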
In summary, this thesis focuses on hardware-aware neural architecture search and compression to deliver efficient network solutions for resource-constrained embedded platforms and thereby empower embedded intelligence. Future research will continue to explore more general search spaces and more advanced search and compression techniques to develop more efficient networks for intelligent embedded applications.
author2: Weichen Liu
format: Thesis-Doctor of Philosophy
author: Luo, Xiangzhong
title: Hardware-aware neural architecture search and compression towards embedded intelligence
publisher: Nanyang Technological University
publishDate: 2023
url: https://hdl.handle.net/10356/172506
degree: Doctor of Philosophy
school: School of Computer Science and Engineering
research_centre: Parallel and Distributed Computing Centre
contact: liu@ntu.edu.sg
date_deposited: 2023-12-13
last_updated: 2024-01-04
citation: Luo, X. (2023). Hardware-aware neural architecture search and compression towards embedded intelligence. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/172506
doi: 10.32657/10356/172506
license: This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
file_format: application/pdf