XOR-Net : an efficient computation pipeline for binary neural network inference on edge devices
Accelerating the inference of Convolutional Neural Networks (CNNs) on edge devices is essential because of the small memory size and limited computation capability of these devices. Network quantization methods such as XNOR-Net, Bi-Real-Net, and XNOR-Net++ reduce the memory usage of CNNs by binarizing them. They also simplify multiplication operations to bit-wise operations and obtain good speedups on edge devices. However, there are hidden redundancies in the computation pipelines of these methods, constraining the speedup of the binarized CNNs.

In this paper, we propose XOR-Net, an optimized computation pipeline for binary networks both without and with scaling factors. Because XNOR is realized by two instructions, XOR and NOT, on CPU/GPU platforms, XOR-Net avoids the NOT operations by using XOR instead of XNOR, thus reducing the bit-wise operations in both kinds of binary convolution layers. For binary convolution with scaling factors, XOR-Net further rearranges the sequence of calculating and multiplying the scaling factors to reduce full-precision operations. Theoretical analysis shows that XOR-Net reduces the bit-wise operations by one-third compared with traditional binary convolution, and the full-precision operations by up to 40% compared with XNOR-Net. Experimental results show that our XOR-Net binary convolution without scaling factors achieves up to 135× speedup and consumes no more than 0.8% of the energy of parallel full-precision convolution. For binary convolution with scaling factors, XOR-Net is up to 17% faster and 19% more energy-efficient than XNOR-Net.
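The core pipeline change described in the abstract can be made concrete with a small sketch. The following C fragment is a minimal illustration, not code from the paper; the function names and the 64-bit packing convention are assumptions for exposition. For values in {-1, +1} packed as bits (1 for +1, 0 for -1), a binary dot product can be computed either by counting matching bits with XNOR, which costs an XOR plus a NOT per word on CPUs/GPUs without a native XNOR instruction, or by counting mismatching bits with plain XOR and correcting with the constant bit count N, which is the saving XOR-Net exploits.

```c
/* Minimal sketch (illustrative, not the authors' implementation) of a
 * binarized dot product over 64-bit packed words. Bit 1 encodes +1,
 * bit 0 encodes -1; __builtin_popcountll is the GCC/Clang builtin. */
#include <stdint.h>
#include <stdio.h>

/* XNOR-style pipeline: count matching bit pairs. Since typical ISAs
 * have no single XNOR instruction, each word costs XOR + NOT. */
static int dot_xnor(const uint64_t *x, const uint64_t *w, int words) {
    int matches = 0;
    for (int i = 0; i < words; i++)
        matches += __builtin_popcountll(~(x[i] ^ w[i]));  /* XOR then NOT */
    return 2 * matches - 64 * words;  /* matches - mismatches */
}

/* XOR-style pipeline: count mismatching bit pairs instead. The NOT
 * disappears; the correction by the total bit count N is a constant
 * that can be folded in once per output rather than once per word. */
static int dot_xor(const uint64_t *x, const uint64_t *w, int words) {
    int mismatches = 0;
    for (int i = 0; i < words; i++)
        mismatches += __builtin_popcountll(x[i] ^ w[i]);  /* XOR only */
    return 64 * words - 2 * mismatches;
}

int main(void) {
    uint64_t x[2] = {0x0123456789ABCDEFULL, 0xFEDCBA9876543210ULL};
    uint64_t w[2] = {0xFFFF0000FFFF0000ULL, 0x0F0F0F0F0F0F0F0FULL};
    /* Both pipelines yield the same dot product; the XOR variant
     * simply executes one fewer bit-wise instruction per word. */
    printf("%d %d\n", dot_xnor(x, w, 2), dot_xor(x, w, 2));
    return 0;
}
```

The second optimization the abstract mentions, rearranging when the weight and activation scaling factors are computed and multiplied so as to reduce full-precision operations, is separate from this bit-level trick and is not shown in the sketch.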
Main Authors: Zhu, Shien; Duong, Luan H. K.; Liu, Weichen
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Binary Neural Networks; Edge Devices
Online Access: https://hdl.handle.net/10356/145503 ; https://doi.org/10.21979/N9/XEH3D1
Institution: Nanyang Technological University
Content Provider: NTU Library
Collection: DR-NTU
Conference: 2020 IEEE 26th International Conference on Parallel and Distributed Systems (ICPADS)
Version: Accepted version
Citation: Zhu, S., Duong, L. H. K., & Liu, W. (2020). XOR-Net : an efficient computation pipeline for binary neural network inference on edge devices. Proceedings of the International Conference on Parallel and Distributed Systems (ICPADS).
Funding: This work is partially supported by the Ministry of Education (MOE), Singapore, under its Academic Research Fund Tier 2 (MOE2019-T2-1-071) and Tier 1 (MOE2019-T1-001-072), and partially supported by Nanyang Technological University, Singapore, under its NAP (M4082282) and SUG (M4082087).
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.