A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing

The scene text interpretation is a critical part of the natural scene interpretation. Currently, most of the existing work is based on high-end graphics processing units (GPUs) implementation, which is commonly used on the server side. However, in Internet of Things (IoT) application scenarios, the...

Bibliographic Details
Main Authors:	Li, Yixing, Liu, Zichuan, Liu, Wenye, Jiang, Yu, Wang, Yongliang, Goh, Wang Ling, Yu, Hao, Ren, Fengbo
Other Authors:	School of Electrical and Electronic Engineering
Format:	Article
Language:	English
Published:	2021
Subjects:	Engineering::Electrical and electronic engineering Application Specific Integrated Circuits Mobile Applications
Online Access:	https://hdl.handle.net/10356/150987
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-150987
record_format	dspace
spelling	sg-ntu-dr.10356-1509872021-06-02T04:11:41Z A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing Li, Yixing Liu, Zichuan Liu, Wenye Jiang, Yu Wang, Yongliang Goh, Wang Ling Yu, Hao Ren, Fengbo School of Electrical and Electronic Engineering Engineering::Electrical and electronic engineering Application Specific Integrated Circuits Mobile Applications The scene text interpretation is a critical part of the natural scene interpretation. Currently, most of the existing work is based on high-end graphics processing units (GPUs) implementation, which is commonly used on the server side. However, in Internet of Things (IoT) application scenarios, the communication overhead from the edge device to the server is quite large, which sometimes even dominates the total processing time. Hence, the edge-computing oriented design is needed to solve this problem. In this paper, we present an architectural design and implementation of a natural scene text interpretation (NSTI) accelerator, which can classify and localize the text region on pixel-level efficiently in real-time on mobile devices. To target the real-time and low-latency processing, the binary convolutional encoder-decoder network is adopted as the core architecture to enable massive parallelism due to its binary feature. Massively parallelized computations and a highly pipelined data flow control enhance its latency and throughput performance. In addition, all the binarized intermediate results and parameters are stored on chip to eliminate the power consumption and latency overhead of the off-chip communication. The NSTI accelerator is implemented in a 40 nm CMOS technology, which can process scene text images (size of 128 × 32) at 34 fps and latency of 40 ms for pixelwise interpretation with the pixelwise classification accuracy over 90% on ICDAR-03 and ICDAR-13 dataset. 
The real energy-efficiency is 698 GOP/s/W and the peak energy-efficiency can get up to 7825 GOP/s/W. The proposed accelerator is 7 times more energy efficient than its optimized GPU-based implementation counterpart, while maintaining a real-time throughput with latency of 40 ms. Ministry of Education (MOE) Arizona State University’s work was supported by National Science Foundation under Grant IIS/CPS-1652038. Nanyang Technological University’s work was supported by MOE AcRF Tier 2 under Grant MOE2015-T2-2-013. 2021-06-02T04:11:41Z 2021-06-02T04:11:41Z 2018 Journal Article Li, Y., Liu, Z., Liu, W., Jiang, Y., Wang, Y., Goh, W. L., Yu, H. & Ren, F. (2018). A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing. IEEE Transactions On Industrial Electronics, 66(9), 7407-7416. https://dx.doi.org/10.1109/TIE.2018.2875643 0278-0046 0000-0002-8190-9931 0000-0003-4590-5367 0000-0002-5922-5402 0000-0001-7466-8941 0000-0002-6509-8753 https://hdl.handle.net/10356/150987 10.1109/TIE.2018.2875643 2-s2.0-85055678252 9 66 7407 7416 en MOE2015-T2-2-013 IEEE Transactions on Industrial Electronics © 2018 IEEE. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Electrical and electronic engineering Application Specific Integrated Circuits Mobile Applications
spellingShingle	Engineering::Electrical and electronic engineering Application Specific Integrated Circuits Mobile Applications Li, Yixing Liu, Zichuan Liu, Wenye Jiang, Yu Wang, Yongliang Goh, Wang Ling Yu, Hao Ren, Fengbo A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
description	Scene text interpretation is a critical part of natural scene interpretation. Most existing work relies on high-end graphics processing unit (GPU) implementations, which are typically deployed on the server side. However, in Internet of Things (IoT) application scenarios, the communication overhead from the edge device to the server is large and sometimes even dominates the total processing time. Hence, an edge-computing-oriented design is needed to solve this problem. In this paper, we present the architectural design and implementation of a natural scene text interpretation (NSTI) accelerator, which can classify and localize text regions at the pixel level efficiently and in real time on mobile devices. To target real-time, low-latency processing, a binary convolutional encoder-decoder network is adopted as the core architecture, since its binary nature enables massive parallelism. Massively parallelized computations and highly pipelined data flow control improve its latency and throughput. In addition, all the binarized intermediate results and parameters are stored on chip to eliminate the power consumption and latency overhead of off-chip communication. The NSTI accelerator is implemented in a 40 nm CMOS technology and can process scene text images (size 128 × 32) at 34 fps with a latency of 40 ms for pixelwise interpretation, with pixelwise classification accuracy over 90% on the ICDAR-03 and ICDAR-13 datasets. The real energy efficiency is 698 GOP/s/W, and the peak energy efficiency reaches 7825 GOP/s/W. The proposed accelerator is 7 times more energy efficient than its optimized GPU-based counterpart while maintaining real-time throughput with a latency of 40 ms.
author2	School of Electrical and Electronic Engineering
author_facet	School of Electrical and Electronic Engineering Li, Yixing Liu, Zichuan Liu, Wenye Jiang, Yu Wang, Yongliang Goh, Wang Ling Yu, Hao Ren, Fengbo
format	Article
author	Li, Yixing Liu, Zichuan Liu, Wenye Jiang, Yu Wang, Yongliang Goh, Wang Ling Yu, Hao Ren, Fengbo
author_sort	Li, Yixing
title	A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
title_short	A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
title_full	A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
title_fullStr	A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
title_full_unstemmed	A 34-fps 698-GOP/s/W binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
title_sort	34-fps 698-gop/s/w binarized deep neural network-based natural scene text interpretation accelerator for mobile edge computing
publishDate	2021
url	https://hdl.handle.net/10356/150987
_version_	1702431272141324288

Similar Items