Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors
Soft vector processors in embedded FPGA platforms such as the VectorBlox MXP engine can match the performance and exceed the energy efficiency of commercial off-the-shelf embedded SoCs with SIMD or GPU accelerators for OpenCV applications such as saliency detection. We are also able to beat spatial...
Main Authors: | Hegde, Gopalakrishna; Kapre, Nachiket |
---|---|
Other Authors: | School of Computer Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2015 |
Subjects: | Computer Science and Engineering |
Online Access: | https://hdl.handle.net/10356/81239 http://hdl.handle.net/10220/39151 |
Institution: | Nanyang Technological University |
id | sg-ntu-dr.10356-81239 |
---|---|
record_format | dspace |
spelling | sg-ntu-dr.10356-81239 2020-05-28T07:17:36Z Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors Hegde, Gopalakrishna Kapre, Nachiket School of Computer Engineering 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) Computer Science and Engineering Soft vector processors in embedded FPGA platforms such as the VectorBlox MXP engine can match the performance and exceed the energy efficiency of commercial off-the-shelf embedded SoCs with SIMD or GPU accelerators for OpenCV applications such as saliency detection. We are also able to beat spatial hardware designs built from high-level synthesis while requiring significantly lower programming effort. These improvements are possible through careful scheduling of DMA operations to the vector engine, extensive use of line buffering to enhance data reuse on the FPGA, and limited use of scalar fallback for non-vectorizable code. The driving principle is to keep data and computation on the FPGA for as long as possible to exploit parallelism and data locality and to lower the energy requirements of communication. Using our approach, we outperform all platforms in our architecture comparison while needing less energy. At 640×480 image resolution, our implementation of the MXP soft vector processor on the Xilinx ZedBoard exceeds the performance of the Jetson TK1 GPU by 1.5× while needing 1.6× less energy, the BeagleBone Black by 4.7× at 2.3× less energy, the Raspberry Pi by 9× at 4× less energy, and the Intel Galileo by 28× at 16× less energy. Our vector implementation also outperforms the Vivado HLS-generated OpenCV library implementation by 1.5×. Accepted version 2015-12-18T02:08:44Z 2019-12-06T14:26:18Z 2015-12-18T02:08:44Z 2019-12-06T14:26:18Z 2015 Conference Paper Hegde, G., & Kapre, N. (2015). Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors. 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, 76-83. https://hdl.handle.net/10356/81239 http://hdl.handle.net/10220/39151 10.1109/FCCM.2015.39 en © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: [http://dx.doi.org/10.1109/FCCM.2015.39]. 8 p. application/pdf |
institution | Nanyang Technological University |
building | NTU Library |
country | Singapore |
collection | DR-NTU |
language | English |
topic | Computer Science and Engineering |
description | Soft vector processors in embedded FPGA platforms such as the VectorBlox MXP engine can match the performance and exceed the energy efficiency of commercial off-the-shelf embedded SoCs with SIMD or GPU accelerators for OpenCV applications such as saliency detection. We are also able to beat spatial hardware designs built from high-level synthesis while requiring significantly lower programming effort. These improvements are possible through careful scheduling of DMA operations to the vector engine, extensive use of line buffering to enhance data reuse on the FPGA, and limited use of scalar fallback for non-vectorizable code. The driving principle is to keep data and computation on the FPGA for as long as possible to exploit parallelism and data locality and to lower the energy requirements of communication. Using our approach, we outperform all platforms in our architecture comparison while needing less energy. At 640×480 image resolution, our implementation of the MXP soft vector processor on the Xilinx ZedBoard exceeds the performance of the Jetson TK1 GPU by 1.5× while needing 1.6× less energy, the BeagleBone Black by 4.7× at 2.3× less energy, the Raspberry Pi by 9× at 4× less energy, and the Intel Galileo by 28× at 16× less energy. Our vector implementation also outperforms the Vivado HLS-generated OpenCV library implementation by 1.5×. |
author2 | School of Computer Engineering |
format | Conference or Workshop Item |
author | Hegde, Gopalakrishna; Kapre, Nachiket |
title | Energy-Efficient Acceleration of OpenCV Saliency Computation Using Soft Vector Processors |
publishDate | 2015 |
url | https://hdl.handle.net/10356/81239 http://hdl.handle.net/10220/39151 |
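The description reports each platform comparison as two figures: a speedup and an energy reduction. As a rough cross-check, and assuming (our assumption; the record does not say how energy was measured) that energy equals average power multiplied by runtime on every platform, the two figures together imply how much average power the MXP design draws relative to each competitor, because P_mxp / P_other = (t_other / t_mxp) / (E_other / E_mxp) = speedup / energy ratio. The minimal Python sketch below computes that ratio from the numbers quoted in the abstract; the names `REPORTED` and `implied_power_ratio` are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract for the 640x480 saliency workload.
# platform -> (speedup = t_other / t_mxp, energy ratio = E_other / E_mxp)
REPORTED = {
    "Jetson TK1 (GPU)": (1.5, 1.6),
    "BeagleBone Black": (4.7, 2.3),
    "Raspberry Pi": (9.0, 4.0),
    "Intel Galileo": (28.0, 16.0),
}


def implied_power_ratio(speedup: float, energy_ratio: float) -> float:
    """Return P_mxp / P_other, assuming E = P_avg * t on both platforms.

    P_mxp / P_other = (E_mxp / t_mxp) / (E_other / t_other)
                    = (t_other / t_mxp) / (E_other / E_mxp)
                    = speedup / energy_ratio
    """
    return speedup / energy_ratio


if __name__ == "__main__":
    for name, (s, r) in REPORTED.items():
        ratio = implied_power_ratio(s, r)
        print(f"{name}: MXP draws about {ratio:.2f}x the average power")
```

Under that assumption, the figures suggest the MXP design draws slightly less average power than the Jetson TK1 but roughly 1.75× to 2.25× more than the other three boards, so its energy advantage over them would come from finishing sooner rather than from drawing less power.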