Ultra-low power real-time object detection based on quantized CNNs


Saved in:
Bibliographic Details
Main Author: Chew, Jing Wei
Other Authors: Weichen Liu
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2021
Subjects:
Online Access:https://hdl.handle.net/10356/148048
Institution: Nanyang Technological University
Description
Summary: With the recent proliferation of deep learning-based solutions to object detection, state-of-the-art accuracy has increased far beyond what was achievable with traditional methods. However, the hardware requirements for running these models in real time are high, making them expensive to deploy on the edge. Furthermore, their large model size leads to a high memory footprint and excessive power consumption, which makes them infeasible for deployment in resource-constrained environments without a constant power source. Therefore, this project proposes the most extreme network quantization possible, i.e. binarization, to make a YOLO-based object detection model deployable on the edge while attaining reasonable accuracy. Using this approach, the proposed model runs at 37.7 FPS on an NVIDIA Jetson Nano with a peak memory footprint of 17.1 MB, while attaining a reasonable mAP of 0.37 at an Intersection over Union (IoU) threshold of 0.50 on the Pascal Visual Object Classes (VOC) dataset. These figures represent a 21.8x speedup and a 15.3x reduction in memory usage compared to a similar full-precision YOLOv2 model architecture. Since computation was performed entirely on the CPU, the use of TensorRT delegates or other embedded hardware accelerators could allow larger, more accurate models to be deployed in future work. The full project is open-sourced and can be found at https://github.com/tehtea/QuickYOLO.
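To illustrate the binarization idea the summary refers to, below is a minimal, hedged sketch of one common scheme (XNOR-Net-style weight binarization with a per-tensor scaling factor). The `binarize` function and the exact scheme are illustrative assumptions, not necessarily the method used in the project itself:

```python
# Illustrative sketch: binarize real-valued weights to {-alpha, +alpha},
# where alpha is the mean absolute weight. The scaling factor alpha
# reduces the quantization error relative to plain sign() binarization.
# (Hypothetical example; the project's actual scheme may differ.)
def binarize(weights):
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]

# Each weight collapses to one of two values, so it can be stored in a
# single bit plus one shared float, giving the large memory reduction
# reported in the summary.
print(binarize([0.5, -0.25, 0.75, -1.0]))  # -> [0.625, -0.625, 0.625, -0.625]
```

At inference time, multiplications between binarized weights and binarized activations can then be replaced by XNOR and popcount operations, which is what makes CPU-only real-time inference feasible.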