Neural network compression techniques for out-of-distribution detection

Bibliographic Details
Main Author: Bansal, Aditya
Other Authors: Arvind Easwaran
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2022
Subjects:
Online Access:https://hdl.handle.net/10356/159148
Institution: Nanyang Technological University
Description
Summary: One of the key challenges in deploying ML models on embedded systems is the set of resource constraints involved, for instance memory footprint, response time, and power consumption. Such real-time systems require resource-efficient models with low inference time while maintaining reasonable accuracy. In the context of out-of-distribution (OOD) detection, even a detection model with high classification accuracy can render the system ineffectual if its inference time is too high. There is a significant body of literature on neural network compression techniques; however, most studies perform offline testing on datasets such as CIFAR, and only a few have been implemented on dedicated hardware or FPGAs. By implementing these techniques on a DuckieBot, a real-time embedded system, we studied their performance, particularly for the task of OOD detection. The compression techniques of pruning, quantization, and knowledge distillation were experimented with and analyzed on several metrics: execution time, memory usage, reconstruction loss, and OOD metrics such as the ROC curve and true and false positive rates.
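
The abstract names pruning and quantization as two of the compression techniques studied. As a minimal sketch of how these could be combined for a reconstruction-based OOD detector, the snippet below prunes and dynamically quantizes a small autoencoder and uses per-sample reconstruction error as the OOD score. This assumes PyTorch; the model architecture, layer sizes, sparsity level, and input data here are illustrative placeholders, not the configuration used in the project.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

class TinyAutoencoder(nn.Module):
    """Hypothetical autoencoder; layer sizes are illustrative only."""
    def __init__(self, in_dim=784, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder().eval()

# Pruning: zero out 50% of the smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the sparsity permanent

# Post-training dynamic quantization of the Linear layers (8-bit, CPU inference).
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# OOD scoring: per-sample reconstruction (MSE) error; a high error suggests an OOD input.
x = torch.rand(8, 784)                      # stand-in batch; real inputs would come from the robot's camera
recon = quantized(x)
ood_score = ((recon - x) ** 2).mean(dim=1)  # thresholding this yields the TPR/FPR points of the ROC curve
print(ood_score)
```

Measuring execution time and memory usage of `model` versus `quantized` on the target hardware, along with the ROC curve of `ood_score`, mirrors the kind of comparison described in the abstract.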