Neural network compression techniques for out-of-distribution detection
Main Author:
Other Authors:
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Subjects:
Online Access: https://hdl.handle.net/10356/159148
Institution: Nanyang Technological University
Summary: A key challenge in deploying ML models on embedded systems is the numerous resource constraints, such as memory footprint, response time, and power consumption. Such real-time systems require resource-efficient models with low inference time that still maintain reasonable accuracy. In the context of OOD detection, even if the detection model has high classification accuracy, the system may be rendered ineffectual if its inference time is too high.
There is a significant body of literature on neural network compression techniques, but the majority of studies perform offline testing on datasets such as CIFAR; few works have been deployed on dedicated hardware such as FPGAs. By implementing these techniques on the DuckieBot, a real-time embedded system, we studied their performance specifically for the task of OOD detection. The compression techniques of pruning, quantization, and knowledge distillation were applied and analyzed on several metrics: execution time, memory usage, reconstruction loss, and OOD metrics such as the ROC curve and true and false positive rates.
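As a concrete illustration of two of the compression techniques named in the summary, the following is a minimal sketch assuming PyTorch; the network, layer sizes, and 50% sparsity level are hypothetical stand-ins, not the project's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in network; the record does not describe the
# project's actual OOD detection model.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

# Unstructured magnitude pruning: zero the 50% of weights with the
# smallest absolute value in each convolutional and linear layer.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the mask into the weights

# Post-training dynamic quantization: store Linear-layer weights in int8
# and dequantize on the fly at inference, shrinking the memory footprint.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The kind of rough latency check the report compares across methods.
x = torch.randn(1, 3, 32, 32)
with torch.no_grad():
    out = quantized(x)
print(out.shape)
```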
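Knowledge distillation, the third technique mentioned, trains a compact student network to match a larger teacher's softened output distribution. Below is a sketch of the standard soft-target loss, again assuming PyTorch; the temperature `T` and weighting `alpha` are illustrative defaults, not values taken from the project.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # KL divergence between the student's and teacher's temperature-softened
    # distributions; the T*T factor keeps the gradient scale comparable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```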
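Finally, the OOD metrics listed in the summary (ROC curve, true and false positive rates) can be computed from per-sample anomaly scores such as the reconstruction loss the summary mentions. A sketch using scikit-learn; the score distributions below are synthetic, included only to make the example runnable.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)

# Hypothetical per-sample anomaly scores (e.g. autoencoder reconstruction
# loss); higher scores should indicate out-of-distribution inputs.
scores_id = rng.normal(0.2, 0.05, 500)   # in-distribution samples
scores_ood = rng.normal(0.5, 0.10, 500)  # out-of-distribution samples

y_true = np.concatenate([np.zeros(500), np.ones(500)])  # 1 = OOD
y_score = np.concatenate([scores_id, scores_ood])

# The ROC curve sweeps the detection threshold over all observed scores.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# TPR at a fixed low FPR is a common operating point for OOD detectors.
tpr_at_5pct_fpr = tpr[np.searchsorted(fpr, 0.05)]
print(f"AUROC: {auc:.3f}  TPR@5%FPR: {tpr_at_5pct_fpr:.3f}")
```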