Deep active learning for training object detection

Bibliographic Details
Main Author: Jose, John Anthony
Format: text
Language: English
Published: Animo Repository 2022
Online Access: https://animorepository.dlsu.edu.ph/etdd_ece/2
https://animorepository.dlsu.edu.ph/cgi/viewcontent.cgi?article=1001&context=etdd_ece
Institution: De La Salle University
Description
Summary: Object detection is widely deployed, but it continually requires a large number of annotated images to perform reliably. This limitation stems from the conventional workflow for training supervised object detection algorithms. This study proposes a new workflow that reduces the number of annotated images needed for training by "intelligently" sampling the most informative unlabeled images, an approach known as active learning. The existing active learning literature has focused on using prediction uncertainty to identify the most informative image. Although this has merit, improving uncertainty estimation alone is not a holistic solution. This study proposes that two more factors are equally important: (1) improving the representation in a limited-label setting, and (2) suppressing noisy predictions when sampling new images. With these simple modifications, the proposed workflow achieves 76% mean average precision (mAP) using 20% of the data, which beats the state of the art by a large margin. Compared with the conventional training workflow, it attains 95% of the performance using only 20% of the images.
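
Illustration: The sampling step described in the summary can be pictured with a minimal sketch of uncertainty-based selection. This is not the thesis's method; it is a generic illustration under stated assumptions. The function names (binary_entropy, image_uncertainty, select_for_annotation), the min_conf threshold, and the toy confidence values are all hypothetical. Each unlabeled image is scored by the mean entropy of its detection confidences, and very low-confidence detections are dropped first as a crude stand-in for suppressing noisy predictions.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of one detection confidence p in [0, 1]."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def image_uncertainty(confidences, min_conf=0.1):
    """Score one image by the mean entropy of its detections.

    Detections below min_conf are discarded first. This threshold is an
    assumed, simplified stand-in for noisy-prediction suppression.
    """
    kept = [p for p in confidences if p >= min_conf]
    if not kept:
        return 0.0
    return sum(binary_entropy(p) for p in kept) / len(kept)


def select_for_annotation(pool, budget, min_conf=0.1):
    """Return the ids of the `budget` most uncertain images in the pool."""
    ranked = sorted(
        pool,
        key=lambda image_id: image_uncertainty(pool[image_id], min_conf),
        reverse=True,
    )
    return ranked[:budget]


# Toy unlabeled pool: image id -> detector confidences (made-up values).
pool = {
    "img_a": [0.98, 0.97],        # confident detections, low uncertainty
    "img_b": [0.55, 0.48, 0.05],  # ambiguous detections plus one noisy box
    "img_c": [0.90, 0.60],        # mixed
}
chosen = select_for_annotation(pool, budget=1)
print(chosen)
```

In an active learning loop, the selected images would be sent for annotation, added to the labeled set, and the detector retrained before the next round of sampling.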