FASTER R-CNN AND SSDLITE MODEL IMPLEMENTATION FOR OBJECT DETECTION APPLICATION FOR LOW VISION USERS

Bibliographic Details
Main Author: Azharyanti, Chandrika
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/69186
Institution: Institut Teknologi Bandung
Description
Summary: Low vision is a permanent reduction in visual function that makes it difficult to carry out daily activities. One problem that people with low vision face is difficulty finding objects. Mobile applications with object detection features are built to help them with this problem. To produce an object detection model that fits these needs, two frameworks, Faster R-CNN and SSDLite, were retrained using transfer learning, which is the reuse of a previously trained model on a new problem. Because the application is meant to help people with low vision detect everyday objects, the training dataset was restricted to suitable classes: the dataset used is MS COCO, and 38 of its 80 available classes were selected for training. Each framework was trained with three different optimizers (SGD, Adam, and AdamW), giving six configurations, and the best configuration was chosen for use in the application. The retrained models were assessed using the mAP (mean Average Precision) metric. Based on the inference results on the training server, the two configurations with the highest mAP values among the six were the Faster R-CNN model with the SGD optimizer, at 61.3%, and the SSDLite model with the AdamW optimizer, at 37.3%. Their inference times were 0.75 and 0.095 seconds, respectively. Inference was then repeated on the application server to confirm that the inference time remains tolerable on a different server specification; both models stayed within the tolerable limit of under 5 seconds.
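The comparison described in the summary can be illustrated with a minimal Python sketch. It uses only the mAP values and training-server inference times reported above; the names `Candidate`, `pick_model`, and `MAX_INFERENCE_S` are hypothetical, and the rule "highest mAP among candidates under the time limit" is an assumption about how the choice was made, not the author's documented procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    model: str
    optimizer: str
    map_percent: float   # mAP as reported in the summary
    inference_s: float   # training-server inference time as reported


# The two best configurations reported in the summary.
CANDIDATES = [
    Candidate("Faster R-CNN", "SGD", 61.3, 0.75),
    Candidate("SSDLite", "AdamW", 37.3, 0.095),
]

# Tolerable inference time stated in the summary (under 5 seconds).
MAX_INFERENCE_S = 5.0


def pick_model(candidates, max_inference_s=MAX_INFERENCE_S):
    """Return the highest-mAP candidate whose inference time is under the limit."""
    eligible = [c for c in candidates if c.inference_s < max_inference_s]
    if not eligible:
        raise ValueError("no candidate meets the inference-time limit")
    return max(eligible, key=lambda c: c.map_percent)


best = pick_model(CANDIDATES)
print(f"{best.model} + {best.optimizer}: mAP {best.map_percent}%, "
      f"inference {best.inference_s} s")
```

With a much tighter limit (for example 0.5 seconds) the same function would fall back to the SSDLite configuration, which shows the accuracy and speed trade-off between the two frameworks.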
Therefore, the Faster R-CNN model with the SGD optimizer, which has the highest mAP value of 61.3% and an inference time of 3.6 seconds, was chosen as the object detection model used in online mode.
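The class-selection step mentioned in the summary (keeping 38 of the 80 MS COCO classes) can be sketched as a filter over a COCO-style annotation dictionary. This is only an illustration: the function name `filter_coco` is hypothetical, the two class names in the toy example are placeholders because the record does not list the 38 selected classes, and remapping category ids to a contiguous range and dropping images without any kept object are assumptions rather than steps confirmed by the source.

```python
def filter_coco(coco, keep_names):
    """Keep only the named categories in a COCO-style annotation dict."""
    keep = sorted(
        (c for c in coco["categories"] if c["name"] in keep_names),
        key=lambda c: c["id"],
    )
    # Assumption: remap original ids to 1..K, leaving 0 free for background.
    id_map = {c["id"]: new_id for new_id, c in enumerate(keep, start=1)}

    annotations = [
        {**a, "category_id": id_map[a["category_id"]]}
        for a in coco["annotations"]
        if a["category_id"] in id_map
    ]
    # Assumption: images with no remaining annotation are dropped.
    image_ids = {a["image_id"] for a in annotations}
    return {
        "images": [im for im in coco["images"] if im["id"] in image_ids],
        "annotations": annotations,
        "categories": [{**c, "id": id_map[c["id"]]} for c in keep],
    }


# Toy data in COCO layout (images / annotations / categories).
toy = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 47, "bbox": [10, 10, 30, 40]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [0, 0, 50, 90]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [5, 5, 20, 60]},
    ],
    "categories": [
        {"id": 1, "name": "person"},
        {"id": 47, "name": "cup"},
        {"id": 62, "name": "chair"},
    ],
}

# Placeholder subset; the real selection contains 38 classes.
subset = filter_coco(toy, {"cup", "chair"})
print(len(subset["images"]), len(subset["annotations"]), len(subset["categories"]))
```

Keeping the category ids contiguous makes the number of output classes of the retrained detection head match the size of the selected subset plus background, which is the usual reason for remapping in transfer learning setups.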