FASTER R-CNN AND SSDLITE MODEL IMPLEMENTATION FOR OBJECT DETECTION APPLICATION FOR LOW VISION USERS
Main Author: | |
---|---|
Format: | Final Project |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/69186 |
Institution: | Institut Teknologi Bandung |
Summary: | Low vision is a permanent reduction in a person's visual function
that makes it difficult to carry out daily activities. One of the problems
faced by people with low vision is difficulty finding objects. Mobile
applications with object detection features are built to help people with
disabilities overcome this problem.
To produce an object detection model that fits these needs, two
frameworks, Faster R-CNN and SSDLite, are re-trained using transfer
learning, that is, the reuse of a previously trained model on a new problem.
Because the application is built to help people with low vision detect
everyday objects, the classes in the training dataset were filtered. The
dataset used is MS COCO; after selecting the suitable classes, 38 of the 80
available classes are used for training. Each framework is trained with
3 different optimizers (SGD, Adam, and AdamW), and the best configuration
is chosen for implementation in the application. The resulting transfer
learning models are then evaluated using the mAP (Mean Average Precision)
metric.
Based on the analysis of the inference results on the training server, among
the 6 model and optimizer configurations, 2 stand out with the highest mAP
values: the Faster R-CNN model with the SGD optimizer, at 61.3%, and the
SSDLite model with the AdamW optimizer, at 37.3%. Their inference times are
0.75 and 0.095 seconds, respectively.
Inference is then repeated on the application server to confirm that the
inference time remains tolerable on a different server specification. Both
models still have a tolerable inference time, defined as under 5 seconds.
Therefore, the Faster R-CNN model with the SGD optimizer, which has the
highest mAP value of 61.3% and an inference time of 3.6 seconds on the
application server, is chosen as the object detection model used in online mode. |
---|
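
The selection rule described in the summary (keep the configurations whose inference time is under 5 seconds, then take the one with the highest mAP) can be sketched in Python as follows. This is only an illustrative sketch, not the author's code: the `Candidate` class and the `select_model` function are hypothetical names, and the numbers are the training-server values quoted in the summary.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One model/optimizer configuration (hypothetical container)."""
    model: str
    optimizer: str
    map_percent: float        # mAP, in percent
    inference_seconds: float  # measured inference time, in seconds


def select_model(candidates: Iterable[Candidate],
                 max_seconds: float = 5.0) -> Optional[Candidate]:
    """Return the highest-mAP candidate with inference time under max_seconds."""
    tolerable = [c for c in candidates if c.inference_seconds < max_seconds]
    if not tolerable:
        return None  # no configuration meets the time budget
    return max(tolerable, key=lambda c: c.map_percent)


# Training-server figures reported in the summary.
candidates = [
    Candidate("Faster R-CNN", "SGD", 61.3, 0.75),
    Candidate("SSDLite", "AdamW", 37.3, 0.095),
]

best = select_model(candidates)
print(best)
```

With the 5-second budget both configurations qualify, so the higher mAP decides the outcome. The summary reports an application-server time only for Faster R-CNN (3.6 seconds), which is also under the budget; a stricter budget would change the result, for example `select_model(candidates, max_seconds=0.5)` returns the SSDLite configuration.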