PANOPTIC SEGMENTATION PERCEPTION SYSTEM FOR AUTONOMOUS ELECTRIC TRAM IN THE MIXED TRAFFIC ENVIRONMENT
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/70809 |
Institution: | Institut Teknologi Bandung |
Summary: | The objective of semantic segmentation is to recognize all objects in an image without regard to
the number of objects of the same class. Instance segmentation, on the other hand,
emphasizes distinguishing and counting the individual foreground entities. Panoptic segmentation combines
these two contrasting paradigms into a single architecture to provide one unified
segmentation of the whole image.
Research on panoptic segmentation for autonomous trams in a mixed-traffic environment
requires a dataset that represents the real-world scenario. Unfortunately, data of this type
are scarce among the common public datasets. This research centers on developing panoptic
segmentation for autonomous trams, with its novelty in the creation of a dataset for the mixed-traffic
environment. The thesis consists of four consecutive stages: evaluation of the public
datasets, data recording, data annotation, and model training.
The Cityscapes, COCO, and RailSem19 datasets are evaluated to assess their compatibility with
the local context. Inference on a sample of local data using models trained on those datasets shows
failures in recognizing some unique objects. The specification for the local panoptic
dataset is then derived from the findings of this evaluation. The local dataset consists
of 22 object categories, of which 17 represent instance objects and 5
represent semantic objects.
Sensor recordings of the mixed-traffic environment were taken on Jl. Slamet Riyadi, Kota Solo. The
process used 4 Sekonix cameras connected to the NVIDIA DRIVE AGX Pegasus development
kit. This activity yielded data sequences 120 minutes in length for each camera.
Data annotation proceeds by assigning polygon labels to every object in the recorded data.
The annotation team produced 1000 data frames that passed quality control. Post-processing
of the annotated images yields datasets in a COCO-like format and in a
nuScenes-like scenario.
A panoptic segmentation model is trained to verify the produced datasets as a
source of knowledge. Training is run on the Panoptic-FPN architecture using the
MMDetection framework. The qualitative evaluation shows a fair result, indicated by the
decent segmentation of some unique objects. The quantitative analysis reports 69% Recognition
Quality (RQ), 140.6% Segmentation Quality (SQ), and 110.6% Panoptic Quality (PQ). The
presence of double annotations causes an incorrect calculation of intersection-over-union
(IoU), which in turn inflates the SQ and PQ values. |
---|
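
The RQ, SQ, and PQ figures quoted in the summary can be read against the standard definitions of these metrics. The following is a minimal Python sketch, not the evaluation code used in the thesis. It assumes that each segment is represented as a category name plus a set of pixel indices, that segments within one image do not overlap, and that a prediction matches a ground-truth segment when both share a category and their IoU exceeds 0.5. The function names and the toy data are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt_segments, pred_segments):
    """Return (PQ, SQ, RQ) for lists of (category, pixel_set) segments.

    Assumes non-overlapping segments on each side, so a match with
    IoU > 0.5 is unique.
    """
    matched_ious = []
    used_preds = set()
    for g_cat, g_pix in gt_segments:
        for p_idx, (p_cat, p_pix) in enumerate(pred_segments):
            if p_cat != g_cat or p_idx in used_preds:
                continue
            value = iou(g_pix, p_pix)
            if value > 0.5:
                matched_ious.append(value)
                used_preds.add(p_idx)
                break

    tp = len(matched_ious)          # matched pairs
    fp = len(pred_segments) - tp    # predictions without a match
    fn = len(gt_segments) - tp      # ground-truth segments without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    if denom == 0:
        return 0.0, 0.0, 0.0

    sq = sum(matched_ious) / tp if tp else 0.0  # mean IoU of matched pairs
    rq = tp / denom                             # F1-style recognition score
    return sq * rq, sq, rq


# Toy example (hypothetical data): two ground-truth segments, three predictions.
gt = [("car", frozenset(range(0, 10))), ("road", frozenset(range(10, 20)))]
pred = [
    ("car", frozenset(range(0, 8))),       # IoU 0.8 with the ground-truth car
    ("road", frozenset(range(10, 20))),    # IoU 1.0 with the ground-truth road
    ("person", frozenset(range(20, 23))),  # unmatched, counts as a false positive
]
pq, sq, rq = panoptic_quality(gt, pred)
```

Under these definitions every matched IoU is at most 1, so SQ and PQ cannot exceed 100% when the segments are valid and non-overlapping. Values above 100%, as reported in the summary, therefore indicate that this assumption is violated somewhere in the annotations or in the IoU computation.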