VEHICLE SPEED ESTIMATION USING YOLO, KALMAN FILTER, AND FRAME SAMPLING
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/55847
Institution: Institut Teknologi Bandung
Summary: Vehicle speed estimation from a video feed can be used to enforce road rules and provide traffic insights without interfering with physical road infrastructure. Common methods are background subtraction, motion detection, and convolutional neural networks (CNNs). The first two suffer from an inability to differentiate object classes and to handle occlusion, whereas CNNs suffer from computational complexity. In this thesis, an architectural pipeline based on a TensorRT-optimized You Only Look Once (YOLO) detector, a Kalman filter tracker, and frame-sampling optimization is proposed. YOLO is a state-of-the-art detector with a good tradeoff between accuracy and performance. The non-YOLO layers are represented in ONNX format and then optimized by the TensorRT framework. The Kalman filter, though simplistic in nature, benefits from the predictable movement model of vehicles and hence performs faster than CNN-based trackers. Frame sampling is exploited by processing the video at a lower framerate without removing the features needed to detect and track objects. A novel method for selecting the object reference point, by drawing a line from the vanishing point through the center of the bounding box until it reaches the box edge, is also explored.
From the experiments, it is found that TensorRT improves FPS by 3.7x with an mAP degradation of only 0.02. The best YOLO-TRT-Kalman configuration achieves an MAE of 0.96 km/h and an acceptable error interval of 93.81% while running in real time at 118 FPS, outperforming the baseline Mask R-CNN-DeepSORT architecture, which reaches an MAE of 1.12 km/h and an acceptable error of 90.21% at 3 FPS. Frame sampling can further improve the FPS: a 1/5 sampling ratio raises the speed by 50% (to 177 FPS) at the cost of only 0.11 km/h in MAE (to 1.07 km/h). Additionally, it is found that increasing the network size with identical weights does not guarantee better performance (MAE rises from 0.96 to 1.21 km/h), and the proposed reference point selection is found to decrease accuracy.
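To make the pipeline concrete, the following Python sketch illustrates two of the stages summarized above: a constant-velocity Kalman filter over detected box centers, with the time step enlarged to account for frame sampling, and the vanishing-point reference-point rule. The class name, noise covariances, pixel-to-metre scale, and sampling ratio are all illustrative assumptions, not values or code from the thesis.

```python
# Minimal sketch, not the thesis implementation: constant-velocity Kalman
# tracking of box centres with a frame-sampling-aware time step, plus the
# vanishing-point reference-point rule. Calibration values are assumed.
import numpy as np

class ConstantVelocityKalman:
    """Tracks state [x, y, vx, vy] (pixels, pixels/s) from noisy (x, y) observations."""
    def __init__(self, x, y, dt):
        self.s = np.array([x, y, 0.0, 0.0])   # state estimate
        self.P = np.eye(4) * 10.0             # state covariance
        self.F = np.eye(4)                    # constant-velocity motion model
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                 # we observe position only
        self.Q = np.eye(4) * 0.01             # process noise (assumed)
        self.R = np.eye(2) * 1.0              # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q

    def update(self, z):
        innov = np.asarray(z, dtype=float) - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.s = self.s + K @ innov
        self.P = (np.eye(4) - K @ self.H) @ self.P

def reference_point(vp, box):
    """Extend the ray from the vanishing point through the box centre until it
    first exits the bounding box; return that exit point."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    dx, dy = cx - vp[0], cy - vp[1]
    ts = []
    if dx:
        ts += [(x1 - cx) / dx, (x2 - cx) / dx]
    if dy:
        ts += [(y1 - cy) / dy, (y2 - cy) / dy]
    t = min(t for t in ts if t > 0)           # first positive crossing = box edge
    return cx + t * dx, cy + t * dy

# Toy usage under assumed calibration: process 1 of every 5 frames of a 30 FPS
# video, so the effective time step between processed detections is 5/30 s.
FPS, SAMPLE_EVERY = 30, 5
dt = SAMPLE_EVERY / FPS
METRES_PER_PIXEL = 0.05                       # assumed ground-plane scale

track = ConstantVelocityKalman(x=100.0, y=200.0, dt=dt)
for k in range(1, 10):                        # fake detections moving 15 px/step
    track.predict()
    track.update((100.0 + 15.0 * k, 200.0))

vx, vy = track.s[2], track.s[3]               # estimated velocity in pixels/s
print(f"speed = {np.hypot(vx, vy) * METRES_PER_PIXEL * 3.6:.1f} km/h")
print("reference point:", reference_point(vp=(320, 0), box=(90, 180, 130, 220)))
```

Because a sampled pipeline sees one detection every `SAMPLE_EVERY` frames, encoding the enlarged step directly in the motion matrix keeps the velocity estimate in real-world units; this is one plausible way to realize the frame-sampling idea, and the thesis may implement it differently.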