Human motion tracking using deep learning

Bibliographic Details
Main Author: Peok, Qing Xiang
Other Authors: Soong Boon Hee
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects:
Online Access: https://hdl.handle.net/10356/139214
Institution: Nanyang Technological University
Description
Summary: Tracking moving objects such as pedestrians has a wide range of applications in intelligent autonomous vehicles. While object detection using modern deep learning-based approaches such as Convolutional Neural Networks (CNNs) has advanced significantly, autonomous systems such as ground mobile robots still face problems in real-time implementation, and CNN-based human tracking remains difficult under occlusions, illumination changes and camera view variations. The aim of this project is to identify the deep learning detector with the best trade-off between speed and accuracy, and to implement human tracking-by-detection using the selected real-time object detector. Firstly, in terms of speed, measured in frames per second (FPS), the one-stage detectors Single Shot Detector (SSD) and You Only Look Once (YOLOv3) achieve 19 FPS and 22 FPS respectively, compared with the two-stage detector Mask Region-based Convolutional Neural Network (Mask R-CNN), which achieves 1 FPS; one-stage detectors are therefore more suitable for real-time implementation. In addition, Multiple Object Tracking Precision (MOTP) is used as a measure of detection accuracy: SSD achieves 68.4%, YOLOv3 achieves 72.9% and Mask R-CNN achieves 74.0%. YOLOv3 is therefore adopted in this project as the optimal object detector. Next, the bounding-box coordinates (in pixels) output by YOLOv3 are passed to the tracking algorithm, which associates detections across frames to build a trajectory for each tracked person. The decision rule for associating detections across frames is computed by the Hungarian algorithm (Kuhn-Munkres), which takes motion affinity and appearance similarity as inputs. For motion affinity, Intersection over Union (IOU) measures the physical proximity between new detections and tracked persons. Appearance similarity is measured by the Euclidean distance between the CNN features of a new detection and the most recent CNN features of each tracked person. Promising results were obtained in the various tests carried out. Finally, the tracking algorithm is tested on a mobile robot running the Robot Operating System (ROS) framework, with images captured by an Orbbec Astra RGB-D camera passed to the object detection algorithm; tracking is then performed in real time. Two major challenges faced during testing were identity (ID) switching and missed detections. While the latter is limited by the detection algorithm, ID switching occurs when two similar-looking persons are very close together and the tracking algorithm is unable to differentiate them.
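
To make the evaluation and association steps above concrete, a few Python sketches follow; none of this code comes from the project itself. First, a minimal sketch of the MOTP metric quoted above, assuming the overlap-based variant of the CLEAR MOT definition (the report does not state whether the overlap- or distance-based form is used):

    def motp(matched_overlaps):
        # Overlap-based MOTP: mean IOU over every matched
        # detection/ground-truth pair, pooled across all frames.
        # `matched_overlaps` is a flat list of IOU values, one per match.
        if not matched_overlaps:
            return 0.0
        return sum(matched_overlaps) / len(matched_overlaps)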
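
The motion-affinity cue can be illustrated with a standard IOU computation between axis-aligned pixel bounding boxes; the (x1, y1, x2, y2) corner convention is an assumption, as the report does not specify the box format used:

    def iou(box_a, box_b):
        # Intersection over Union of two boxes given as (x1, y1, x2, y2)
        # pixel coordinates, with (x1, y1) the top-left corner.
        ix1 = max(box_a[0], box_b[0])
        iy1 = max(box_a[1], box_b[1])
        ix2 = min(box_a[2], box_b[2])
        iy2 = min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0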
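
Finally, a sketch of the frame-to-frame association itself, using SciPy's implementation of the Hungarian (Kuhn-Munkres) algorithm. The linear blend of the two cues and the weight alpha are assumptions; the report states only that motion affinity and appearance similarity are the inputs to the assignment:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(track_boxes, det_boxes, track_feats, det_feats, alpha=0.5):
        # Cost matrix mixing motion cost (1 - IOU, using the iou()
        # sketch above) and appearance cost (Euclidean distance between
        # CNN feature vectors, assumed L2-normalised so both terms have
        # comparable scale).
        cost = np.zeros((len(track_boxes), len(det_boxes)))
        for i, (t_box, t_feat) in enumerate(zip(track_boxes, track_feats)):
            for j, (d_box, d_feat) in enumerate(zip(det_boxes, det_feats)):
                motion_cost = 1.0 - iou(t_box, d_box)
                appearance_cost = np.linalg.norm(t_feat - d_feat)
                cost[i, j] = alpha * motion_cost + (1.0 - alpha) * appearance_cost
        # Hungarian algorithm: minimum-cost one-to-one assignment of
        # tracked persons (rows) to new detections (columns).
        rows, cols = linear_sum_assignment(cost)
        return list(zip(rows, cols))

In practice a gating threshold would reject matches whose cost is too high, letting unmatched detections start new tracks and unmatched tracks coast or terminate; the ID switches noted above arise precisely when the combined cost fails to separate two nearby, similar-looking persons.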