Part-based visual tracking with graphs


Bibliographic Details
Main Author: Han, Wei
Other Authors: Mao, Kezhi
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University, 2022
Online Access:https://hdl.handle.net/10356/157196
Institution: Nanyang Technological University
Description
Summary: Visual tracking is a fundamental task in computer vision that aims to locate a target object in a video sequence given only its location in the first frame. It is widely adopted in applications such as video surveillance, robotics, and autonomous driving. Visual tracking is challenging due to unconstrained real-world conditions such as large deformation, occlusion, illumination change, and cluttered background. Global appearance-based trackers are less effective at handling large deformations and occlusions. To address this, many works use part-based models to represent the object with local appearance and use graphs to model the relations among parts. This thesis exploits part-based tracking methods with graph models to address the issues in existing works and improve tracking performance.

The first work of this thesis presents a part-based tracking algorithm under a graph learning framework. Most existing part-based trackers assisted by graph models explicitly design the graph similarity matrix to encode multiple constraints, which can be suboptimal. In this work, we propose to directly solve for the optimal similarity matrix under multiple constraints, including motion, appearance, and geometric constraints. In addition, the optimal similarity matrix achieves target/background part separation and target part matching simultaneously, which avoids the problems caused by sequentially separating and matching target parts in existing works.

The second work investigates tracking with two different types of parts that complement each other. Most existing part-based trackers, including our first work, use either discriminative rectangular patches or superpixels as parts. Superpixel parts are favorable for precisely separating target foreground from background within a frame, while distinctive patches are advantageous for inter-frame part association. Therefore, in this work, we exploit superpixels for part separation and distinctive patches for part matching in a unified energy minimization framework. A heterogeneous graph is constructed to facilitate information exchange among the different parts.

The third work focuses on improving the part discriminative model trained on superpixels for more robust tracking under large appearance variation and cluttered background. In our second work and many existing superpixel-based trackers, tracking is formulated as a superpixel labeling problem constrained by a target likelihood constraint, a spatial smoothness constraint, and a temporal consistency constraint. The target likelihood is calculated with a discriminative appearance model trained independently from the superpixel labeling optimization. Owing to the lack of spatial and temporal constraints and to inaccurate pseudo-labels, the discriminative model becomes erroneous, leading to tracking failure. In this work, discriminative model learning and superpixel labeling are integrated into the same objective function. During optimization, the discriminative model is thus constrained by the spatial and temporal constraints and provides more accurate target likelihoods for part labeling, while the resulting labels in turn provide more reliable pseudo-labels for model learning.

In the fourth work, a part-based strategy that directly locates target parts is exploited in an end-to-end deep learning framework for visual tracking. Our first three works focus on part-based tracking with traditional methods. However, as more large-scale tracking datasets have become available, deep learning-based trackers have become the prevailing approach and outperform traditional methods by a large margin. Most existing deep trackers adopt a global target representation, while part-based tracking within a deep learning framework remains underexplored. In this work, we propose a deep tracker that locates each target part individually from the part representation and estimates the target state from the distribution of parts. Since ground-truth part locations are unavailable for training, we introduce a novel attention-guided learning strategy to learn more reasonable part-location predictions. Moreover, to handle object appearance variation over time, an efficient part updating module is integrated into the tracking network.
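The joint formulation described in the third work, where discriminative model learning and superpixel labeling share one objective, can be illustrated with a minimal sketch. The energy terms, the logistic model, the toy features, and the alternating greedy update below are simplifying assumptions for illustration, not the thesis's actual objective or optimization:

```python
import numpy as np

def track_superpixels(features, adjacency, prev_labels,
                      lam_smooth=0.5, lam_temp=0.5, lr=0.1, iters=20):
    """Alternate between (1) a gradient step on a logistic discriminative
    model using the current pseudo-labels and (2) greedy relabeling of each
    superpixel under likelihood + spatial smoothness + temporal terms.
    (Hypothetical simplification of the joint objective.)"""
    n, d = features.shape
    w = np.zeros(d)                       # linear discriminative model
    labels = prev_labels.astype(float).copy()
    for _ in range(iters):
        # (1) Model update: one logistic-regression gradient step,
        #     using the current labels as pseudo-labels.
        probs = 1.0 / (1.0 + np.exp(-(features @ w)))
        w -= lr * (features.T @ (probs - labels)) / n
        # (2) Labeling update: greedy per-superpixel energy minimization.
        scores = features @ w
        for i in range(n):
            neigh = adjacency[i]
            # Energy of labeling superpixel i as target (1) vs background (0):
            # negative classifier score + disagreement with spatial neighbors
            # + disagreement with the previous frame's label.
            cost1 = (-scores[i]
                     + lam_smooth * np.sum(labels[neigh] != 1)
                     + lam_temp * float(prev_labels[i] != 1))
            cost0 = (lam_smooth * np.sum(labels[neigh] != 0)
                     + lam_temp * float(prev_labels[i] != 0))
            labels[i] = 1.0 if cost1 < cost0 else 0.0
    return labels, w

# Toy example: four superpixels on a chain, the first two target-like.
feats = np.array([[1.0, 0.2], [0.9, -0.1], [-1.0, 0.1], [-0.8, -0.2]])
adj = [[1], [0, 2], [1, 3], [2]]
prev = np.array([1.0, 1.0, 0.0, 0.0])
labels, w = track_superpixels(feats, adj, prev)
```

Because the classifier is refit inside the same loop that enforces smoothness and temporal consistency, its pseudo-labels stay consistent with the graph structure rather than drifting independently, which is the failure mode the third work targets.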