Video object search and discovery
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/69414 |
Institution: | Nanyang Technological University |
Summary: | In terms of volume, video is becoming the largest source of big data. The sheer volume of
video data demands powerful analytic tools to organize and make sense of it. This
thesis proposes to tackle two fundamental problems in big video analytics, i.e., search
and discovery, from an object-driven angle.
The objects we consider are fundamental components of a video: they are
concise, visually meaningful, and informative. The mere presence of certain objects
in a video, together with their interactions, provides rich information for video understanding.
In addition, objects help establish a quick impression of a video by showing
what is there, and provide a small footprint for video indexing, browsing, and search.
For video object search, we aim to search for and locate a specific object spatio-temporally
in the video volume. The main challenges are: 1) object appearance
variations across video frames caused by pose and scale variations, partial occlusions,
etc., 2) false positives introduced by background clutter, and 3) search efficiency. We
propose to formulate video object search as a problem of finding the spatio-temporal
object trajectories, where an object trajectory consists of a sequence of bounding
boxes that locate the target object across frames. We also present a Max-Path search
solution that can effectively reduce the complexity of trajectory search from exponential
to linear in the video volume size. Furthermore, we present and evaluate the
use of object proposals to speed up matching and trajectory search. Experimental
results demonstrate three benefits of the proposed approaches. First, the formulation
as trajectory search can effectively improve matching accuracy by enforcing spatio-temporal
coherency to overcome appearance variations and background clutter. In
addition, the resulting trajectories offer an alternative to frames for measuring object
occurrences and, consequently, search performance. Second, the Max-Path based
trajectory search is efficient and compatible with both dense confidence maps and
coarsely sampled object proposals. Third, the object proposal based approach can
significantly boost search efficiency without compromising accuracy.
For video object discovery, this thesis focuses on the discovery of representative
objects from videos. We propose to address this problem by selecting representative
object proposals generated from video frames. Although representative selection
methods have been applied to video keyframe selection, directly applying them to
object-level selection faces two major challenges. First, the key objects are not necessarily
located in the densest regions of the feature space, due to the appearance variations
of the same object across frames; hence, classic density-based representative selection
methods may not work well. Second, the irrelevant and noisy proposals in the proposal
pool may significantly affect representative selection methods based on sparse
reconstruction. To address these challenges, we have devised a new formulation of
sparse-reconstruction-based representative selection that can incorporate object proposal
priors and a locality prior in the feature space when selecting representatives.
Consequently, it can better locate key objects and suppress outlier proposals. Although
complex constraints have been introduced, we show that the optimization
can be converted into a proximal gradient problem and be solved by the fast iterative
shrinkage thresholding algorithm (FISTA).
The proposed methods are compared against existing state-of-the-art methods for object
instance search and representative object discovery on challenging datasets. The results show
that our methods can more accurately find relevant videos pertaining to an object of
interest and discover key objects that capture the essence of a video. |
---|
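
The summary describes Max-Path search only at a high level. As a rough illustration of how a best spatio-temporal trajectory can be found in time linear in the video volume, the Python sketch below runs a dynamic program over a per-frame grid of confidence scores. The function name `max_path`, the `scores[t][y][x]` layout, and the `radius` motion limit are assumptions made for this example and are not taken from the thesis. Scores are assumed to be positive where the object is likely and negative elsewhere, so a path can start and end at any frame.

```python
from typing import List, Optional, Tuple

Volume = List[List[List[float]]]  # hypothetical layout: scores[t][y][x]


def max_path(scores: Volume, radius: int = 1) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Find the highest-scoring trajectory through a score volume.

    A trajectory visits one cell per frame over a contiguous range of frames,
    and may move at most `radius` cells in y and x between adjacent frames.
    Returns (total_score, [(t, y, x), ...]).
    """
    frames, height, width = len(scores), len(scores[0]), len(scores[0][0])
    best_total = float("-inf")
    best_end: Optional[Tuple[int, int, int]] = None
    prev_acc: Optional[List[List[float]]] = None
    back = []  # back[t][y][x] = (prev_y, prev_x), or None if the path starts here

    for t in range(frames):
        acc = [[0.0] * width for _ in range(height)]
        ptr = [[None] * width for _ in range(height)]
        for y in range(height):
            for x in range(width):
                best_prev, best_src = 0.0, None  # 0.0 means "start a new path here"
                if prev_acc is not None:
                    for py in range(max(0, y - radius), min(height, y + radius + 1)):
                        for px in range(max(0, x - radius), min(width, x + radius + 1)):
                            if prev_acc[py][px] > best_prev:
                                best_prev, best_src = prev_acc[py][px], (py, px)
                acc[y][x] = scores[t][y][x] + best_prev
                ptr[y][x] = best_src
                if acc[y][x] > best_total:
                    best_total, best_end = acc[y][x], (t, y, x)
        back.append(ptr)
        prev_acc = acc

    # Walk the back-pointers from the best end cell to recover the trajectory.
    t, y, x = best_end
    path = [(t, y, x)]
    while back[t][y][x] is not None:
        y, x = back[t][y][x]
        t -= 1
        path.append((t, y, x))
    path.reverse()
    return best_total, path
```

Each cell is visited once and inspects a fixed-size neighbourhood in the previous frame, so for a fixed `radius` the cost grows linearly with the number of cells in the volume. This sketch tracks grid cells rather than full bounding boxes; the same recurrence could be applied to a sparse set of per-frame object proposals instead of a dense grid.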
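
The summary names FISTA as the solver but does not state the thesis's objective function. The Python sketch below therefore shows FISTA only on a generic stand-in problem, l1-regularized least squares, minimizing 0.5 * ||Ax - b||^2 + lam * ||x||_1. It is not the thesis's formulation with proposal and locality priors. The names `fista_lasso` and `soft_threshold`, and the use of the squared Frobenius norm as a step-size bound, are choices made for this example.

```python
import math
from typing import List


def soft_threshold(v: float, tau: float) -> float:
    """Proximal operator of tau * |v|: shrink v toward zero by tau."""
    return math.copysign(max(abs(v) - tau, 0.0), v)


def fista_lasso(A: List[List[float]], b: List[float], lam: float, iters: int = 200) -> List[float]:
    """Minimize 0.5 * ||Ax - b||^2 + lam * ||x||_1 with FISTA (generic sketch)."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
    # so 1 / lipschitz is a safe (if conservative) step size.
    lipschitz = sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iters):
        # Gradient of the smooth term at the extrapolated point y: A^T (A y - b).
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Proximal gradient step: gradient descent followed by soft-thresholding.
        x_new = [soft_threshold(y[j] - grad[j] / lipschitz, lam / lipschitz) for j in range(n)]
        # Momentum update that distinguishes FISTA from plain ISTA.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x
```

With an identity matrix for `A`, the minimizer is the soft-thresholded `b`, which gives a quick sanity check. In a representative-selection setting the unknown would typically be a coefficient matrix with a row-sparsity penalty rather than a single vector, which changes the proximal step but not the overall iteration.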