Video object search and discovery
In terms of volume, video is becoming the largest source of big data. The sheer volume of video data demands powerful analytic tools to organize and make sense of it. This thesis tackles two fundamental problems in big video analytics, search and discovery, from an object-driven angle. O...
Main Author: | Meng, Jingjing |
---|---|
Other Authors: | Tan Yap Peng |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2016 |
Subjects: | DRNTU::Engineering::Electrical and electronic engineering |
Online Access: | https://hdl.handle.net/10356/69414 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-69414 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
DRNTU::Engineering::Electrical and electronic engineering |
spellingShingle |
DRNTU::Engineering::Electrical and electronic engineering Meng, Jingjing Video object search and discovery |
description |
In terms of volume, video is becoming the largest source of big data. The sheer volume of
video data demands powerful analytic tools to organize and make sense of it. This
thesis tackles two fundamental problems in big video analytics, search
and discovery, from an object-driven angle.
The objects we consider are fundamental components of a video that are
concise, visually meaningful and informative. The mere presence of certain objects
in a video, and their interactions, can provide rich information for video understanding.
In addition, they can help establish a quick impression of the video by showing
what is there, and they provide a small footprint for video indexing, browsing and search.
For video object search, we aim to search for and locate a specific object spatio-temporally
in the video volume. The main challenges are: 1) object appearance
variations across video frames caused by pose and scale variations, partial occlusions,
etc., 2) false positives introduced by background clutter, and 3) search efficiency. We
formulate video object search as the problem of finding spatio-temporal
object trajectories, where an object trajectory consists of a sequence of bounding
boxes that locate the target object across frames. We also present a Max-Path search
solution that effectively reduces the complexity of trajectory search from exponential
to linear in the video volume size. Furthermore, we present and evaluate the
use of object proposals to speed up matching and trajectory search. Experimental
results demonstrate three benefits of the proposed approaches. First, formulating search as trajectory search effectively improves matching accuracy by enforcing spatio-temporal
coherence to overcome appearance variations and background clutter. In
addition, the resulting trajectories offer an alternative to frames for measuring object
occurrences and, consequently, search performance. Second, the Max-Path based
trajectory search is efficient and compatible with both dense confidence maps and
coarsely sampled object proposals. Third, the object proposal based approach can
significantly boost search efficiency without compromising accuracy.
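
To illustrate how a trajectory search can be linear in the size of the video volume, the following is a minimal dynamic-programming sketch written for this record; it is not the thesis's implementation. It assumes a hypothetical input `scores[t][y][x]` of signed per-location confidences (positive where the target is likely present) and a hypothetical `radius` parameter limiting how far the object may move between consecutive frames. Each cell is visited once and looks back at a constant-size neighbourhood in the previous frame.

```python
def max_path(scores, radius=1):
    """Return (best_total, path) for the highest-scoring trajectory.

    scores: nested list indexed as scores[t][y][x] (assumed layout).
    path:   list of (t, y, x) tuples over consecutive frames.
    """
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    # best[t][y][x]: best accumulated score of any path ending at (t, y, x)
    best = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    # prev[t][y][x]: (y, x) of the predecessor in frame t-1, or None if the path starts here
    prev = [[[None] * W for _ in range(H)] for _ in range(T)]
    best_total, best_end = float("-inf"), None

    for t in range(T):
        for y in range(H):
            for x in range(W):
                carry, source = 0.0, None
                if t > 0:
                    # Extend a neighbouring path only if it has a positive total.
                    for py in range(max(0, y - radius), min(H, y + radius + 1)):
                        for px in range(max(0, x - radius), min(W, x + radius + 1)):
                            if best[t - 1][py][px] > carry:
                                carry, source = best[t - 1][py][px], (py, px)
                best[t][y][x] = scores[t][y][x] + carry
                prev[t][y][x] = source
                if best[t][y][x] > best_total:
                    best_total, best_end = best[t][y][x], (t, y, x)

    # Follow back-pointers from the best end point to recover the trajectory.
    path = []
    t, y, x = best_end
    while True:
        path.append((t, y, x))
        source = prev[t][y][x]
        if source is None:
            break
        y, x = source
        t -= 1
    path.reverse()
    return best_total, path
```

Because a path is extended only when the accumulated score of its predecessor is positive, the start and end frames of the trajectory are found as part of the same pass, without enumerating candidate paths.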
For video object discovery, this thesis focuses on discovering representative
objects from videos. We address this problem by selecting representative
object proposals generated from video frames. Although representative selection
methods have been applied to video keyframe selection, directly applying them to
object-level selection faces two major challenges. First, the key objects do not necessarily
lie in the densest regions of the feature space, because the appearance of the same object varies
across frames; hence, classic density-based representative selection
methods may not work well. Second, irrelevant and noisy proposals in the proposal
pool may significantly affect representative selection methods based on sparse
reconstruction. To address these challenges, we devise a new formulation of
sparse-reconstruction-based representative selection that incorporates object proposal
priors and a locality prior in the feature space when selecting representatives.
Consequently, it can better locate key objects and suppress outlier proposals. Although
complex constraints are introduced, we show that the optimization
can be cast as a proximal gradient problem and solved by the fast iterative
shrinkage-thresholding algorithm (FISTA).
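
For readers unfamiliar with FISTA, the sketch below shows the generic algorithm on a stand-in problem, the plain lasso objective 0.5 * ||A x - b||^2 + lam * ||x||_1. This is an assumption made for illustration only: the abstract does not state the thesis's objective, and its proposal and locality priors are not modelled here. The function names and the `iters` parameter are hypothetical.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied element-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def fista_lasso(A, b, lam, iters=500):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 with FISTA.

    A is a list of rows, b a list of floats. Illustrative stand-in
    objective, not the formulation used in the thesis.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the largest eigenvalue of
    # A^T A, so 1/L is a safe (if conservative) step size.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n   # current iterate
    z = x[:]        # extrapolated point where the gradient is evaluated
    t = 1.0         # momentum parameter
    for _ in range(iters):
        # Gradient of the smooth term at z: A^T (A z - b)
        r = [sum(A[i][j] * z[j] for j in range(n)) - b[i] for i in range(m)]
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Proximal gradient step
        x_new = soft_threshold([z[j] - g[j] / L for j in range(n)], lam / L)
        # Momentum update
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x
```

In a representative-selection setting, the l1 penalty would typically be replaced by a penalty that zeroes out whole candidates, with the proximal step changed accordingly; the gradient step and momentum update keep the same shape.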
The proposed methods are compared against existing state-of-the-art methods for object
instance search and representative object discovery on challenging datasets. The results show
that our methods can more accurately find relevant videos pertaining to an object of
interest and discover key objects that capture the essence of a video. |
author2 |
Tan Yap Peng |
author_facet |
Tan Yap Peng Meng, Jingjing |
format |
Theses and Dissertations |
author |
Meng, Jingjing |
author_sort |
Meng, Jingjing |
title |
Video object search and discovery |
title_short |
Video object search and discovery |
title_full |
Video object search and discovery |
title_fullStr |
Video object search and discovery |
title_full_unstemmed |
Video object search and discovery |
title_sort |
video object search and discovery |
publishDate |
2016 |
url |
https://hdl.handle.net/10356/69414 |
_version_ |
1772827818389929984 |
spelling |
sg-ntu-dr.10356-69414 2023-07-04T16:14:05Z Video object search and discovery Meng, Jingjing Tan Yap Peng School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering ELECTRICAL and ELECTRONIC ENGINEERING 2016-12-28T07:32:47Z 2016-12-28T07:32:47Z 2016 Thesis Meng, J. (2016). Video object search and discovery. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/69414 10.32657/10356/69414 en 164 p. application/pdf |