Video object search and discovery

In terms of volume, video is becoming the largest source of big data. The sheer volume of video data demands powerful analytic tools to organize and make sense of it. This thesis proposes to tackle two fundamental problems in big video analytics, i.e., search and discovery, from an object-driven angle. O...

Bibliographic Details
Main Author:	Meng, Jingjing
Other Authors:	Tan Yap Peng
Format:	Theses and Dissertations
Language:	English
Published:	2016
Subjects:	DRNTU::Engineering::Electrical and electronic engineering
Online Access:	https://hdl.handle.net/10356/69414
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-69414
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	DRNTU::Engineering::Electrical and electronic engineering
description	In terms of volume, video is becoming the largest source of big data. The sheer volume of video data demands powerful analytic tools to organize and make sense of it. This thesis proposes to tackle two fundamental problems in big video analytics, i.e., search and discovery, from an object-driven angle. The objects that we consider are the fundamental components of a video, which are concise, visually meaningful and informative. The mere presence of certain objects in a video, and their interactions, can provide rich information for video understanding. In addition, they can help establish a quick impression of the video by indicating what is present, and provide a small footprint for video indexing, browsing and search. For video object search, we aim to search for and locate a specific object spatio-temporally in the video volume. The main challenges are: 1) object appearance variations across video frames caused by pose and scale variations, partial occlusions, etc., 2) false positives introduced by background clutter, and 3) search efficiency. We propose to formulate video object search as a problem of finding spatio-temporal object trajectories, where an object trajectory consists of a sequence of bounding boxes that locate the target object across frames. We also present a Max-Path search solution that can effectively reduce the complexity of trajectory search from exponential to linear in the video volume size. Furthermore, we present and evaluate the use of object proposals to speed up matching and trajectory search. Experimental results demonstrate three benefits of the proposed approaches. First, the formulation as trajectory search can effectively improve matching accuracy by enforcing spatio-temporal coherency to overcome appearance variations and background clutter. In addition, the resulting trajectories offer an alternative to frames for measuring object occurrences and, consequently, the search performance. Second, the Max-Path based trajectory search is efficient and compatible with both dense confidence maps and coarsely sampled object proposals. Third, the object proposal based approach can significantly boost search efficiency without compromising accuracy. For video object discovery, this thesis focuses on the discovery of representative objects from videos. We propose to address this problem by selecting representative object proposals generated from video frames. Although representative selection methods have been applied to video keyframe selection, directly applying them to object-level selection faces two major challenges. First, the key objects are not necessarily located in the densest regions of the feature space, because the appearance of the same object varies across frames; hence, classic density-based representative selection methods may not work well. Second, the irrelevant and noisy proposals in the proposal pool may significantly affect representative selection methods based on sparse reconstruction. To address these challenges, we have devised a new formulation of sparse reconstruction based representative selection that can incorporate object proposal priors and a locality prior in the feature space when selecting representatives. Consequently, it can better locate key objects and suppress outlier proposals. Although complex constraints have been introduced, we show that the optimization can be converted into a proximal gradient problem and solved by the fast iterative shrinkage thresholding algorithm (FISTA). The proposed methods are compared against existing state-of-the-art methods for object instance search and representative object discovery on challenging datasets. The results show that our methods can more accurately find relevant videos pertaining to an object of interest and discover key objects that capture the essence of a video.
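The abstract does not spell out the Max-Path formulation, so the following is only a minimal sketch of a generic Max-Path-style dynamic program, written under stated assumptions. It assumes a hypothetical score volume `scores[t][y][x]` (positive where the target is likely, negative elsewhere) and a hypothetical `radius` parameter that limits how far a trajectory may move between consecutive frames. The function name `max_path` is illustrative and not taken from the thesis. Each location is visited once with a constant-size neighbourhood, which is why the cost grows linearly with the video volume size for a fixed `radius`.

```python
def max_path(scores, radius=1):
    """Find the highest-scoring spatio-temporal path in a score volume.

    scores: nested lists indexed as scores[t][y][x] (assumed input format).
    radius: assumed maximum displacement between consecutive frames.
    Returns (best_total, path) with path as a list of (t, y, x) tuples.
    """
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    acc_prev = None   # best accumulated score of paths ending in frame t-1
    back = []         # back[t][y][x]: (py, px) in frame t-1, or None if the path starts here
    best_val, best_end = float("-inf"), None

    for t in range(T):
        acc = [[0.0] * W for _ in range(H)]
        ptr = [[None] * W for _ in range(H)]
        for y in range(H):
            for x in range(W):
                # Extend the best positive-scoring path from the neighbourhood
                # in the previous frame, or start a new path at this location.
                best_prev, arg = 0.0, None
                if acc_prev is not None:
                    for py in range(max(0, y - radius), min(H, y + radius + 1)):
                        for px in range(max(0, x - radius), min(W, x + radius + 1)):
                            if acc_prev[py][px] > best_prev:
                                best_prev, arg = acc_prev[py][px], (py, px)
                acc[y][x] = scores[t][y][x] + best_prev
                ptr[y][x] = arg
                # A path may end at any frame, so track the global maximum.
                if acc[y][x] > best_val:
                    best_val, best_end = acc[y][x], (t, y, x)
        back.append(ptr)
        acc_prev = acc

    # Follow back-pointers from the best end point to recover the trajectory.
    path = []
    t, y, x = best_end
    while True:
        path.append((t, y, x))
        if back[t][y][x] is None:
            break
        y, x = back[t][y][x]
        t -= 1
    path.reverse()
    return best_val, path


# Toy volume: 4 frames of size 1x3, with a target moving left to right in frames 1-3.
toy = [[[-1.0] * 3] for _ in range(4)]
toy[1][0][0], toy[2][0][1], toy[3][0][2] = 2.0, 3.0, 1.0
total, trajectory = max_path(toy)
```

In this sketch a path is allowed to start and end at any frame, so frames where the object is absent do not have to be included in the trajectory. Replacing the dense per-pixel grid with a sparse set of candidate boxes per frame would follow the same recurrence, although how the thesis connects object proposals across frames is not stated in the abstract.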
author2	Tan Yap Peng
author_facet	Tan Yap Peng Meng, Jingjing
format	Theses and Dissertations
author	Meng, Jingjing
author_sort	Meng, Jingjing
title	Video object search and discovery
publishDate	2016
url	https://hdl.handle.net/10356/69414
_version_	1772827818389929984
spelling	sg-ntu-dr.10356-69414 2023-07-04T16:14:05Z Video object search and discovery Meng, Jingjing Tan Yap Peng School of Electrical and Electronic Engineering DRNTU::Engineering::Electrical and electronic engineering ELECTRICAL and ELECTRONIC ENGINEERING 2016-12-28T07:32:47Z 2016-12-28T07:32:47Z 2016 Thesis Meng, J. (2016). Video object search and discovery. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/69414 10.32657/10356/69414 en 164 p. application/pdf
