Video object search and discovery

Bibliographic Details
Main Author: Meng, Jingjing
Other Authors: Tan Yap Peng
Format: Theses and Dissertations
Language: English
Published: 2016
Subjects: DRNTU::Engineering::Electrical and electronic engineering
Online Access: https://hdl.handle.net/10356/69414
Institution: Nanyang Technological University

Description

In terms of volume, video is becoming the largest source of big data. The sheer volume of video data demands powerful analytic tools to organize and make sense of it. This thesis tackles two fundamental problems in big video analytics, search and discovery, from an object-driven angle. The objects we consider are the fundamental components of a video: they are concise, visually meaningful and informative. The mere presence of certain objects in a video, and their interactions, provide rich information for video understanding. Objects also give a quick impression of a video by revealing what it contains, and they provide a small footprint for video indexing, browsing and search.

For video object search, we aim to search for a specific object and locate it spatio-temporally in the video volume. The main challenges are: 1) object appearance variations across video frames caused by pose and scale changes, partial occlusion, etc.; 2) false positives introduced by background clutter; and 3) search efficiency. We formulate video object search as the problem of finding spatio-temporal object trajectories, where a trajectory is a sequence of bounding boxes that locate the target object across frames. We also present a Max-Path search solution that reduces the complexity of trajectory search from exponential to linear in the size of the video volume. Furthermore, we present and evaluate the use of object proposals to speed up matching and trajectory search.

Experimental results demonstrate three benefits of the proposed approaches. First, formulating the problem as trajectory search improves matching accuracy by enforcing spatio-temporal coherence, which helps overcome appearance variations and background clutter. In addition, the resulting trajectories offer an alternative to frames as the unit for measuring object occurrences and, consequently, search performance. Second, Max-Path-based trajectory search is efficient and works with both dense confidence maps and coarsely sampled object proposals. Third, the object-proposal-based approach significantly boosts search efficiency without compromising accuracy.

For video object discovery, this thesis focuses on discovering representative objects from videos. We address this problem by selecting representative object proposals generated from video frames. Although representative selection methods have been applied to video keyframe selection, applying them directly to object-level selection faces two major challenges. First, because the same object varies in appearance across frames, key objects do not necessarily lie in the densest regions of the feature space, so classic density-based representative selection may not work well. Second, irrelevant and noisy proposals in the proposal pool can significantly degrade representative selection methods based on sparse reconstruction. To address these challenges, we devise a new formulation of sparse-reconstruction-based representative selection that incorporates object proposal priors and a locality prior in the feature space, so that it can better locate key objects and suppress outlier proposals. Although this introduces complex constraints, we show that the optimization can be cast as a proximal gradient problem and solved with the fast iterative shrinkage-thresholding algorithm (FISTA).

The proposed methods are compared against existing state-of-the-art methods for object instance search and representative object discovery on challenging datasets. The results show that our methods more accurately find relevant videos pertaining to an object of interest and discover key objects that capture the essence of a video.
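The description states that Max-Path search reduces trajectory search from exponential to linear in the video volume size. The following is a minimal sketch of that dynamic-programming idea, not the thesis code. It makes three assumptions. First, the input is a dense confidence map of T frames of H×W scores in which background cells are negative (for example, confidence minus a threshold). Second, a trajectory may shift by at most `radius` cells between consecutive frames. Third, it operates on grid cells rather than on the bounding boxes or object proposals used in the thesis. The names `max_path`, `volume` and `radius` are hypothetical. The recurrence works like a 3-D version of Kadane's maximum-subarray scan: each cell either extends the best positive-scoring path ending in its spatial neighborhood in the previous frame or starts a new path.

```python
from typing import List, Optional, Tuple

Frame = List[List[float]]  # H x W confidence scores for one frame (assumed input layout)


def max_path(volume: List[Frame], radius: int = 1) -> Tuple[float, List[Tuple[int, int, int]]]:
    """Return the best total score and the (t, y, x) cells of the best path.

    Assumes a non-empty volume whose background scores are negative. Between
    consecutive frames the path may move at most `radius` cells in y and x,
    and it may start and end at any frame.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    # acc[t][y][x]: best score of any path that ends at cell (t, y, x)
    acc = [[[0.0] * W for _ in range(H)] for _ in range(T)]
    # back[t][y][x]: predecessor (y, x) in frame t - 1, or None if the path starts here
    back: List[List[List[Optional[Tuple[int, int]]]]] = [
        [[None] * W for _ in range(H)] for _ in range(T)]
    best, best_end = float("-inf"), (0, 0, 0)
    for t in range(T):
        for y in range(H):
            for x in range(W):
                prev, arg = 0.0, None  # 0.0 means "start a new path at this cell"
                if t > 0:
                    # Only extend a predecessor whose accumulated score is positive.
                    for py in range(max(0, y - radius), min(H, y + radius + 1)):
                        for px in range(max(0, x - radius), min(W, x + radius + 1)):
                            if acc[t - 1][py][px] > prev:
                                prev, arg = acc[t - 1][py][px], (py, px)
                acc[t][y][x] = volume[t][y][x] + prev
                back[t][y][x] = arg
                if acc[t][y][x] > best:
                    best, best_end = acc[t][y][x], (t, y, x)
    # Follow predecessor links back from the best end cell.
    t, y, x = best_end
    path = [(t, y, x)]
    while back[t][y][x] is not None:
        y, x = back[t][y][x]
        t -= 1
        path.append((t, y, x))
    path.reverse()
    return best, path
```

Each cell inspects at most (2·radius+1)² predecessors. The cost is therefore O(T·H·W·(2·radius+1)²), which is linear in the volume size for a fixed radius. Enumerating every possible trajectory instead would grow exponentially with T.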
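The description does not spell out the exact objective for representative selection, only that it is sparse-reconstruction based, uses proposal and locality priors, and is solved with FISTA. The sketch below is therefore a hypothetical stand-in. It applies FISTA to a weighted row-sparse self-reconstruction objective:

minimize over C: 0.5·||X − X·C||_F² + lam · Σ_i w_i · ||C_i||_2

In this objective, X is a d×n matrix with one feature column per object proposal, and C_i is row i of C. A per-proposal weight w_i stands in for the object proposal prior: a lower weight makes that proposal cheaper to keep. The locality prior is omitted. Rows of C that remain non-zero mark the selected representatives. The names `select_representatives`, `lam`, `weights` and `iters` are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain-Python matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def select_representatives(X: Matrix, lam: float, weights: List[float],
                           iters: int = 500) -> List[int]:
    """Sketch: FISTA on 0.5*||X - X C||_F^2 + lam * sum_i weights[i] * ||C[i, :]||_2.

    X is d x n with one feature column per proposal (assumed layout). Returns
    the indices of the non-zero rows of C, i.e. the proposals that are used
    to reconstruct the others.
    """
    n = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    G = matmul(Xt, X)                    # Gram matrix X^T X (n x n)
    L = sum(G[i][i] for i in range(n))   # trace(G) >= largest eigenvalue: a valid Lipschitz bound
    C = [[0.0] * n for _ in range(n)]    # current iterate
    Y = [row[:] for row in C]            # extrapolated (momentum) point
    t = 1.0
    for _ in range(iters):
        GY = matmul(G, Y)
        # Gradient step on the smooth term; its gradient at Y is G Y - G.
        V = [[Y[i][j] - (GY[i][j] - G[i][j]) / L for j in range(n)] for i in range(n)]
        # Proximal step: weighted row-wise (group) soft-thresholding.
        C_new = []
        for i, row in enumerate(V):
            norm = math.sqrt(sum(v * v for v in row))
            shrink = lam * weights[i] / L
            scale = max(0.0, 1.0 - shrink / norm) if norm > 0 else 0.0
            C_new.append([scale * v for v in row])
        # FISTA momentum update.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        Y = [[C_new[i][j] + beta * (C_new[i][j] - C[i][j]) for j in range(n)]
             for i in range(n)]
        C, t = C_new, t_new
    return [i for i, row in enumerate(C) if any(v != 0.0 for v in row)]
```

For example, with X = [[1, 1, 0, 0], [0, 0, 1, 1]], proposals 0 and 1 are exact duplicates, and so are proposals 2 and 3. With weights [1, 2, 1, 2] and lam = 0.1, the cheaper copy in each pair is kept. A large lam (here 2 with unit weights) zeroes every row. The trace of the Gram matrix is used as the step-size bound because it never underestimates the Lipschitz constant of the gradient, which keeps the sketch free of an eigenvalue solver.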

School: School of Electrical and Electronic Engineering
Degree: Doctoral thesis
Date available: 2016-12-28
Record last modified: 2023-07-04
Physical description: 164 p. (application/pdf)
DOI: 10.32657/10356/69414
Citation: Meng, J. (2016). Video object search and discovery. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/69414