Discovering thematic visual objects in unconstrained videos
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2018 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/75713 |
Institution: | Nanyang Technological University |
Summary: | Over the last decade, with the popularization of camera-equipped devices,
there has been an explosive growth of video data. Despite the diverse
visual contents, there are usually some thematic objects in these videos. As
the key objects to be presented, thematic objects appear frequently and
occupy prominent positions in the video scenes, and thus remain in our memory
after watching the videos, such as the bride and the groom in wedding
ceremony videos, the birthday girl in birthday party videos, or the product
logo in commercial videos. Automatically discovering and localizing these
thematic objects can benefit many real-world applications, such as video
summarization, search, and labeling. However, this task is challenging as
there is no prior information or initialization about the thematic objects.
Moreover, there is usually background clutter, occlusions, or camera motions
accompanying the targets. In this thesis, a systematic study is conducted on
the automatic discovery and localization of thematic objects in videos.
We have studied this problem under various settings, including automatic
discovery and localization of the thematic object in single videos, automatic
discovery and segmentation of the thematic object in single videos, and automatic
thematic action discovery and localization in collections of videos. In
the absence of category-specific supervision and manual initialization, various
category-independent cues have been explored to discover and localize the
thematic objects. These include the spatiotemporal saliency to highlight
regions with salient appearance or motion with respect to the background,
temporal smoothness of spatial locations and appearance variations along
the object moving trajectory, and global appearance consistency of the object
throughout its presence. When the discovery is performed in video collections
instead of single videos, the semantic similarities in terms of appearance
and/or motion patterns of the objects between different videos are also important.
Novel techniques are proposed in this thesis to improve the reliability
and efficiency of these cues, and to exploit them more effectively to
improve the discovery and localization performance. Extensive evaluations
on both benchmark and newly proposed datasets demonstrate the
usefulness of the proposed methods as well as their superiority over existing
approaches. |
---|