Moving-object detection, association, and selection in home videos
Due to the prevalence of digital video camcorders, home videos have become an important part of life-logs of personal experiences. To enable efficient video parsing, a critical step is to automatically extract objects, events and scene characteristics present in videos. This paper addresses the prob...
Saved in:
Main Authors: | , |
---|---|
Format: | text |
Language: | English |
Published: |
Institutional Knowledge at Singapore Management University
2007
|
Subjects: | |
Online Access: | https://ink.library.smu.edu.sg/sis_research/6332 https://ink.library.smu.edu.sg/context/sis_research/article/7335/viewcontent/10.1.1.495.1914.pdf |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Singapore Management University |
Language: | English |
Summary: | Due to the prevalence of digital video camcorders, home videos have become an important part of life-logs of personal experiences. To enable efficient video parsing, a critical step is to automatically extract objects, events and scene characteristics present in videos. This paper addresses the problem of extracting objects from home videos. Automatic detection of objects is a classical yet difficult vision problem, particularly for videos with complex scenes and unrestricted domains. Compared with edited and surveillant videos, home videos captured in uncontrolled environment are usually coupled with several notable features such as shaking artifacts, irregular motions, and arbitrary settings. These characteristics have actually prohibited the effective parsing of semantic video content using conventional vision analysis. In this paper, we propose a new approach to automatically locate multiple objects in home videos, by taking into account of how and when to initialize objects. Previous approaches mostly consider the problem of how but not when due to the efficiency or real-time requirements. In home-video indexing, online processing is optional. By considering when, some difficult problems can be alleviated, and most importantly, enlightens the possibility of parsing semantic video objects. In our proposed approach, the how part is formulated as an object detection and association problem, while the when part is a saliency measurement to determine the best few locations to start multiple object initialization. |
---|