Detection of visual attention regions in images and videos
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2009 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/18864 |
Institution: | Nanyang Technological University |
Summary: | The explosive growth of multimedia content and advances in hardware with multimedia functionality call for techniques that let users access such content anywhere, at any time, with a similarly pleasing experience each time. This requires intelligent search, transmission, analysis and display of multimedia data. However, the data is not only very large but also inherently complex, owing to the variety of features (color, texture, shape, motion, etc.) that it contains. The challenge then is to pick out the relevant information from the clutter for further processing. That relevant information is the visual attention region (VAR), whose detection in images and videos is the topic of this dissertation.
The bottom-up model for detecting the VAR in an image generates a saliency map that highlights contrasts in features such as color, intensity and orientation. The saliency map is obtained by combining the individual feature maps, each of which highlights the contrast for one particular feature. We investigate how to select good features and how to combine them properly. We propose a novel Composite Saliency Indicator (CSI) to determine the contribution of each feature map to the salient region. CSI is designed to capture the spatial compactness as well as the density of candidate regions in the feature maps. We also propose a Context Suppression Model that provides a measure of similarity among candidate attention regions in a feature map. This measure is used to find a suppression factor for a particular patch in the scene, which is then used to highlight the actual attention regions. We also demonstrate an application in multimedia adaptation that benefits from the improved VAR detection. |
---|
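The summary describes combining per-feature contrast maps into a single saliency map, with each map weighted by how compact its candidate regions are. The Python sketch below illustrates that general idea only. It is not the thesis's Composite Saliency Indicator, whose definition is not given in this record: the `compactness_weight` function, the inverse-spread formula and all names are assumptions made for illustration.

```python
from typing import List

Map = List[List[float]]  # a 2-D feature map as nested lists


def normalize(m: Map) -> Map:
    """Scale a feature map to [0, 1]; a flat map becomes all zeros."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def compactness_weight(m: Map) -> float:
    """Hypothetical stand-in for a compactness score (NOT the thesis's CSI).

    Returns 1 / (1 + spread), where spread is the activation-weighted
    mean squared distance of pixels from the map's centroid. A single
    tight blob scores near 1; scattered activation scores lower.
    """
    total = sum(sum(row) for row in m)
    if total == 0:
        return 0.0
    cy = sum(y * v for y, row in enumerate(m) for v in row) / total
    cx = sum(x * v for row in m for x, v in enumerate(row)) / total
    spread = sum(
        v * ((y - cy) ** 2 + (x - cx) ** 2)
        for y, row in enumerate(m)
        for x, v in enumerate(row)
    ) / total
    return 1.0 / (1.0 + spread)


def combine(feature_maps: List[Map]) -> Map:
    """Weighted average of normalized feature maps (all of equal size)."""
    maps = [normalize(m) for m in feature_maps]
    weights = [compactness_weight(m) for m in maps]
    wsum = sum(weights)
    h, w = len(maps[0]), len(maps[0][0])
    if wsum == 0:
        return [[0.0] * w for _ in range(h)]
    return [
        [sum(wt * m[y][x] for wt, m in zip(weights, maps)) / wsum
         for x in range(w)]
        for y in range(h)
    ]


# Toy feature maps: one with a single central peak, one with scattered peaks.
compact = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
scattered = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
saliency = combine([compact, scattered])
```

In this toy case the centrally peaked map receives the larger weight, so the combined map favors the center over the corners. A real system would replace the stand-in weight with a measure that also accounts for region density, as the summary indicates.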