Exploring effective data representation for saliency detection in image and video

Bibliographic Details
Main Author: Ren, Zhixiang
Other Authors: Chia Liang Tien
Format: Theses and Dissertations
Language: English
Published: 2014
Subjects: DRNTU::Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:http://hdl.handle.net/10356/55428
Institution: Nanyang Technological University, School of Computer Engineering, Centre for Multimedia and Network Technology
Thesis: Doctor of Philosophy (SCE), 2013
Physical Description: 168 p.

Description:
Visual saliency plays an important role in many applications, such as image/video retargeting, automatic photo composition, and vision-based navigation. Saliency can guide these applications to focus only on the important regions of a scene, thus reducing the complexity of scene analysis. However, current saliency detection methods generate saliency maps of low resolution or quality, which may not satisfy the requirements of some applications. Moreover, compared with the large body of research on static images, saliency models for videos are less well established. In this thesis, we study and propose several models to detect salient objects or regions in images and videos.

To address the low resolution of saliency maps, we improve the current clustering framework by introducing a two-level clustering strategy based on image complexity. We first use the adaptive mean shift algorithm to extract superpixels from the input image, then employ a Gaussian Mixture Model (GMM) to group the superpixels by appearance similarity. A saliency value is finally calculated for each cluster using a compactness metric together with a modified PageRank propagation. With the superpixel representation and saliency refinement, this region-based method represents the input image in a perceptually meaningful way and highlights salient regions at full resolution with well-defined boundaries. The application of our saliency maps to object recognition demonstrates the potential of the proposed method.
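A minimal sketch of this two-level clustering pipeline is shown below, built from off-the-shelf scikit-learn components. It is an illustration rather than the thesis implementation: plain mean shift over pixel position and color stands in for the adaptive mean shift superpixels, and the component count, similarity kernel, and damping factor are assumed values.

```python
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.mixture import GaussianMixture

def clustering_saliency(image, n_components=8, damping=0.85, iters=50):
    """Toy two-level clustering saliency for a small float image (H, W, 3)
    with values in [0, 1].

    Level 1: mean shift over (x, y, color) yields superpixels.
    Level 2: a GMM groups superpixels by appearance; each cluster gets a
    compactness-based saliency, refined by a PageRank-style propagation
    over an appearance-similarity graph.
    """
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    colors = image.reshape(-1, 3)
    feats = np.column_stack([xs.ravel() / w, ys.ravel() / h, colors])

    # Level 1: superpixels (plain mean shift stands in for the adaptive variant).
    sp = MeanShift(bin_seeding=True).fit_predict(feats)
    n_sp = sp.max() + 1
    sp_color = np.array([colors[sp == i].mean(axis=0) for i in range(n_sp)])
    sp_pos = np.array([feats[sp == i, :2].mean(axis=0) for i in range(n_sp)])

    # Level 2: group superpixels by appearance similarity.
    k = min(n_components, n_sp)
    gmm = GaussianMixture(n_components=k, random_state=0).fit(sp_color)
    cl = gmm.predict(sp_color)

    # Compactness prior: spatially concentrated clusters are more salient.
    comp = np.array([sp_pos[cl == c].var() if np.any(cl == c) else np.inf
                     for c in range(k)])
    s0 = 1.0 / (comp + 1e-6)
    s0 /= s0.sum()

    # PageRank-style refinement over cluster appearance similarity.
    W = np.exp(-np.linalg.norm(gmm.means_[:, None] - gmm.means_[None], axis=2))
    np.fill_diagonal(W, 0.0)
    W /= W.sum(axis=1, keepdims=True) + 1e-12
    s = s0.copy()
    for _ in range(iters):
        s = damping * W.T @ s + (1 - damping) * s0

    out = s[cl][sp].reshape(h, w)          # per-pixel cluster saliency
    return out / (out.max() + 1e-12)
```

On a small input, for example `clustering_saliency(img.astype(float) / 255)` on a downsampled frame, the returned map is full resolution, with spatially compact, visually distinctive clusters scoring highest; mean shift over every pixel is slow, so keep the image small.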
For video saliency detection, motivated by the psychological finding that the human visual system is extremely sensitive to isolated abrupt stimuli and relative movement, we formulate saliency detection as a unified feature reconstruction problem. For temporal saliency, we use patches in neighboring frames to sparsely reconstruct the target patch in the current frame, and measure the temporal saliency of a patch by its abruptness, estimated from the reconstruction error and the regularizer, and by its motion contrast, calculated as the difference of reconstruction coefficients. For spatial saliency, we use the surrounding patches in the same frame to sparsely reconstruct the center patch; the reconstruction error and regularizer then measure the local center-surround contrast. The strong performance of our models in both image and video evaluations supports the plausibility of feature reconstruction as an explanation for visual saliency.

Sparse and low-rank representation has demonstrated great potential in subspace learning, and we build on it to develop video saliency detection models for different degrees of camera motion. For moderate camera motion, we jointly estimate the salient foreground motion and the camera motion via robust alignment with sparse and low-rank decomposition. Consecutive frames are transformed and aligned, then decomposed into a low-rank matrix representing the background and a sparse matrix indicating objects with salient motion. We also incorporate useful spatial information, including global rarity, local center-surround contrast, and location priority, to detect spatiotemporal saliency comprehensively.

For large camera motion, the alignment-based model may fail to detect moving objects, so we instead use a trajectory representation in the sparse and low-rank decomposition. Under the assumption of orthographic projection, background trajectories lie in a subspace spanned by three basis trajectories, i.e., the rank of the background matrix is 3, and we estimate a compact background model based on this rank constraint. Furthermore, to enforce spatial connectivity and motion coherency, a Markov Random Field (MRF) is built for foreground estimation. This model is evaluated on a set of challenging sequences and shows superior performance compared with several state-of-the-art methods.
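The core of the feature reconstruction model reduces to a sparse coding problem per patch. The sketch below uses scikit-learn's Lasso as the sparse solver to show the assumed shape of that computation; the dictionary layout, the regularization weight, and the l1 coefficient-contrast measure are illustrative assumptions, not the thesis formulation.

```python
import numpy as np
from sklearn.linear_model import Lasso

def patch_abruptness(target, dictionary, lam=0.05):
    """Sparsely reconstruct a flattened target patch from a dictionary whose
    columns are candidate patches (neighbouring frames for temporal saliency,
    surrounding same-frame patches for spatial saliency).

    Returns a saliency score (reconstruction error plus regulariser cost:
    patches that are hard to explain by their context are salient) and the
    reconstruction coefficients.
    """
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    lasso.fit(dictionary, target)           # min ||t - D a||^2 / (2n) + lam ||a||_1
    alpha = lasso.coef_
    err = np.sum((target - dictionary @ alpha) ** 2)
    return err + lam * np.abs(alpha).sum(), alpha

def motion_contrast(alpha, neighbour_alphas):
    """Contrast of a patch's reconstruction coefficients against those of its
    spatial neighbours (an l1 distance is one simple choice)."""
    return float(np.mean([np.abs(alpha - a).sum() for a in neighbour_alphas]))

# Tiny smoke test on random data: 64-dim patches, 20 candidate patches.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 20))
score, a = patch_abruptness(rng.standard_normal(64), D)
```

The same routine covers both cases: fill the dictionary with patches from neighboring frames for temporal abruptness, or with surrounding patches from the current frame for center-surround spatial contrast.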
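For the moderate-camera-motion model, the low-rank plus sparse decomposition itself can be sketched with a textbook principal component pursuit solved by the inexact augmented Lagrange multiplier method. The robust alignment step (jointly estimating frame transformations) and the spatial priors are omitted here, so this sketch assumes the frames are already roughly registered.

```python
import numpy as np

def soft(X, tau):
    """Elementwise soft thresholding."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svt(X, tau):
    """Singular value thresholding: soft threshold on the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft(s, tau)) @ Vt

def rpca(D, lam=None, tol=1e-7, max_iter=500):
    """Principal component pursuit via inexact ALM: D ~ L + S, with L low
    rank (background) and S sparse (salient motion). Columns of D are
    vectorised, roughly aligned frames."""
    m, n = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    fro = np.linalg.norm(D)
    spec = np.linalg.norm(D, 2)
    Y = D / max(spec, np.abs(D).max() / lam)   # dual variable initialisation
    mu, rho = 1.25 / spec, 1.5
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        L = svt(D - S + Y / mu, 1.0 / mu)      # low-rank update
        S = soft(D - L + Y / mu, lam / mu)     # sparse update
        R = D - L - S
        Y = Y + mu * R
        mu *= rho
        if np.linalg.norm(R) / fro < tol:
            break
    return L, S
```

With frames stacked as the columns of D, the magnitude of S reshaped frame by frame gives raw temporal saliency maps, which the thesis then combines with global rarity, local center-surround contrast, and location priors.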
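Finally, the rank-3 constraint of the trajectory model admits a very short illustration. A plain truncated SVD stands in for the thesis's sparse and low-rank estimation, and the MRF for spatial connectivity and motion coherency is omitted; the trajectory-matrix layout, score normalization, and threshold are assumptions.

```python
import numpy as np

def trajectory_foreground_score(W):
    """W is a (2F, P) matrix of P point trajectories over F frames (the x
    coordinates for all frames stacked above the y coordinates). Under
    orthographic projection, background trajectories lie in a rank-3
    subspace, so the residual of a rank-3 fit scores how likely each
    trajectory is to be foreground. The thesis estimates the rank-3
    background inside a robust sparse/low-rank decomposition and smooths
    the labels with an MRF; plain truncated SVD is shown for brevity."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    B = (U[:, :3] * s[:3]) @ Vt[:3]          # rank-3 background estimate
    resid = np.linalg.norm(W - B, axis=0)    # per-trajectory residual
    return resid / (resid.max() + 1e-12)     # 0 = background-like, 1 = most salient

# Illustrative labelling: trajectories far from the rank-3 subspace are
# flagged as salient foreground (threshold chosen arbitrarily here).
# fg = trajectory_foreground_score(W) > 0.5
```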