Classifier-based approaches for top-down salient object detection
Main Author: | |
---|---|
Other Authors: | |
Format: | Theses and Dissertations |
Language: | English |
Published: | 2017 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/69601 |
Institution: | Nanyang Technological University |
Summary: | Saliency estimation aims to identify visually important regions in an image and to inhibit distractors. It has been used in recent object detectors and image classifiers as a pre-processor to indicate possible object regions in an image. The category-independent object proposals produced by bottom-up saliency approaches include those that are irrelevant for tasks such as object detection. The precision of the object proposals can be improved through top-down saliency approaches that produce category-specific saliency maps. Although the prior knowledge about object categories learnt by classifiers is useful for top-down saliency estimation, the relationship between image classifiers and top-down salient object detectors has not been explored substantially. In this thesis, we develop four classifier-based approaches for top-down salient object detection: the first two are trained in a fully supervised setting and the last two in a weakly supervised setting.
Non-linear feature representations such as sparse coding (SC) or locality-constrained linear coding (LLC), cascaded with linear classifiers, have proven effective in image classification. They are also used for top-down salient object detection to achieve a compact and discriminative representation of SIFT features, which helps to model feature selectivity for the saliency map. We analyze the influence of these feature coding approaches on top-down salient object detection and also propose a novel coding strategy for top-down saliency estimation. The proposed coding strategy ensures that similar codes are assigned to features that are adjacent in the spatial, feature and category domains. These locality-constrained contextual sparse codes are max-pooled over a spatial neighborhood, and a logistic regression classifier learnt on the max-pooled vectors is used for saliency estimation.
Many practical computer vision systems need to identify the presence of an object and segment it at the same time. Moreover, image classifiers and top-down salient object detectors often share similar modules, such as the feature extractor, feature coding and feature classifier. This motivated us to develop our second fully supervised top-down saliency approach, which is a joint framework for saliency estimation and image classification. In this framework, the image classifier is used both to quantify the likelihood of the presence of an object and to update the saliency map using a novel saliency refinement method. A novel saliency-weighted max-pooling is proposed to improve image classification by weighting the max-pooled vector in each block of the spatial pyramid with a weight computed from the top-down saliency maps.
Conventional top-down saliency approaches require fully supervised training with exact object annotations. The availability of images from simple tag-based internet searches has made exact annotation for training saliency models unnecessary. This motivated us to develop weakly supervised top-down saliency approaches that are trained with image-level labels indicating the presence or absence of an object of interest. First, the probabilistic contribution of each patch in the image to the confidence score of a sparse coded spatial pyramid max-pooling (ScSPM) image classifier is analyzed to estimate its Reverse-ScSPM (R-ScSPM) saliency. A high-level understanding of the surrounding spatial region requires contextual information about the patch, which is incorporated using a contextual saliency module. Besides illustrating the accuracy of the saliency maps produced by the proposed method, we demonstrate its effectiveness in applications such as weakly supervised object annotation, class segmentation and action classification.
Finally, we develop a convolutional neural network (CNN) based, weakly supervised salient object detection approach that has both bottom-up and top-down modules. Here, we modify the backtracking strategy to identify salient regions that make a positive contribution to a CNN-based image classifier. From a set of saliency maps of an image produced by fast bottom-up saliency approaches, we propose a novel strategy to select the saliency map best suited to the top-down task. The selected bottom-up saliency map is combined with the top-down saliency map. Features with high combined saliency are used to train a linear SVM classifier to estimate contextual saliency. This is integrated with the combined saliency and further refined through multi-scale superpixel averaging of the saliency map. Experiments are carried out on seven challenging datasets, and quantitative results are compared with 36 closely related approaches across 4 different applications. |
---|
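The saliency-weighted max-pooling mentioned in the summary can be illustrated with a minimal sketch. This is not the thesis implementation: the summary does not state how the per-block weight is computed, so the sketch assumes it is the mean top-down saliency of the patches in the block. The function name `saliency_weighted_max_pool`, the patch tuple layout and the pyramid levels are likewise hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical patch layout: (x, y, code, saliency), with x and y
# normalized to [0, 1] and saliency taken from a top-down saliency map.
Patch = Tuple[float, float, Sequence[float], float]


def saliency_weighted_max_pool(
    patches: Sequence[Patch], levels: Sequence[int] = (1, 2, 4)
) -> List[float]:
    """Concatenate saliency-weighted max-pooled codes over a spatial pyramid."""
    dim = len(patches[0][2])
    pooled: List[float] = []
    for cells in levels:  # each level splits the image into cells x cells blocks
        for row in range(cells):
            for col in range(cells):
                block = [
                    p for p in patches
                    if min(int(p[0] * cells), cells - 1) == col
                    and min(int(p[1] * cells), cells - 1) == row
                ]
                if not block:
                    # An empty block contributes a zero vector.
                    pooled.extend([0.0] * dim)
                    continue
                # Standard max-pooling: per-dimension maximum over the block.
                max_code = [max(p[2][d] for p in block) for d in range(dim)]
                # ASSUMPTION: the block weight is the mean patch saliency.
                weight = sum(p[3] for p in block) / len(block)
                pooled.extend(weight * v for v in max_code)
    return pooled


example_patches = [
    (0.1, 0.1, [1.0, 0.0], 1.0),  # top-left patch, high saliency
    (0.9, 0.9, [0.0, 2.0], 0.5),  # bottom-right patch, lower saliency
]
example_vector = saliency_weighted_max_pool(example_patches, levels=(1, 2))
```

With this weighting, blocks that the saliency map marks as likely object regions dominate the pooled vector passed to the image classifier, while background blocks are suppressed. Other weightings, such as the maximum saliency in the block, would fit the description in the summary equally well.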