Visual saliency computation and quality evaluation via deep learning

Visual attention is an important mechanism in our human vision system, which filters out redundant and unimportant visual information for selectively processing the most salient or informative regions from the visual field. Visual saliency computation is about understanding and simulating the behavi...

Bibliographic Details
Main Author: Yang, Sheng
Other Authors: Lin Weisi
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2021
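Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision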
Online Access: https://hdl.handle.net/10356/145826
Institution: Nanyang Technological University
id sg-ntu-dr.10356-145826
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
description Visual attention is an important mechanism of the human visual system: it filters out redundant and unimportant visual information so that the most salient or informative regions of the visual field can be processed selectively. Visual saliency computation aims to understand and simulate the behavior of this selective attention mechanism in a visual scene. Computational models of visual saliency can provide clues to where people will look in images, which objects are salient in a scene, and how people will evaluate the perceptual quality of an image. Such models can advance a wide range of vision-oriented applications in image processing and computer vision. Recent advances in visual saliency computation have been led mainly by progress in deep learning, and many deep learning-based saliency approaches have emerged. In this thesis, we study two problems in deep learning-based visual saliency computation: saliency prediction and salient object detection (SOD). We also investigate saliency-guided image quality evaluation as an extension of this work. For saliency prediction, existing deep saliency models suffer from either high computational cost or limited performance gain. We propose an effective yet efficient saliency model, the Dilated Inception Network (DINet), which captures diverse saliency-influential factors at different receptive field sizes at a much lower computational cost. Experimental results on challenging saliency prediction datasets demonstrate the outstanding performance of our model in both speed and accuracy. For SOD, the saliency maps produced by previous works still suffer from incomplete predictions caused by the internal complexity of salient objects. To alleviate this problem, we propose a simple yet effective progressive self-guided loss function (PSG loss) that creates progressive auxiliary training supervisions to guide the training process step by step. In the PSG loss, a simulated morphological closing operation is applied to the network predictions to generate the required progressive supervisions epoch by epoch; these supervisions characterize the spatial dependencies among salient object pixels. Guided by the generated supervisions, SOD models highlight progressively more complete salient objects, which alleviates the problem of incomplete predictions. Experimental results on six widely used SOD benchmark datasets show that our loss function not only improves the performance of existing SOD models without any architecture modification but also helps our proposed framework achieve state-of-the-art performance. In the last work, we propose a novel saliency-guided deep neural network (SGDNet) that incorporates learnable saliency information into image quality evaluation. This model is the first attempt to jointly optimize the saliency prediction and quality evaluation sub-tasks in an end-to-end multi-task learning framework. The saliency information learned in the saliency prediction sub-task is transparent to the quality evaluation sub-task, where it serves as a spatial attention prior for perceptually consistent feature fusion. The effectiveness of the learned saliency information and of the proposed multi-task framework is validated in the experiments.
author2 Lin Weisi
author_facet Lin Weisi
Yang, Sheng
format Thesis-Doctor of Philosophy
author Yang, Sheng
author_sort Yang, Sheng
title Visual saliency computation and quality evaluation via deep learning
publisher Nanyang Technological University
publishDate 2021
url https://hdl.handle.net/10356/145826
_version_ 1695706142883184640
spelling sg-ntu-dr.10356-145826 2021-03-02T08:40:56Z Yang, Sheng Lin Weisi School of Computer Science and Engineering WSLin@ntu.edu.sg Doctor of Philosophy 2021-01-11T02:12:32Z 2021-01-11T02:12:32Z 2021 Thesis-Doctor of Philosophy Yang, S. (2021). Visual saliency computation and quality evaluation via deep learning. Doctoral thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/145826 10.32657/10356/145826 en This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). application/pdf Nanyang Technological University
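
The description above states that the PSG loss applies a simulated morphological closing operation to the network predictions in order to obtain more complete supervision targets. The sketch below illustrates only the generic closing operation (grayscale dilation followed by erosion) on a small soft saliency map with a hole inside the object. It is not the thesis's implementation: the record does not give the kernel size, how the operation is simulated inside the network, or how the result enters the loss, so the window radius and the names `dilate`, `erode`, and `closing` are assumptions made for illustration.

```python
from typing import Callable, List

Grid = List[List[float]]


def _window_reduce(grid: Grid, radius: int,
                   reduce_fn: Callable[[List[float]], float]) -> Grid:
    """Apply reduce_fn over a (2*radius+1)^2 window around each cell.

    Windows are clipped at the image border, so out-of-image cells are
    simply ignored (an assumption of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for r in range(rows):
        row_out: List[float] = []
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            row_out.append(reduce_fn(window))
        out.append(row_out)
    return out


def dilate(grid: Grid, radius: int = 1) -> Grid:
    """Grayscale dilation: each cell takes the maximum of its window."""
    return _window_reduce(grid, radius, max)


def erode(grid: Grid, radius: int = 1) -> Grid:
    """Grayscale erosion: each cell takes the minimum of its window."""
    return _window_reduce(grid, radius, min)


def closing(grid: Grid, radius: int = 1) -> Grid:
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(grid, radius), radius)


# Hypothetical 7x7 soft prediction: a 3x3 salient object (0.9) whose
# center pixel was predicted weakly (0.2), i.e. an incomplete prediction.
pred: Grid = [[0.0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        pred[r][c] = 0.9
pred[3][3] = 0.2

closed = closing(pred, radius=1)

for row in closed:
    print(" ".join(f"{v:.1f}" for v in row))
```

In this example the closing fills the weak center pixel to 0.9 while leaving the background at 0.0, which is the kind of "more complete" target the description refers to. In a deep learning framework, dilation and erosion over a square window are commonly approximated with max pooling, which may be what "simulated" means here, but this record does not confirm it.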