SGDNet: an end-to-end saliency-guided deep neural network for no-reference image quality assessment

Bibliographic Details
Main Authors: Yang, Sheng, Jiang, Qiuping, Lin, Weisi, Wang, Yongtao
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2020
Subjects:
Online Access:https://hdl.handle.net/10356/144191
https://doi.org/10.21979/N9/H38R0Z
Institution: Nanyang Technological University
Description
Summary: We propose an end-to-end saliency-guided deep neural network (SGDNet) for no-reference image quality assessment (NR-IQA). SGDNet is built on an end-to-end multi-task learning framework in which two sub-tasks, visual saliency prediction and image quality prediction, are jointly optimized with a shared feature extractor. Existing multi-task CNN-based NR-IQA methods, which usually adopt distortion identification as the auxiliary sub-task, cannot accurately identify the complex mixtures of distortions that exist in authentically distorted images. By contrast, our saliency prediction sub-task is more universal, because visual attention is present whenever an image is viewed, regardless of its distortion type. More importantly, related works have reported that saliency information is highly correlated with image quality, and this property is fully exploited in the proposed SGDNet by training the model with more informative labels, namely saliency maps and quality scores, simultaneously. In addition, the outputs of the saliency prediction sub-task are passed transparently to the primary quality regression sub-task, where they serve as spatial attention masks for a more perceptually consistent feature fusion. By training the whole network on the two sub-tasks jointly, more discriminative features can be learned and a more accurate mapping from feature representations to quality scores can be established. Experimental results on both authentically and synthetically distorted IQA datasets demonstrate the superiority of SGDNet over state-of-the-art approaches.
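
The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of the multi-task design it outlines: a shared feature extractor, a saliency prediction head whose output acts as a spatial attention mask for feature fusion, and a quality regression head, all trained jointly. The backbone, layer sizes, loss functions, and the balancing weight lambda_sal are illustrative assumptions for readability, not the authors' published configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SGDNetSketch(nn.Module):
    """Illustrative sketch of the saliency-guided multi-task design."""

    def __init__(self):
        super().__init__()
        # Shared feature extractor (a small conv stack stands in for the
        # CNN backbone; the actual backbone choice is an assumption here).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Saliency sub-task: predicts a one-channel saliency map.
        self.saliency_head = nn.Conv2d(64, 1, 1)
        # Quality sub-task: regresses a scalar score from fused features.
        self.quality_head = nn.Linear(64, 1)

    def forward(self, x):
        feats = self.backbone(x)                        # (B, 64, H', W')
        sal = torch.sigmoid(self.saliency_head(feats))  # (B, 1, H', W')
        # Saliency-weighted fusion: the predicted map is normalized and
        # used as a spatial attention mask over the shared features.
        w = sal / (sal.sum(dim=(2, 3), keepdim=True) + 1e-8)
        fused = (feats * w).sum(dim=(2, 3))             # (B, 64)
        q = self.quality_head(fused).squeeze(-1)        # (B,)
        return q, sal

def joint_loss(q_pred, q_true, sal_pred, sal_true, lambda_sal=0.25):
    # Joint optimization of the two sub-tasks; lambda_sal is a
    # hypothetical balancing weight. sal_true is assumed to be a
    # ground-truth saliency map resized to sal_pred's spatial size.
    quality_loss = F.l1_loss(q_pred, q_true)
    saliency_loss = F.binary_cross_entropy(sal_pred, sal_true)
    return quality_loss + lambda_sal * saliency_loss

Normalizing the saliency map before pooling makes the fused vector a saliency-weighted average of the shared features, which is one simple way to realize the perceptually consistent feature fusion mentioned in the abstract.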