A dilated inception network for visual saliency prediction
With the advent of deep convolutional neural networks (DCNNs), visual saliency prediction has improved markedly. One promising direction for further improvement is to fully characterize the multi-scale saliency-influential factors with a computationally friendly module in DCNN architectures. In this work, we propose an end-to-end dilated inception network (DINet) for visual saliency prediction. It captures multi-scale contextual features effectively with very limited extra parameters. Instead of utilizing parallel standard convolutions with different kernel sizes, as in the existing inception module, our proposed dilated inception module (DIM) uses parallel dilated convolutions with different dilation rates, which significantly reduces the computational load while enriching the diversity of receptive fields in the feature maps. Moreover, the performance of our saliency model is further improved by using a set of linear normalization-based probability distribution distance metrics as loss functions. These allow us to formulate saliency prediction as a global probability distribution prediction task, for better saliency inference, instead of a pixel-wise regression problem. Experimental results on several challenging saliency benchmark datasets demonstrate that our DINet with the proposed loss functions achieves state-of-the-art performance with shorter inference time.
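The two ideas in the abstract can be illustrated with a small sketch. The receptive-field arithmetic of dilated convolution is standard (a k×k kernel with dilation rate d covers 1 + (k−1)·d positions per axis while keeping k·k parameters, which is why the DIM enlarges receptive fields cheaply). The loss shown here is only a hypothetical illustration of a "linear normalization-based probability distribution distance": each saliency map is divided by its sum and compared with total variation distance — the paper's actual choice of metrics may differ.

```python
import numpy as np

def effective_kernel(k: int, d: int) -> int:
    """Effective receptive-field size (per axis) of a k x k convolution with
    dilation rate d. Dilation inserts d-1 gaps between taps, so the parameter
    count stays k*k while the covered region grows to 1 + (k-1)*d."""
    return 1 + (k - 1) * d

def linear_normalize(saliency) -> np.ndarray:
    """Turn a non-negative saliency map into a probability distribution by
    dividing by its sum (one reading of the 'linear normalization' the
    abstract mentions)."""
    s = np.asarray(saliency, dtype=float)
    return s / s.sum()

def tv_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Total variation distance between two distributions -- one possible
    distribution distance; used here purely for illustration."""
    return 0.5 * float(np.abs(p - q).sum())

# A 3x3 kernel with dilation rate 4 covers a 9x9 region with only 9 weights.
print(effective_kernel(3, 4))  # -> 9

# Compare a predicted map against a ground-truth fixation map as distributions.
pred = linear_normalize([[1, 2], [3, 4]])
gt = linear_normalize([[0, 1], [1, 2]])
print(round(tv_distance(pred, gt), 4))  # -> 0.15
```

Treating both maps as distributions makes the loss sensitive to the global shape of the saliency map rather than to per-pixel intensity errors, which is the motivation the abstract gives for moving away from pixel-wise regression.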
Main Authors: Yang, Sheng; Lin, Guosheng; Jiang, Qiuping; Lin, Weisi
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Computer science and engineering; Visualization; Computer Vision
Online Access: https://hdl.handle.net/10356/143872 https://doi.org/10.21979/N9/OIYLBK
Institution: Nanyang Technological University
Citation: Yang, S., Lin, G., Jiang, Q., & Lin, W. (2020). A dilated inception network for visual saliency prediction. IEEE Transactions on Multimedia, 22(8), 2163-2176. doi:10.1109/TMM.2019.2947352
ISSN: 1520-9210
DOI: 10.1109/TMM.2019.2947352
Version: Accepted version (deposited 2020-09-29)
Rights: © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TMM.2019.2947352
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)