Toward achieving robust low-level and high-level scene parsing

In this paper, we address the challenging task of scene segmentation. We first discuss and compare two widely used approaches to retain detailed spatial information from pretrained CNN - "dilation" and "skip". Then, we demonstrate that the parsing performance of "skip"...


Bibliographic Details
Main Authors: Shuai, Bing, Ding, Henghui, Liu, Ting, Wang, Gang, Jiang, Xudong
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Electrical and electronic engineering; Scene Parsing; Convolution Neural Network
Online Access:https://hdl.handle.net/10356/142866
Institution: Nanyang Technological University
Record ID: sg-ntu-dr.10356-142866
Citation: Shuai, B., Ding, H., Liu, T., Wang, G., & Jiang, X. (2019). Toward achieving robust low-level and high-level scene parsing. IEEE Transactions on Image Processing, 28(3), 1378-1390. doi:10.1109/TIP.2018.2878975
ISSN: 1057-7149
DOI: 10.1109/TIP.2018.2878975
PubMed ID: 30387733
Scopus ID: 2-s2.0-85055869185
Funding: NRF (National Research Foundation, Singapore); MOE (Ministry of Education, Singapore)
Version: Accepted version
Deposited: 2020-07-06
Rights: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TIP.2018.2878975
Building: NTU Library
Country: Singapore
Collection: DR-NTU
Description: In this paper, we address the challenging task of scene segmentation. We first discuss and compare two widely used approaches to retaining detailed spatial information from a pretrained CNN: "dilation" and "skip". We then demonstrate that the parsing performance of a "skip" network can be noticeably improved by modifying the parameterization of its skip layers. Furthermore, we introduce a "dense skip" architecture that retains a rich set of low-level information from the pretrained CNN, which is essential for improving low-level parsing performance. We also propose a convolutional context network (CCN), placed on top of the pretrained CNN, that aggregates context over high-level feature maps so that robust high-level parsing can be achieved. We name our segmentation network the enhanced fully convolutional network (EFCN), after its significantly enhanced structure over FCN. Extensive experimental studies justify each contribution separately. Without bells and whistles, EFCN achieves state-of-the-art results on the ADE20K, Pascal Context, SUN-RGBD and Pascal VOC 2012 segmentation datasets.
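The paper's own networks are not reproduced in this record, but the "dilation" idea the abstract contrasts with "skip" can be illustrated in isolation: inserting holes between kernel taps enlarges the receptive field without downsampling, so the feature map keeps its spatial resolution. Below is a minimal NumPy sketch of a 1-D dilated convolution; the function name and the toy signal are illustrative only and do not come from the paper.

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """'Same'-padded 1-D convolution with a dilation (hole) factor.

    With dilation d and kernel size k, the receptive field grows to
    d*(k-1)+1 while the output keeps the input's length - the core
    idea behind the "dilation" approach to retaining spatial detail.
    """
    k = len(w)
    span = dilation * (k - 1)      # receptive field minus one
    pad = span // 2                # symmetric 'same' padding
    xp = np.pad(x, pad)
    out = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        for j in range(k):
            out[i] += w[j] * xp[i + j * dilation]
    return out

x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0])

y1 = dilated_conv1d(x, w, dilation=1)  # receptive field 3
y2 = dilated_conv1d(x, w, dilation=2)  # receptive field 5, same length
```

Note that both calls return an output as long as the input; a strided backbone would instead halve the resolution at each stage, which is exactly the spatial detail the "dilation" and "skip" strategies try to preserve.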