Toward achieving robust low-level and high-level scene parsing
In this paper, we address the challenging task of scene segmentation. We first discuss and compare two widely used approaches to retain detailed spatial information from pretrained CNN - "dilation" and "skip". Then, we demonstrate that the parsing performance of "skip" network can be noticeably improved by modifying the parameterization of skip layers. Furthermore, we introduce a "dense skip" architecture to retain a rich set of low-level information from pre-trained CNN, which is essential to improve the low-level parsing performance. Meanwhile, we propose a convolutional context network (CCN) and place it on top of pre-trained CNNs, which is used to aggregate contexts for high-level feature maps so that robust high-level parsing can be achieved. We name our segmentation network enhanced fully convolutional network (EFCN) based on its significantly enhanced structure over FCN. Extensive experimental studies justify each contribution separately. Without bells and whistles, EFCN achieves state-of-the-arts on segmentation datasets of ADE20K, Pascal Context, SUN-RGBD and Pascal VOC 2012.
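The "dilation" approach compared in the abstract keeps feature maps at full resolution by spacing convolution taps apart instead of downsampling. The sketch below is purely illustrative (it is not the authors' code, and uses a 1-D toy signal rather than real feature maps); it shows how a dilation factor enlarges the receptive field of a fixed-size kernel:

```python
# Illustrative sketch (not the paper's implementation): a valid-mode 1-D
# convolution whose taps are spaced `dilation` samples apart. With the same
# 3 weights, dilation=2 covers a 5-sample receptive field -- the core idea
# behind retaining spatial detail without downsampling.

def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1-D convolution with dilated (spaced-out) kernel taps."""
    span = (len(kernel) - 1) * dilation  # receptive field width minus one
    out = []
    for i in range(len(signal) - span):
        acc = 0.0
        for j, w in enumerate(kernel):
            acc += w * signal[i + j * dilation]
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
k = [1.0, 1.0, 1.0]

# dilation=1 is an ordinary convolution: each output sums 3 adjacent samples
print(dilated_conv1d(x, k, dilation=1))  # [6.0, 9.0, 12.0, 15.0]
# dilation=2 sums every other sample, widening the receptive field to 5
print(dilated_conv1d(x, k, dilation=2))  # [9.0, 12.0]
```

In a 2-D segmentation network the same trick is applied along both spatial axes, which is why "dilation" preserves detailed spatial information at the cost of computing full-resolution feature maps.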
Main Authors: Shuai, Bing; Ding, Henghui; Liu, Ting; Wang, Gang; Jiang, Xudong
Other Authors: School of Electrical and Electronic Engineering; Rapid-Rich Object Search Lab
Format: Article
Language: English
Published: 2020
Subjects: Engineering::Electrical and electronic engineering; Scene Parsing; Convolution Neural Network
Online Access: https://hdl.handle.net/10356/142866
Institution: Nanyang Technological University
Citation: Shuai, B., Ding, H., Liu, T., Wang, G., & Jiang, X. (2019). Toward achieving robust low-level and high-level scene parsing. IEEE Transactions on Image Processing, 28(3), 1378-1390. doi:10.1109/TIP.2018.2878975
ISSN: 1057-7149
Funding: NRF (National Research Foundation, Singapore); MOE (Ministry of Education, Singapore)
Version: Accepted version

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The published version is available at: https://doi.org/10.1109/TIP.2018.2878975