Feature pyramid transformer

Feature interactions across space and scales underpin modern visual recognition systems because they introduce beneficial visual contexts. Conventionally, spatial contexts are passively hidden in the CNN’s increasing receptive fields or actively encoded by non-local convolution. Yet, the non-local spatial interactions are not across scales, and thus they fail to capture the non-local contexts of objects (or parts) residing in different scales. To this end, we propose a fully active feature interaction across both space and scales, called Feature Pyramid Transformer (FPT). It transforms any feature pyramid into another feature pyramid of the same size but with richer contexts, by using three specially designed transformers in self-level, top-down, and bottom-up interaction fashion. FPT serves as a generic visual backbone with fair computational overhead. We conduct extensive experiments in both instance-level (i.e., object detection and instance segmentation) and pixel-level segmentation tasks, using various backbones and head networks, and observe consistent improvement over all the baselines and the state-of-the-art methods.
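
To make the three interaction directions concrete, the sketch below (PyTorch-style, not the authors' released code) imitates the idea on a toy pyramid: a non-local self-attention block stands in for the self-level interaction, while simple interpolation/pooling-based fusion stands in for the paper's attention-based top-down and bottom-up transformers. All names and settings (SelfLevelAttention, FeaturePyramidInteraction, channels=256, three levels, the 2x scale gap) are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfLevelAttention(nn.Module):
    """Non-local (self-attention) interaction within a single pyramid level."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        k = self.k(x).flatten(2)                        # (B, C, HW)
        v = self.v(x).flatten(2).transpose(1, 2)        # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + out                                  # residual connection


def top_down(fine, coarse):
    """Fine level absorbs context from the coarser level (simplified fusion)."""
    coarse_up = F.interpolate(coarse, size=fine.shape[-2:], mode="nearest")
    return fine + coarse_up


def bottom_up(coarse, fine):
    """Coarse level absorbs detail from the finer level (simplified fusion)."""
    fine_down = F.adaptive_max_pool2d(fine, coarse.shape[-2:])
    return coarse + fine_down


class FeaturePyramidInteraction(nn.Module):
    """Applies all three interactions; the output pyramid keeps the input sizes."""
    def __init__(self, channels=256, num_levels=3):
        super().__init__()
        self.self_level = nn.ModuleList(
            [SelfLevelAttention(channels) for _ in range(num_levels)]
        )

    def forward(self, pyramid):                # pyramid[0] is the finest level
        feats = [blk(p) for blk, p in zip(self.self_level, pyramid)]
        out = []
        for i, f in enumerate(feats):
            if i + 1 < len(feats):             # interact with the next coarser level
                f = top_down(f, feats[i + 1])
            if i > 0:                          # interact with the next finer level
                f = bottom_up(f, feats[i - 1])
            out.append(f)
        return out


if __name__ == "__main__":
    levels = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
    fused = FeaturePyramidInteraction()(levels)
    print([f.shape for f in fused])            # same spatial sizes as the input

The property the sketch preserves is the one the abstract emphasizes: the output pyramid has exactly the same spatial sizes as the input pyramid, only with features that have interacted within each level and across neighboring scales.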


Bibliographic Details
Main Authors: ZHANG, Dong, ZHANG, Hanwang, TANG, Jinhui, WANG, Meng, HUA, Xian-Sheng, SUN, Qianru
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University, 2020
DOI: 10.1007/978-3-030-58604-1_20
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Subjects: Feature pyramid; Visual context; Transformer; Object detection; Instance segmentation; Semantic segmentation; Artificial Intelligence and Robotics; Databases and Information Systems
Online Access:https://ink.library.smu.edu.sg/sis_research/5595
https://ink.library.smu.edu.sg/context/sis_research/article/6598/viewcontent/123730324.pdf
Institution: Singapore Management University