Bridging global context interactions for high-fidelity image completion
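Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill.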
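The record's abstract describes image tokens built from small, non-overlapping receptive fields and then related globally by a transformer that treats the token sequence as directionless. The NumPy sketch below illustrates that idea under stated assumptions; it is not the authors' TFill code. The helpers `patchify`, `token_visibility` and `global_attention`, the linear embedding `w_embed`, the patch size, and the rule that hole tokens are excluded as attention keys are all hypothetical simplifications. A linear map applied to each flattened patch is equivalent to a convolution whose kernel size equals its stride, so no pixel contributes to more than one token, and the dot-product attention contains no term that depends on the spatial distance between tokens.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tokens.

    Equivalent to a convolution with kernel size == stride == patch, so every
    pixel falls inside exactly one token's receptive field.
    Returns shape (num_tokens, patch * patch * C), tokens in row-major order.
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "H and W must be divisible by patch"
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (token_row, token_col, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def token_visibility(mask: np.ndarray, patch: int) -> np.ndarray:
    """Hypothetical rule: a token is visible only if all of its pixels are known.

    mask has shape (H, W) with 1 = known pixel and 0 = hole.
    """
    h, w = mask.shape
    m = mask.reshape(h // patch, patch, w // patch, patch)
    return m.min(axis=(1, 3)).reshape(-1).astype(bool)


def global_attention(tokens, visible, w_q, w_k, w_v):
    """Single-head self-attention where every token may attend to every visible token.

    No causal mask is applied (the sequence is directionless) and no term depends
    on spatial distance, so far-away context is weighted by content similarity
    alone. Hole tokens are excluded as keys (an assumption of this sketch).
    """
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(visible[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Usage on a random 8x8 RGB image with a 4x4 square hole in the center.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
mask = np.ones((8, 8))
mask[2:6, 2:6] = 0
patch, dim = 2, 16
w_embed = rng.standard_normal((patch * patch * 3, dim))  # hypothetical linear embedding
tokens = patchify(image, patch) @ w_embed
visible = token_visibility(mask, patch)
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) for _ in range(3))
out, weights = global_attention(tokens, visible, w_q, w_k, w_v)
print(tokens.shape, int(visible.sum()), out.shape)  # (16, 16) 12 (16, 16)
```

Replacing the disjoint patches with overlapping ones (kernel larger than stride) would let neighboring tokens share pixels, which is the confounding the abstract argues against. The sketch leaves out the paper's weighted token representation and its attention-aware layer (AAL).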
Main Authors: | Zheng, Chuanxia; Cham, Tat-Jen; Cai, Jianfei; Phung, Dinh |
---|---|
Other Authors: | School of Computer Science and Engineering |
Format: | Conference or Workshop Item |
Language: | English |
Published: | 2023 |
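Subjects: | Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Radio Frequency; Convolutional Codes |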
Online Access: | https://hdl.handle.net/10356/172659 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-172659 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-172659 2023-12-19T05:00:56Z Bridging global context interactions for high-fidelity image completion Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh School of Computer Science and Engineering 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill. This research was supported by Monash FIT Grant. This study was also supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). 
2023-12-19T05:00:56Z 2023-12-19T05:00:56Z 2022 Conference Paper Zheng, C., Cham, T., Cai, J. & Phung, D. (2022). Bridging global context interactions for high-fidelity image completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11502-11512. https://dx.doi.org/10.1109/CVPR52688.2022.01122 9781665469463 https://hdl.handle.net/10356/172659 10.1109/CVPR52688.2022.01122 2-s2.0-85136091993 11502 11512 en IAF-ICP © 2022 IEEE. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes |
author2 |
School of Computer Science and Engineering |
format |
Conference or Workshop Item |
author |
Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh |
title |
Bridging global context interactions for high-fidelity image completion |
publishDate |
2023 |
url |
https://hdl.handle.net/10356/172659 |