Bridging global context interactions for high-fidelity image completion


Bibliographic Details
Main Authors: Zheng, Chuanxia, Cham, Tat-Jen, Cai, Jianfei, Phung, Dinh
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2023
Online Access:https://hdl.handle.net/10356/172659
Institution: Nanyang Technological University
Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Citation: Zheng, C., Cham, T., Cai, J. & Phung, D. (2022). Bridging global context interactions for high-fidelity image completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11502-11512. https://dx.doi.org/10.1109/CVPR52688.2022.01122
ISBN: 9781665469463
DOI: 10.1109/CVPR52688.2022.01122
Scopus ID: 2-s2.0-85136091993
Funding: This research was supported by Monash FIT Grant. This study was also supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU).
Rights: © 2022 IEEE. All rights reserved.
Date available: 2023-12-19
Collection: DR-NTU (NTU Library, Nanyang Technological University, Singapore)
Subjects: Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Radio Frequency; Convolutional Codes
Full description

Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill.
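The core architectural idea in the abstract, tokens built from small, non-overlapping receptive fields and then related globally by a transformer, can be illustrated with a minimal NumPy sketch. This is not the TFill implementation (the authors' code is in the repository linked above). The helper names `patchify`, `embed` and `self_attention`, the 8-pixel patch size, the 16-dimensional embedding, the single attention head and the random weights are hypothetical choices made only for illustration, and the sketch omits both the weighting of tokens and the attention-aware layer (AAL) described in the abstract.

```python
# Illustrative sketch only: hypothetical helpers, NOT the TFill code.
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tokens.

    Each token is built only from its own patch, so neighbouring tokens
    share no pixels (receptive field = stride = patch).
    """
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by patch")
    grid = image.reshape(h // patch, patch, w // patch, patch, c)
    grid = grid.transpose(0, 2, 1, 3, 4)  # (H/p, W/p, p, p, C)
    return grid.reshape(-1, patch * patch * c)  # (num_tokens, p*p*C)


def embed(tokens: np.ndarray, w_embed: np.ndarray) -> np.ndarray:
    """Per-token linear projection, equivalent to a conv with kernel = stride = patch."""
    return tokens @ w_embed


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention with no causal mask.

    Every token can attend to every other token in one layer, whatever
    their spatial distance; no left-to-right order is imposed.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
patch, dim = 8, 16  # hypothetical sizes
w_embed = rng.normal(size=(patch * patch * 3, dim)) / np.sqrt(patch * patch * 3)
wq, wk, wv = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

tokens = embed(patchify(image, patch), w_embed)  # (16, 16): a 4x4 grid of tokens
out, attn = self_attention(tokens, wq, wk, wv)

# Perturb one pixel in the top-left patch: only token 0 changes,
# because the patches do not overlap.
edited = image.copy()
edited[0, 0, 0] += 1.0
changed = ~np.isclose(embed(patchify(edited, patch), w_embed), tokens).all(axis=1)
print(changed.nonzero()[0])  # [0]
```

Because the tokens do not overlap, editing a pixel changes exactly one token, while the unmasked attention weights let each token draw on all other tokens within a single layer. A CNN with large or overlapping receptive fields would instead blend neighbouring content into each token before any global interaction takes place, which is the confounding effect the abstract describes.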