Bridging global context interactions for high-fidelity image completion

Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we prop...

Bibliographic Details
Main Authors:	Zheng, Chuanxia, Cham, Tat-Jen, Cai, Jianfei, Phung, Dinh
Other Authors:	School of Computer Science and Engineering
Format:	Conference or Workshop Item
Language:	English
Published:	2023
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision; Radio Frequency; Convolutional Codes
Online Access:	https://hdl.handle.net/10356/172659
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-172659
record_format	dspace
spelling	sg-ntu-dr.10356-1726592023-12-19T05:00:56Z Bridging global context interactions for high-fidelity image completion Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh School of Computer Science and Engineering 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill. This research was supported by Monash FIT Grant. This study was also supported under the RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU).
2023-12-19T05:00:56Z 2023-12-19T05:00:56Z 2022 Conference Paper Zheng, C., Cham, T., Cai, J. & Phung, D. (2022). Bridging global context interactions for high-fidelity image completion. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11502-11512. https://dx.doi.org/10.1109/CVPR52688.2022.01122 9781665469463 https://hdl.handle.net/10356/172659 10.1109/CVPR52688.2022.01122 2-s2.0-85136091993 11502 11512 en IAF-ICP © 2022 IEEE. All rights reserved.
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Radio Frequency Convolutional Codes Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh Bridging global context interactions for high-fidelity image completion
description	Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token representation, which allows the transformer to explicitly model the long-range visible context relations with equal importance in all layers, without implicitly confounding neighboring tokens when larger RFs are used. To improve appearance consistency between visible and generated regions, a novel attention-aware layer (AAL) is introduced to better exploit distantly related high-frequency features. Overall, extensive experiments demonstrate superior performance compared to state-of-the-art methods on several datasets. Code is available at https://github.com/lyndonzheng/TFill.
author2	School of Computer Science and Engineering
author_facet	School of Computer Science and Engineering Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh
format	Conference or Workshop Item
author	Zheng, Chuanxia Cham, Tat-Jen Cai, Jianfei Phung, Dinh
author_sort	Zheng, Chuanxia
title	Bridging global context interactions for high-fidelity image completion
title_short	Bridging global context interactions for high-fidelity image completion
title_full	Bridging global context interactions for high-fidelity image completion
title_fullStr	Bridging global context interactions for high-fidelity image completion
title_full_unstemmed	Bridging global context interactions for high-fidelity image completion
title_sort	bridging global context interactions for high-fidelity image completion
publishDate	2023
url	https://hdl.handle.net/10356/172659
_version_	1787136767253544960
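
The abstract describes tokens built from small, non-overlapping receptive fields and then related to one another by a transformer. The sketch below illustrates that idea only and is not the authors' TFill implementation. As assumptions, plain patch flattening stands in for the restrictive CNN, the attention uses identity projections, and the names `patch_tokens` and `self_attention` are hypothetical.

```python
import numpy as np

def patch_tokens(image, patch):
    """Split an (H, W, C) image into non-overlapping patch x patch tokens.

    Each token sees exactly one disjoint patch, so receptive fields never
    overlap (assumed stand-in for the restrictive CNN in the abstract).
    """
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid row, grid col, patch row, patch col, C)
    return x.reshape(gh * gw, patch * patch * c)

def self_attention(tokens):
    """Single-head self-attention with identity projections (illustrative)."""
    d = tokens.shape[1]
    scores = tokens @ tokens.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights

rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
tokens = patch_tokens(image, patch=4)  # 4 tokens, each of length 4*4*3

# Perturb one pixel: only the token whose patch contains it should change.
perturbed = image.copy()
perturbed[0, 0, 0] += 1.0
changed = np.any(patch_tokens(perturbed, 4) != tokens, axis=1)

# One attention layer already relates every token to every other token.
out, weights = self_attention(tokens)
```

Here `changed` marks only the first token, showing that the patches do not leak into neighboring tokens, while every entry of `weights` is positive, so distant patches interact within a single layer.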
