Precise region semantics-assisted GAN for pose-guided person image generation

Generating a realistic person's image from one source pose conditioned on another different target pose is a promising computer vision task. The previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but...

Full description

Saved in:
Bibliographic Details
Main Authors: Liu, Ji, Weng, Zhenyu, Zhu, Yuesheng
Other Authors: School of Electrical and Electronic Engineering
Format: Article
Language:English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/171904
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:Generating a realistic person's image from one source pose conditioned on another different target pose is a promising computer vision task. The previous mainstream methods mainly focus on exploring the transformation relationship between the keypoint-based source pose and the target pose, but rarely investigate the region-based human semantic information. Some current methods that adopt the parsing map neither consider the precise local pose-semantic matching issues nor the correspondence between two different poses. In this study, a Region Semantics-Assisted Generative Adversarial Network (RSA-GAN) is proposed for the pose-guided person image generation task. In particular, a regional pose-guided semantic fusion module is first developed to solve the imprecise match issue between the semantic parsing map from a certain source image and the corresponding keypoints in the source pose. To well align the style of the human in the source image with the target pose, a pose correspondence guided style injection module is designed to learn the correspondence between the source pose and the target pose. In addition, one gated depth-wise convolutional cross-attention based style integration module is proposed to distribute the well-aligned coarse style information together with the precisely matched pose-guided semantic information towards the target pose. The experimental results indicate that the proposed RSA-GAN achieves a 23% reduction in LPIPS compared to the method without using the semantic maps and a 6.9% reduction in FID for the method with semantic maps, respectively, and also shows higher realistic qualitative results.