Learning invariant and uniformly distributed feature space for multi-view generation?

Multi-view generation from a given single view is a significant, yet challenging problem with broad applications in the field of virtual reality and robotics. Existing methods mainly utilize the basic GAN-based structure to help directly learn a mapping between two different views. Although they can...

Full description

Saved in:
Bibliographic Details
Main Authors: LU, Yuqin, CAO, Jiangzhong, HE, Shengfeng, GUO, Jiangtao, ZHOU, Qiliang, DAI, Qingyun
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2023
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/7870
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Multi-view generation from a given single view is a significant, yet challenging problem with broad applications in the field of virtual reality and robotics. Existing methods mainly utilize the basic GAN-based structure to help directly learn a mapping between two different views. Although they can produce plausible results, they still struggle to recover faithful details and fail to generalize to unseen data. In this paper, we propose to learn invariant and uniformly distributed representations for multi-view generation with an "Alignment"and a "Uniformity"constraint (AU-GAN). Our method is inspired by the idea of contrastive learning to learn a well-regulated feature space for multi-view generation. Specifically, our feature extractor is supposed to extract view-invariant representation that captures intrinsic and essential knowledge of the input, and distribute all representations evenly throughout the space to enable the network to "explore"the entire feature space, thus avoiding poor generative ability on unseen data. Extensive experiments on multi-view generation for both faces and objects demonstrate the generative capability of our proposed method on generating realistic and high-quality views, especially for unseen data in wild conditions.