Image and video generation via deep learning

Bibliographic Details
Main Author: Jiang, Liming
Other Authors: Chen Change Loy
Format: Thesis-Doctor of Philosophy
Language: English
Published: Nanyang Technological University 2023
Online Access: https://hdl.handle.net/10356/172067
Institution: Nanyang Technological University
Description
Summary: Despite the immense success of image and video generation, several important problems remain. This thesis addresses these challenges with advanced deep learning techniques. It first constructs a large-scale facial video dataset, DeeperForensics-1.0, to facilitate subsequent research and to prevent the negative impact of generated data via better video manipulation. With these countermeasures in place, it proposes a versatile Two-Stream Image-to-image Translation (TSIT) framework of high practical value. The thesis then tackles the remaining issues through a more fundamental and theoretical study, the focal frequency loss (FFL), a frequency-level loss function that is complementary to existing spatial losses. It further introduces Adaptive Pseudo Augmentation (APA) for GAN training with limited data, reducing data requirements. Extensive experiments and analyses demonstrate the effectiveness of the proposed methods in both perceptual quality and quantitative evaluations. Finally, the thesis outlines potential future work and offers further insights into the field.