Multi-domain anime image generation and editing
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2022
Online Access: https://hdl.handle.net/10356/162940
Institution: Nanyang Technological University
Summary: Generative models for text-to-image and image-to-image synthesis have been highly successful to date; notable examples include OpenAI's DALL-E 2 and Google's Imagen and Parti. However, these state-of-the-art (SOTA) diffusion models are hard to train, and fine-tuning them demands resources that many practitioners lack. GANs, by contrast, offer faster inference and integrate more easily into production workflows with tight deadlines. We therefore propose training GAN models with our end-to-end framework while extending existing GAN networks to multiple domains, enabling integration into existing training workflows. We aim to introduce text-to-image multimodal generation for existing StyleGAN2 networks that can also be used for editing, while allowing extension to different style domains via StyleGAN-NADA. Additionally, as part of our model editing workflow, outputs of existing StyleGAN2 networks can be passed to a diffusion model such as Stable Diffusion for image-to-image translation for editing purposes. Finally, we explore slimming down the StyleGAN2 network for faster inference on edge devices, as StyleGAN2 is computationally intensive for such hardware.
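The abstract does not specify how the StyleGAN2 network would be slimmed down; one common approach for this kind of compression is magnitude-based channel pruning, which drops the convolutional filters with the smallest weight norms. The sketch below illustrates the idea on a toy weight tensor with NumPy; the function name, the keep ratio, and the layer shape are all illustrative assumptions, not details from the project.

```python
import numpy as np

def prune_channels(weight, keep_ratio=0.5):
    """Illustrative magnitude-based pruning: keep the output channels
    of a conv weight (out_ch, in_ch, kh, kw) with the largest L1 norms."""
    out_ch = weight.shape[0]
    keep = max(1, int(out_ch * keep_ratio))
    # Rank each output channel by the L1 norm of its filter.
    norms = np.abs(weight).reshape(out_ch, -1).sum(axis=1)
    kept_idx = np.sort(np.argsort(norms)[-keep:])  # indices of the largest-norm filters
    return weight[kept_idx], kept_idx

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32, 3, 3))          # a toy conv layer
pruned, idx = prune_channels(w, keep_ratio=0.5)
print(pruned.shape)                               # (32, 32, 3, 3)
```

Halving the output channels here roughly halves the layer's multiply-accumulate cost, which is the kind of saving that matters for edge deployment; a real pipeline would also fine-tune the pruned network to recover quality.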
Keywords: Anime, StyleGAN, Generative Adversarial Networks, Image-to-Image Translation, Text-to-Image Translation, Image Editing, Model Compression, Multi-Domain, Diffusion Models, CLIP