Multi-domain anime image generation and editing
Main Author: | Aravind S/O Sivakumaran |
---|---|
Other Authors: | Lu Shijian (School of Computer Science and Engineering), Shijian.Lu@ntu.edu.sg |
Format: | Final Year Project (FYP) |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | Engineering::Computer science and engineering |
Online Access: | https://hdl.handle.net/10356/162940 |
Institution: | Nanyang Technological University |
Collection: | DR-NTU (NTU Library) |
Record ID: | sg-ntu-dr.10356-162940 |
Degree: | Bachelor of Engineering (Computer Science) |
Project Code: | SCSE21-0664 |
Citation: | Aravind S/O Sivakumaran (2022). Multi-domain anime image generation and editing. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/162940 |
Description:
Generative models for text-to-image and image-to-image synthesis have been highly successful to date; notable examples include OpenAI's DALL-E 2, Google's Imagen, and Parti. However, these state-of-the-art (SOTA) diffusion models are hard to train, and fine-tuning them requires resources that many practitioners do not have, unlike GANs. GANs also offer faster inference than diffusion models and can integrate more easily into production workflows with tight deadlines. We therefore propose training GAN models with our end-to-end framework while extending existing GAN networks to multiple domains, enabling integration into existing training workflows. We aim to introduce text-to-image multimodal generation for existing StyleGAN2 networks that can be used for editing, while allowing extension to different style domains using StyleGAN-NADA. Additionally, as part of our model editing workflow, the outputs of an existing StyleGAN2 network can be passed to a diffusion model such as Stable Diffusion for image-to-image translation and image editing. Finally, we explore slimming down the StyleGAN2 network for faster inference on edge devices, as the full StyleGAN2 model is computationally intensive for such devices to handle.
Keywords: Anime, StyleGAN, Generative Adversarial Networks, Image-to-Image Translation, Text-to-Image Translation, Image Editing, Model Compression, Multi-Domain, Diffusion Models, CLIP
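As a concrete illustration of the editing workflow described in the abstract, the sketch below passes a StyleGAN2 sample to Stable Diffusion's image-to-image pipeline. This is a minimal sketch, not the project's actual code: the `generator` object, its rosinality-style call signature, and the `runwayml/stable-diffusion-v1-5` checkpoint name are assumptions; only the diffusers `StableDiffusionImg2ImgPipeline` interface is standard.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"

generator = ...  # assumed: a pretrained StyleGAN2 generator (e.g. a rosinality-style checkpoint)

@torch.no_grad()
def stylegan2_sample(generator):
    """Draw one image from the (assumed) StyleGAN2 generator and return it as a PIL image."""
    z = torch.randn(1, 512, device=device)  # StyleGAN2's usual 512-dim z latent
    img, _ = generator([z])                 # assumed rosinality-style signature: returns (image, latents)
    img = (img.clamp(-1, 1) + 1) / 2        # map from [-1, 1] to [0, 1]
    img = (img[0].permute(1, 2, 0).cpu().numpy() * 255).astype("uint8")
    return Image.fromarray(img)

# Standard diffusers img2img pipeline; the checkpoint name here is illustrative.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

init_image = stylegan2_sample(generator).resize((512, 512))
edited = pipe(
    prompt="anime portrait, watercolor style",  # the text instruction driving the edit
    image=init_image,
    strength=0.5,        # lower strength preserves more of the StyleGAN2 output
    guidance_scale=7.5,
).images[0]
edited.save("edited.png")
```

In this kind of handoff, the `strength` argument controls how much noise is added before denoising, so values around 0.3-0.6 tend to preserve the StyleGAN2 subject while restyling it.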
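The text-driven editing direction mentioned in the abstract (in the spirit of StyleCLIP and StyleGAN-NADA) can likewise be sketched as CLIP-guided optimisation of a StyleGAN2 latent. Again this is only an illustrative sketch under assumptions: the `generator` object and its `input_is_latent` call signature stand in for a rosinality-style StyleGAN2 implementation, and CLIP's channel normalisation is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI's CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep CLIP in fp32 so the sketch avoids dtype mismatches

def clip_loss(image, text_features):
    """1 - cosine similarity between the generated image and the target text."""
    # CLIP expects 224x224 inputs; proper CLIP channel normalisation is omitted for brevity.
    image = F.interpolate((image + 1) / 2, size=224, mode="bilinear", align_corners=False)
    image_features = clip_model.encode_image(image)
    return 1 - F.cosine_similarity(image_features, text_features).mean()

def edit_latent(generator, w_init, text="a smiling anime character", steps=100, lr=0.05):
    """Optimise a copy of a w latent so the generated image matches the text prompt."""
    tokens = clip.tokenize([text]).to(device)
    text_features = clip_model.encode_text(tokens).detach()
    w = w_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        img, _ = generator([w], input_is_latent=True)  # assumed rosinality-style signature
        loss = clip_loss(img, text_features)
        loss = loss + 0.1 * F.mse_loss(w, w_init)      # keep the edited latent close to the original
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```

The L2 term toward `w_init` is one common way of keeping the edit local to the original image; StyleCLIP's latent-optimisation variant additionally uses an identity loss, which is left out of this sketch.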