Coherent visual story generation using diffusion models

In recent years, the advent of diffusion models has unlocked new possibilities in generative tasks, particularly in the realm of text-to-image generation. State-of-the-art models can create exquisite images that both satisfy users' requirements and are rich in detail. In the last few years, several works have explored the potential of diffusion models for story visualization and achieved impressive results. These methods design specific network structures and train on closed-set image-text pairs to enforce consistency across image sequences, effectively generating coherent characters and scenes. However, due to the closed-set setting, the fine-tuned models can only generate stories within a specific domain of predefined characters. Generalizing to another domain requires extensive training on a newly curated image-text story dataset. This greatly limits the potential of existing visual story generation approaches, restricting them from real-world, open-set applications. In this research project, we explore generalizable approaches for generating coherent visual story images from text descriptions using diffusion models. Unlike the previous line of visual story generation work on closed-set datasets, this project focuses on the open-set scenario to mimic real-world challenges. Our key idea is to efficiently extract new visual concepts from only a small number of customized images, and then use the learned concepts to generate story image sequences. In this way, the coherence of the story image sequence is ensured by keeping the main characters consistent. By incorporating state-of-the-art customization techniques for diffusion models, we effectively bridge the gap between visual and linguistic elements, generating coherent visual stories from diverse story text descriptions. To better support this task, we further contribute an OpenStory dataset for benchmarking purposes. Qualitative and quantitative experiments demonstrate the effectiveness of the approach proposed in this project.
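The abstract describes a two-stage pipeline: learn a new visual concept from a handful of reference images, then reuse that concept across frames so the main character stays consistent while the scene follows each text description. The sketch below illustrates the idea only; the library (Hugging Face diffusers), the checkpoint name, the textual-inversion-style customization, and all paths, tokens, and prompts are assumptions for illustration, not details taken from this record.

```python
# Illustrative sketch of the two-stage pipeline the abstract describes:
# (1) learn a new visual concept from a few reference images, then
# (2) reuse the learned concept token to render a coherent story sequence.
# All specifics below (diffusers, the checkpoint, the embedding file,
# the "<hero>" token) are hypothetical, not from this record.

import torch
from diffusers import StableDiffusionPipeline

# Stage 1 (assumed done offline): a concept is learned from a few images,
# e.g. with a textual-inversion training script, producing a small
# embedding file bound to a placeholder token such as "<hero>".
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("learned_concept.bin", token="<hero>")

# Stage 2: every story frame mentions the same learned token, so the main
# character stays consistent while the scene varies with the text.
story = [
    "<hero> wakes up in a sunlit forest",
    "<hero> crosses a rope bridge over a river",
    "<hero> reaches a castle at dusk",
]
frames = [pipe(prompt).images[0] for prompt in story]
for i, frame in enumerate(frames):
    frame.save(f"frame_{i}.png")
```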

Bibliographic Details
Main Author: Jiang, Jiaxi
Other Authors: Liu, Ziwei (ziwei.liu@ntu.edu.sg)
School: School of Computer Science and Engineering
Format: Final Year Project (FYP)
Degree: Bachelor's degree
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Computer vision; Diffusion model
Project Code: SCSE23-0241
Citation: Jiang, J. (2024). Coherent visual story generation using diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175145
Online Access: https://hdl.handle.net/10356/175145
Institution: Nanyang Technological University