Coherent visual story generation using diffusion models

In recent years, the advent of diffusion models has unlocked new possibilities in generative tasks, particularly in the realm of text-to-image generation. State-of-the-art models can create exquisite images that both satisfy users' requirements and are rich in detail. In the last few years, several works have explored the potential of diffusion models for story visualization and achieved impressive results. These methods design specific network structures and train on closed-set image-text pairs to enforce consistency across image sequences, effectively generating coherent characters and scenes. However, due to the closed-set setting, the fine-tuned models can only generate stories within a specific domain of predefined characters. Generalizing to another domain requires extensive training on a newly curated image-text story dataset. This greatly limits the potential of existing visual story generation approaches, restricting them from real-world, open-set applications. In this research project, we explore generalizable approaches for generating coherent visual story images from text descriptions using diffusion models. Unlike the previous line of visual story generation work on closed-set datasets, this project focuses on the open-set scenario to mimic real-world challenges. Our key idea is to efficiently extract new visual concepts from only a small number of customized images, and then use the learned concepts to generate story image sequences. In this way, the coherence of the story image sequence is ensured by keeping the main characters consistent. By incorporating state-of-the-art customization techniques for diffusion models, we effectively bridge the gap between visual and linguistic elements, generating coherent visual stories from diverse story text descriptions. To better support this task, we further contribute an OpenStory dataset for benchmarking purposes. Qualitative and quantitative experiments demonstrate the effectiveness of the approach proposed in this project.
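The abstract describes a two-stage pipeline: learn a new visual concept from a handful of reference images, then reuse that concept across frames so the main character stays consistent while the scene follows each text description. The sketch below illustrates the idea only; the library (Hugging Face diffusers), the checkpoint name, the textual-inversion-style customization, and all paths, tokens, and prompts are assumptions for illustration, not details taken from this record.

```python
# Illustrative sketch of the two-stage pipeline the abstract describes:
# (1) learn a new visual concept from a few reference images, then
# (2) reuse the learned concept token to render a coherent story sequence.
# All specifics below (diffusers, the checkpoint, the embedding file,
# the "<hero>" token) are hypothetical, not from this record.

import torch
from diffusers import StableDiffusionPipeline

# Stage 1 (assumed done offline): a concept is learned from a few images,
# e.g. with a textual-inversion training script, producing a small
# embedding file bound to a placeholder token such as "<hero>".
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_textual_inversion("learned_concept.bin", token="<hero>")

# Stage 2: every story frame mentions the same learned token, so the main
# character stays consistent while the scene varies with the text.
story = [
    "<hero> wakes up in a sunlit forest",
    "<hero> crosses a rope bridge over a river",
    "<hero> reaches a castle at dusk",
]
frames = [pipe(prompt).images[0] for prompt in story]
for i, frame in enumerate(frames):
    frame.save(f"frame_{i}.png")
```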

Bibliographic Details
Main Author: Jiang, Jiaxi
Other Authors: Liu, Ziwei (ziwei.liu@ntu.edu.sg)
School: School of Computer Science and Engineering
Format: Final Year Project (FYP)
Degree: Bachelor's degree
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Computer vision; Diffusion model
Project Code: SCSE23-0241
Citation: Jiang, J. (2024). Coherent visual story generation using diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175145
Online Access: https://hdl.handle.net/10356/175145
Institution: Nanyang Technological University