Temporal consistent video editing using diffusion models
Main Author: Bai, Shun Yao
Other Authors: Lin Guosheng (gslin@ntu.edu.sg)
School: School of Computer Science and Engineering
Format: Final Year Project (FYP)
Degree: Bachelor's degree
Language: English
Published: Nanyang Technological University, 2024
Subjects: Computer and Information Science; Temporal consistent; Video editing; Diffusion
Project Code: SCSE23-0333
Online Access: https://hdl.handle.net/10356/175740
Institution: Nanyang Technological University
Citation: Bai, S. Y. (2024). Temporal consistent video editing using diffusion models. Final Year Project (FYP), Nanyang Technological University, Singapore. https://hdl.handle.net/10356/175740
Description:
In recent years, the field of generative AI has seen unprecedented interest worldwide. Beginning with text-to-text generation, the field has attracted substantial media attention. While text-to-image generation, in the form of DALL-E and Stable Diffusion among many others, has achieved remarkable results, video generation remains a challenge.

Key challenges in this domain are the need for high-quality training data, the sheer number of frames that must be generated for a video of meaningful length, and the need to maintain temporal consistency across frames.

This project explores approaches to replicating the success of image generation models in the video domain, in particular the problem of achieving temporal consistency. It extends Rerender-A-Video by Yang et al. to allow flexibility in which frames are sampled as keyframes during the generation phase. Beyond extending the codebase to accept a custom selection of frames, the project offers two automated frame-selection methods: first, a binning method that selects, from each bin, the frame with the most common keypoints; second, a dynamic programming approach. While the binning method did not surpass the original constant-interval selection, the dynamic programming approach achieved limited success depending on the properties of the input video.

Proposed extensions for future work therefore include alternative formulations of the dynamic programming problem, which should be straightforward to integrate given the work done in this project to adapt the underlying Rerender-A-Video steps.
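As a concrete illustration of the binning idea described above, the minimal sketch below splits a clip into equal-sized bins and, within each bin, scores every candidate frame by how many keypoint matches it shares with the other frames in its bin, keeping the highest-scoring frame as that bin's keyframe. The use of ORB features, brute-force ratio-test matching, and the function names are assumptions made for illustration; they are not taken from the project's actual implementation or from the Rerender-A-Video codebase.

```python
# Illustrative sketch only: ORB features, Hamming-distance brute-force matching,
# and the ratio-test threshold are assumptions, not the project's actual method.
import cv2
import numpy as np

def count_matches(desc_a, desc_b, ratio=0.75):
    """Count keypoint descriptor matches between two frames (Lowe's ratio test)."""
    if desc_a is None or desc_b is None:
        return 0
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    pairs = matcher.knnMatch(desc_a, desc_b, k=2)
    return sum(1 for p in pairs if len(p) == 2 and p[0].distance < ratio * p[1].distance)

def select_keyframes(frames, num_bins):
    """Split frames into equal bins and pick, per bin, the frame that shares
    the most keypoint matches with the other frames in the same bin.
    Assumes BGR frames (e.g. from cv2.VideoCapture) and num_bins <= len(frames)."""
    orb = cv2.ORB_create(nfeatures=500)
    descriptors = [
        orb.detectAndCompute(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), None)[1]
        for f in frames
    ]
    keyframes = []
    for bin_indices in np.array_split(np.arange(len(frames)), num_bins):
        scores = [
            sum(count_matches(descriptors[i], descriptors[j]) for j in bin_indices if j != i)
            for i in bin_indices
        ]
        keyframes.append(int(bin_indices[int(np.argmax(scores))]))
    return keyframes
```

The selected indices could then be supplied to a keyframe-translation pipeline such as Rerender-A-Video in place of its default constant-interval schedule. The dynamic programming variant mentioned in the abstract would presumably replace the per-bin argmax with a global search over keyframe positions, though its exact cost formulation is not specified here.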