Video frame synthesis via plug-and-play deep locally temporal embedding
We propose a generative framework that tackles video frame interpolation. Conventionally, optical flow methods can solve the problem, but their perceptual quality depends on the accuracy of flow estimation. Nevertheless, a merit of traditional methods is their remarkable generalization ability. Recently, deep convolutional neural networks (CNNs) have achieved good performance at the price of computation. However, to deploy a CNN, it is necessary to train it on a large-scale dataset beforehand, not to mention the subsequent fine-tuning and adaptation. Also, despite producing sharp motion, their perceptual quality does not correlate well with their performance on pixel-to-pixel difference metrics due to various artifacts created by erroneous warping. In this paper, we take advantage of both conventional and deep-learning models and tackle the problem from a different perspective. The framework, which we call deep locally temporal embedding (DeepLTE), is powered by a deep CNN and can be used instantly, like conventional models. DeepLTE fits an auto-encoding CNN to several consecutive frames and imposes constraints on the latent representations so that new frames can be generated by interpolating new latent codes. Unlike the current deep learning paradigm, which requires training on large datasets, DeepLTE works in a plug-and-play, unsupervised manner and can generate an arbitrary number of frames from multiple given consecutive frames. We demonstrate that, without bells and whistles, DeepLTE outperforms existing state-of-the-art models in terms of perceptual quality.
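The workflow the abstract describes (fit a small auto-encoder to the given frames only, constrain the latent codes, then decode interpolated codes) can be sketched in a few lines. The following Python/PyTorch snippet is a hypothetical illustration, not the paper's actual DeepLTE: the tiny architecture, the L1 reconstruction loss, the midpoint-linearity constraint on the latent codes, and every hyperparameter are assumptions made for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoEncoder(nn.Module):
    """Deliberately small stand-in for the paper's auto-encoding CNN."""
    def __init__(self, latent_ch=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 3, stride=2, padding=1),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def synthesize(frames, t=0.5, steps=300, lam=0.1):
    """frames: (3, 3, H, W) tensor of three consecutive RGB frames in [0, 1].
    Fits the auto-encoder to these frames alone (the plug-and-play part:
    no pretraining, no external dataset), then decodes an interpolated
    latent code for time t between frame 0 and frame 2."""
    model = TinyAutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        recon, z = model(frames)
        rec_loss = F.l1_loss(recon, frames)
        # Assumed temporal-embedding constraint: pull the middle latent
        # onto the segment between its neighbours, so that *new* codes on
        # that segment also decode to plausible frames.
        emb_loss = F.mse_loss(z[1], 0.5 * (z[0] + z[2]))
        (rec_loss + lam * emb_loss).backward()
        opt.step()
    with torch.no_grad():
        _, z = model(frames)
        # Any t in [0, 1] works, so an arbitrary number of in-between
        # frames can be decoded from one fitted model.
        z_new = (1.0 - t) * z[0] + t * z[2]
        return model.decoder(z_new.unsqueeze(0)).squeeze(0)

# Example: synthesize the temporal midpoint of a (random) frame triplet.
triplet = torch.rand(3, 3, 64, 64)
middle = synthesize(triplet, t=0.5)  # (3, 64, 64) tensor
```

The paper's DeepLTE uses a deeper network and a more elaborate locally temporal embedding, but the pattern above, optimizing on the given frames at inference time and then decoding interpolated codes, is the plug-and-play, unsupervised idea the abstract refers to.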
Saved in:
Main Authors: Nguyen, Anh-Duc; Kim, Woojae; Kim, Jongyoo; Lin, Weisi; Lee, Sanghoon
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2021
Subjects: Engineering::Computer science and engineering; Frame Synthesis; Video Processing
Online Access: https://hdl.handle.net/10356/145916
Institution: Nanyang Technological University
Citation: Nguyen, A.-D., Kim, W., Kim, J., Lin, W., & Lee, S. (2019). Video frame synthesis via plug-and-play deep locally temporal embedding. IEEE Access, 7, 179304-179319. doi:10.1109/ACCESS.2019.2959019
ISSN: 2169-3536
Scopus ID: 2-s2.0-85077230233
ORCID iDs: 0000-0001-9895-5347; 0000-0002-8312-9736; 0000-0002-2435-9195; 0000-0001-9866-1947
Version: Published version
Deposited: 2021-01-14
Collection: DR-NTU (NTU Library, Nanyang Technological University)
Rights: © 2019 IEEE. This journal is 100% open access, which means that all content is freely available without charge to users or their institutions. All articles accepted after 12 June 2019 are published under a CC BY 4.0 license, and the author retains copyright. Users are allowed to read, download, copy, distribute, print, search, or link to the full texts of the articles, or use them for any other lawful purpose, as long as proper attribution is given.