Video snapshot: Single image motion expansion via invertible motion embedding

Unlike images, finding the desired video content in a large pool of videos is not easy due to the time cost of loading and watching. Most video streaming and sharing services provide the video preview function for a better browsing experience. In this paper, we aim to generate a video preview from a...

Full description

Saved in:
Bibliographic Details
Main Authors: ZHU, Qianshu, HAN, Chu, HAN, Guoqiang, WONG, Tien-Tsin, HE, Shengfeng
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2021
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/7844
https://ink.library.smu.edu.sg/context/sis_research/article/8847/viewcontent/video.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Unlike images, finding the desired video content in a large pool of videos is not easy due to the time cost of loading and watching. Most video streaming and sharing services provide the video preview function for a better browsing experience. In this paper, we aim to generate a video preview from a single image. To this end, we propose two cascaded networks, the motion embedding network and the motion expansion network. The motion embedding network aims to embed the spatio-temporal information into an embedded image, called video snapshot. On the other end, the motion expansion network is proposed to invert the video back from the input video snapshot. To hold the invertibility of motion embedding and expansion during training, we design four tailor-made losses and a motion attention module to make the network focus on the temporal information. In order to enhance the viewing experience, our expansion network involves an interpolation module to produce a longer video preview with a smooth transition. Extensive experiments demonstrate that our method can successfully embed the spatio-temporal information of a video into one "live" image, which can be converted back to a video preview. Quantitative and qualitative evaluations are conducted on a large number of videos to prove the effectiveness of our proposed method. In particular, statistics of PSNR and SSIM on a large number of videos show the proposed method is general, and it can generate a high-quality video from a single image.