Learning generalized video memory for automatic video captioning

Bibliographic Details
Main Authors: CHANG, Poo-Hee; TAN, Ah-hwee
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Subjects:
CNN
Online Access:https://ink.library.smu.edu.sg/sis_research/6076
https://ink.library.smu.edu.sg/context/sis_research/article/7079/viewcontent/Multi_disciplinary_Trends_in_Artificial_Intelligence.pdf
Institution: Singapore Management University
Description
Summary: Recent video captioning methods have made great progress through deep learning approaches with convolutional neural networks (CNN) and recurrent neural networks (RNN). While there are techniques that use memory networks for sentence decoding, few works have leveraged the memory component to learn and generalize the temporal structure in video. In this paper, we propose a new method, namely Generalized Video Memory (GVM), which utilizes a memory model to enhance video description generation. Based on a class of self-organizing neural networks, GVM is able to learn new video features incrementally. The learned generalized memory is further exploited to decode the associated sentences using an RNN. We evaluate our method on the YouTube2Text data set using BLEU and METEOR scores as standard benchmarks. Our results are shown to be competitive with other state-of-the-art methods.
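As a rough illustration of the incremental learning the abstract describes, the sketch below implements a generic vigilance-gated, self-organizing memory in Python. It is a hypothetical stand-in, not the paper's actual GVM model: the class name, the cosine match function, and the vigilance and learning-rate values are all assumptions, and the CNN feature extraction and RNN sentence decoder are omitted.

    import numpy as np

    class GeneralizedVideoMemory:
        """Toy self-organizing memory (illustrative only): each row of
        `prototypes` is a generalized template over video feature vectors."""

        def __init__(self, dim, vigilance=0.75, lr=0.5):
            self.vigilance = vigilance   # match threshold: update vs. recruit
            self.lr = lr                 # how far a winning prototype moves toward the input
            self.prototypes = np.empty((0, dim))

        def _match(self, x):
            # cosine similarity between the input and every stored prototype
            norms = np.linalg.norm(self.prototypes, axis=1) * np.linalg.norm(x)
            return self.prototypes @ x / np.maximum(norms, 1e-8)

        def learn(self, x):
            """Store feature x incrementally: update the best-matching
            prototype if it passes the vigilance test, else add a new one."""
            if len(self.prototypes) > 0:
                scores = self._match(x)
                j = int(np.argmax(scores))
                if scores[j] >= self.vigilance:
                    self.prototypes[j] += self.lr * (x - self.prototypes[j])
                    return j
            self.prototypes = np.vstack([self.prototypes, x])
            return len(self.prototypes) - 1

        def recall(self, x):
            # return the generalized memory closest to x (assumes learn() was called)
            return self.prototypes[int(np.argmax(self._match(x)))]

In the paper's pipeline, features of this kind would come from a CNN applied to video frames, and the recalled generalized memory would condition an RNN decoder that generates the caption; recall here simply returns the nearest stored prototype.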