Learning generalized video memory for automatic video captioning

Recent video captioning methods have made great progress by deep learning approaches with convolutional neural networks (CNN) and recurrent neural networks (RNN). While there are techniques that use memory networks for sentence decoding, few work has leveraged on the memory component to learn and ge...

Full description

Saved in:

Bibliographic Details
Main Authors:	CHANG, Poo-Hee, TAN, Ah-hwee
Format:	text
Language:	English
Published:	Institutional Knowledge at Singapore Management University 2018
Subjects:	Memory model Video captioning Deep learning Adaptive Resonance Theory LSTM CNN Databases and Information Systems Software Engineering
Online Access:	https://ink.library.smu.edu.sg/sis_research/6076 https://ink.library.smu.edu.sg/context/sis_research/article/7079/viewcontent/Multi_disciplinary_Trends_in_Artificial_Intelligence.pdf
Tags:	Add Tag No Tags, Be the first to tag this record!
Institution:	Singapore Management University
Language:	English

Description
Summary:	Recent video captioning methods have made great progress by deep learning approaches with convolutional neural networks (CNN) and recurrent neural networks (RNN). While there are techniques that use memory networks for sentence decoding, few work has leveraged on the memory component to learn and generalize the temporal structure in video. In this paper, we propose a new method, namely Generalized Video Memory (GVM), utilizing a memory model for enhancing video description generation. Based on a class of self-organizing neural networks, GVM’s model is able to learn new video features incrementally. The learned generalized memory is further exploited to decode the associated sentences using RNN. We evaluate our method on the YouTube2Text data set using BLEU and METEOR scores as a standard benchmark. Our results are shown to be competitive against other state-of-the-art methods.

Learning generalized video memory for automatic video captioning

Similar Items