Scene understanding based on heterogeneous data fusion


Bibliographic Details
Main Author: Ren, Haosu
Other Authors: Mao Kezhi
Format: Final Year Project
Language: English
Published: 2018
Subjects:
Online Access:http://hdl.handle.net/10356/75215
Institution: Nanyang Technological University
Description
Summary: Solving the visual translation problem has long been a major task in artificial intelligence. The problem has advanced with significant progress in static image understanding by deep neural networks (H. X. Subhashini Venugopalan 2015). When moving to dynamic scenes such as video data, the information is enriched with not only static images but also temporal motion and acoustic signals, and effective video scene understanding can help audit today's massive volume of video uploads. How to extract and fuse these heterogeneous data has therefore become a new challenge in helping machines understand a scene. In this project, we implemented a classical video captioning network structure and explored various approaches to fusing heterogeneous data, aiming to generate a comprehensive sentence that describes a video. Finally, we compared the descriptive sentences that the different fusion methods generated for videos.
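The fusion of heterogeneous features mentioned in the summary can be illustrated with a minimal sketch. The two common baselines are early fusion (concatenating per-frame visual and acoustic features before pooling) and late fusion (pooling each modality first, then combining the modality-level vectors). The feature dimensions, pooling choice, and modality weights below are illustrative assumptions, not details taken from the project itself:

```python
import numpy as np

# Hypothetical per-frame features for one video clip
# (sizes are illustrative: e.g. CNN visual features and acoustic features).
rng = np.random.default_rng(0)
visual_feats = rng.standard_normal((30, 2048))  # 30 frames x 2048-dim visual
audio_feats = rng.standard_normal((30, 128))    # 30 frames x 128-dim acoustic

# Early fusion: concatenate the heterogeneous features per frame,
# then mean-pool over time to obtain a single clip-level vector.
early = np.concatenate([visual_feats, audio_feats], axis=1)  # shape (30, 2176)
early_clip = early.mean(axis=0)                              # shape (2176,)

# Late fusion: pool each modality separately over time, then combine
# the modality-level vectors (here with assumed weights 0.7 / 0.3).
visual_clip = visual_feats.mean(axis=0)  # shape (2048,)
audio_clip = audio_feats.mean(axis=0)    # shape (128,)
late_clip = np.concatenate([0.7 * visual_clip, 0.3 * audio_clip])  # (2176,)
```

Either clip-level vector would then feed a caption decoder (e.g. an LSTM) that emits the descriptive sentence; the choice of fusion point is exactly the design dimension the project compares.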