Scene understanding based on visual and text data

Bibliographic Details
Main Author: Ong, Randal Ren Tai
Other Authors: Mao Kezhi
Format: Final Year Project
Language:English
Published: 2019
Online Access:http://hdl.handle.net/10356/77701
Institution: Nanyang Technological University
Description
Summary: In this project, the concept of scene understanding with visual and text data will be applied to the task of video captioning, and its effectiveness will be evaluated. A new dataset of videos and their accompanying captions, intended as an improvement over current datasets, will be collected. This dataset will then be put through a baseline model for training and analysis. The output metrics and captions will be observed and recorded to gauge whether the metric scores correlate with human judgement and whether the generated captions are accurate. Possible reasons for the observed accuracy will also be analysed, and directions for future work proposed.