Visual recognition using artificial intelligence (visual storytelling using deep learning)

Bibliographic Details
Main Author: Feng, Shihao
Other Authors: Yap Kim Hui
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Online Access:https://hdl.handle.net/10356/139372
Institution: Nanyang Technological University
Description
Summary: With the growing number of smartphone users, people enjoy sharing their stories by posting photos on social media platforms. It would therefore be convenient if stories could be written automatically once users upload their photos. Thanks to major improvements in deep learning techniques and computing power, it is now possible to generate such a story from a user's input images. The objective of this project is to explore and design a deep learning model for the visual storytelling task. Specifically, the project aims to develop a deep learning model that generates a five-sentence story from five given photos. The first part of the project compares the latest techniques used for visual storytelling and evaluates their performance. Based on this comparison, "Adversarial Reward Learning for Visual Storytelling" (AREL) was selected as the base model for further optimization. The second part of the project focuses on optimizing the base model and improving its performance on the Microsoft VIST (Visual Storytelling) dataset. The optimization mainly targets the model structure, for example changing how the decoder is initialized. Results from the different approaches are discussed. Lastly, a Python application with a graphical user interface was designed, in which users can choose photos and receive the generated story. The report covers the techniques used in the model, the design of the model, and the experimental results. It concludes with a discussion of the final results and future work.