Visual recognition using artificial intelligence (visual storytelling using deep learning)

With the popularity of smartphones, people enjoy sharing their stories by posting photos on social media platforms. Hence, it would be convenient if stories could be written automatically once users upload photos. Benefiting from major improvements in deep learning techniques and computation power, it is now...

Bibliographic Details
Main Author: Feng, Shihao
Other Authors: Yap Kim Hui
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
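Subjects: Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Electrical and electronic engineering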
Online Access: https://hdl.handle.net/10356/139372
Institution: Nanyang Technological University
id sg-ntu-dr.10356-139372
record_format dspace
spelling sg-ntu-dr.10356-139372 2023-07-07T18:34:51Z Feng, Shihao Yap Kim Hui School of Electrical and Electronic Engineering ekhyap@ntu.edu.sg
Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-19T05:27:48Z 2020-05-19T05:27:48Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/139372 en A3284-191 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence
Engineering::Electrical and electronic engineering
description With the popularity of smartphones, people enjoy sharing their stories by posting photos on social media platforms. Hence, it would be convenient if stories could be written automatically once users upload photos. Benefiting from major improvements in deep learning techniques and computation power, it is now possible to generate such a story from users’ input images. Therefore, the objective of this project is to explore and design a deep learning model for the visual storytelling task. More specifically, this project aims to develop a deep learning model that can generate a five-sentence story from five given photos. The first part of the project focuses on comparing the latest techniques used for visual storytelling and evaluating their performance. Based on this comparison, the “Adversarial Reward Learning for Visual Storytelling” (AREL) model was selected as the base model for further optimization. The second part of the project focuses on optimizing the base model and improving its performance on Microsoft's VIST (Visual Storytelling Task) dataset. Optimization mainly focuses on the model structure, such as changing the decoder initialization. Results from the different approaches are discussed. Lastly, a Python application with a graphical user interface was designed in which users can choose photos and get the generated story. The report covers the related techniques used in the model, the design of the model, and the experimental results. It concludes with a discussion of the final results and future work.
author2 Yap Kim Hui
author_facet Yap Kim Hui
Feng, Shihao
format Final Year Project
author Feng, Shihao
author_sort Feng, Shihao
title Visual recognition using artificial intelligence (visual storytelling using deep learning)
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/139372
_version_ 1772825489823498240