Visual recognition using artificial intelligence (visual storytelling using deep learning)
With the popularity of smartphones, people enjoy sharing their stories by posting photos on social media platforms. Hence, it would be convenient if stories could be written automatically once users upload photos. Benefiting from major improvements in deep learning techniques and computation power, it is now...
Main Author: | Feng, Shihao |
---|---|
Other Authors: | Yap Kim Hui |
Format: | Final Year Project |
Language: | English |
Published: | Nanyang Technological University, 2020 |
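Subjects: | Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Electrical and electronic engineering |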
Online Access: | https://hdl.handle.net/10356/139372 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-139372 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-139372 2023-07-07T18:34:51Z Feng, Shihao Yap Kim Hui School of Electrical and Electronic Engineering ekhyap@ntu.edu.sg Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-19T05:27:48Z 2020-05-19T05:27:48Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/139372 en A3284-191 application/pdf Nanyang Technological University |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Computer science and engineering::Computing methodologies::Artificial intelligence; Engineering::Electrical and electronic engineering |
description |
With the popularity of smartphones, people enjoy sharing their stories by posting photos on social media platforms. Hence, it would be convenient if stories could be written automatically once users upload photos. Benefiting from major improvements in deep learning techniques and computation power, it is now possible to generate such a story based on users’ input images. Therefore, the objective of this project is to explore and design a deep learning model for the visual storytelling task. More specifically, this project aims to develop a deep learning model that can generate a five-sentence story from five given photos. The first part of the project focuses on comparing the latest techniques used for visual storytelling and evaluating their performance. As a result, “Adversarial Reward Learning for Visual Storytelling” (AREL) was selected as the base model for further optimization. The second part of the project focuses on optimizing the base model and improving its performance on the Microsoft VIST (Visual Storytelling Task) dataset. Optimization mainly focuses on the model structure, such as changing the decoder initialization. Results from different approaches are discussed. Lastly, a Python application with a graphical user interface was designed in which users can choose photos and get the generated story. The report covers the related techniques used in the model, the design of the model, and the experimental results. It concludes with a discussion of the final results and future work. |
author2 |
Yap Kim Hui |
author_facet |
Yap Kim Hui; Feng, Shihao |
format |
Final Year Project |
author |
Feng, Shihao |
author_sort |
Feng, Shihao |
title |
Visual recognition using artificial intelligence (visual storytelling using deep learning) |
publisher |
Nanyang Technological University |
publishDate |
2020 |
url |
https://hdl.handle.net/10356/139372 |