Visual search using artificial intelligence (deep learning models for image caption)

An increasing number of people now use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for someone to have hundreds or thousands of photos in a photo gallery. With so many photos in the photo gallery, it’s a quite diffic...

Bibliographic Details
Main Author: Qiao, Guanheng
Other Authors: Yap Kim Hui
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2020
Subjects: Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Online Access: https://hdl.handle.net/10356/140073
Institution: Nanyang Technological University
id sg-ntu-dr.10356-140073
record_format dspace
spelling sg-ntu-dr.10356-140073 2023-07-07T18:42:50Z Visual search using artificial intelligence (deep learning models for image caption) Qiao, Guanheng Yap Kim Hui School of Electrical and Electronic Engineering ekhyap@ntu.edu.sg Engineering::Electrical and electronic engineering::Computer hardware, software and systems An increasing number of people now use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for someone to have hundreds or thousands of photos in a photo gallery. With so many photos, finding a specific one is difficult for the user, so the ability to search the gallery with text would be very helpful. This project aims to develop a deep learning model for image captioning and apply it in a web application. A background study and literature review were carried out to understand the state-of-the-art methods in image captioning, and several popular methods were studied to understand how image captioning models have developed. After a thorough comparison, the state-of-the-art method Neural Baby Talk was selected as the basis of my project. The model was trained on both the Flickr30k and MS COCO datasets and evaluated on several commonly used metrics to verify its accuracy. A reinforcement learning training technique, Self-critical n-step Training, was also applied during training to improve performance. Testing confirmed that this reinforcement learning technique can improve the performance of the Neural Baby Talk model. This report presents the experiment details, including the experiment setup, training process, experiment results, and performance analysis. It also discusses how different datasets and different self-critical training techniques affect the performance of the trained model, as well as the limitations of the current model and possible future improvements to the deep learning model. Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-26T06:36:24Z 2020-05-26T06:36:24Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/140073 en A3285-191 application/pdf Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Electrical and electronic engineering::Computer hardware, software and systems
spellingShingle Engineering::Electrical and electronic engineering::Computer hardware, software and systems
Qiao, Guanheng
Visual search using artificial intelligence (deep learning models for image caption)
description An increasing number of people now use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for someone to have hundreds or thousands of photos in a photo gallery. With so many photos, finding a specific one is difficult for the user, so the ability to search the gallery with text would be very helpful. This project aims to develop a deep learning model for image captioning and apply it in a web application. A background study and literature review were carried out to understand the state-of-the-art methods in image captioning, and several popular methods were studied to understand how image captioning models have developed. After a thorough comparison, the state-of-the-art method Neural Baby Talk was selected as the basis of my project. The model was trained on both the Flickr30k and MS COCO datasets and evaluated on several commonly used metrics to verify its accuracy. A reinforcement learning training technique, Self-critical n-step Training, was also applied during training to improve performance. Testing confirmed that this reinforcement learning technique can improve the performance of the Neural Baby Talk model. This report presents the experiment details, including the experiment setup, training process, experiment results, and performance analysis. It also discusses how different datasets and different self-critical training techniques affect the performance of the trained model, as well as the limitations of the current model and possible future improvements to the deep learning model.
author2 Yap Kim Hui
author_facet Yap Kim Hui
Qiao, Guanheng
format Final Year Project
author Qiao, Guanheng
author_sort Qiao, Guanheng
title Visual search using artificial intelligence (deep learning models for image caption)
title_short Visual search using artificial intelligence (deep learning models for image caption)
title_full Visual search using artificial intelligence (deep learning models for image caption)
title_fullStr Visual search using artificial intelligence (deep learning models for image caption)
title_full_unstemmed Visual search using artificial intelligence (deep learning models for image caption)
title_sort visual search using artificial intelligence (deep learning models for image caption)
publisher Nanyang Technological University
publishDate 2020
url https://hdl.handle.net/10356/140073
_version_ 1772828587232067584