Visual search using artificial intelligence (deep learning models for image caption)
An increasing number of people now use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for a user to have hundreds or thousands of photos in their gallery, which makes finding a specific photo difficult. A feature that searches the gallery with a text query would therefore be very helpful. This project aims to develop a deep learning model for image captioning and apply it in a web application. A detailed background study and literature review was carried out to understand the state-of-the-art methods in the field of image captioning, and several popular approaches were examined to trace the development of image captioning models. After thorough research and comparison, the state-of-the-art method Neural Baby Talk was selected as the basis of the project. The model was trained on both the Flickr30k and MS COCO datasets and evaluated on several commonly used metrics to verify its accuracy. A reinforcement learning technique, Self-critical n-step Training, was also applied during training to improve performance, and testing confirmed that this technique increases the performance of the Neural Baby Talk model. This report presents the experiment details, including the experimental setup, training process, results, and performance analysis. It also discusses how different datasets and different self-critical training configurations affect the performance of the trained model, as well as the limitations of the current model and possible future improvements to the deep learning model.
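The abstract refers to self-critical sequence training, a reinforcement learning method widely used for captioning models (Rennie et al., 2017). As illustration only, not code from the thesis, a minimal PyTorch-style sketch of the self-critical loss might look like this; the function name, argument names, and tensor shapes are assumptions.

```python
import torch

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    """Illustrative self-critical sequence training (SCST) loss.

    sample_logprobs: (batch, seq_len) log-probabilities of sampled caption tokens
    sample_reward:   (batch,) caption metric (e.g. CIDEr) of the sampled captions
    greedy_reward:   (batch,) metric of the greedy-decoded captions, used as baseline
    mask:            (batch, seq_len) 1.0 for real tokens, 0.0 for padding
    """
    # Advantage: how much the sampled caption beats the greedy baseline.
    advantage = (sample_reward - greedy_reward).unsqueeze(1)  # (batch, 1)
    # REINFORCE update: raise the log-probability of captions that score
    # above the baseline, lower it for captions that score below.
    loss = -(advantage * sample_logprobs * mask).sum() / mask.sum()
    return loss
```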
Saved in:

| Main Author: | Qiao, Guanheng |
|---|---|
| Other Authors: | Yap Kim Hui |
| Format: | Final Year Project |
| Language: | English |
| Published: | Nanyang Technological University, 2020 |
| Subjects: | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
| Online Access: | https://hdl.handle.net/10356/140073 |
| Institution: | Nanyang Technological University |
| Language: | English |
| id | sg-ntu-dr.10356-140073 |
|---|---|
| record_format | dspace |
| spelling | sg-ntu-dr.10356-140073 2023-07-07T18:42:50Z Visual search using artificial intelligence (deep learning models for image caption) Qiao, Guanheng Yap Kim Hui School of Electrical and Electronic Engineering ekhyap@ntu.edu.sg Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-26T06:36:24Z 2020-05-26T06:36:24Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/140073 en A3285-191 application/pdf Nanyang Technological University |
| institution | Nanyang Technological University |
| building | NTU Library |
| continent | Asia |
| country | Singapore |
| content_provider | NTU Library |
| collection | DR-NTU |
| language | English |
| topic | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
| author2 | Yap Kim Hui |
| format | Final Year Project |
| author | Qiao, Guanheng |
| title | Visual search using artificial intelligence (deep learning models for image caption) |
| publisher | Nanyang Technological University |
| publishDate | 2020 |
| url | https://hdl.handle.net/10356/140073 |
| _version_ | 1772828587232067584 |