Visual search using artificial intelligence (deep learning models for image caption)
An increasing number of people now use smartphones to take photos in their daily lives. Because smartphones are so convenient, it is common for a user to have hundreds or thousands of photos in their gallery, which makes finding a specific photo difficult. A feature that searches the gallery with a text query would therefore be very helpful. This project aims to develop a deep learning model for image captioning and apply it in a web application. A detailed background study and literature review was carried out to understand the state-of-the-art methods in the field of image captioning, and several popular approaches were examined to trace the development of image captioning models. After thorough research and comparison, the state-of-the-art method Neural Baby Talk was selected as the basis of the project. The model was trained on both the Flickr30k and MS COCO datasets and evaluated on several commonly used metrics to verify its accuracy. A reinforcement learning technique, Self-critical n-step Training, was also applied during training to improve performance, and testing confirmed that this technique increases the performance of the Neural Baby Talk model. This report presents the experiment details, including the experimental setup, training process, results, and performance analysis. It also discusses how different datasets and different self-critical training configurations affect the performance of the trained model, as well as the limitations of the current model and possible future improvements to the deep learning model.
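The abstract refers to self-critical sequence training, a reinforcement learning method widely used for captioning models (Rennie et al., 2017). As illustration only, not code from the thesis, a minimal PyTorch-style sketch of the self-critical loss might look like this; the function name, argument names, and tensor shapes are assumptions.

```python
import torch

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    """Illustrative self-critical sequence training (SCST) loss.

    sample_logprobs: (batch, seq_len) log-probabilities of sampled caption tokens
    sample_reward:   (batch,) caption metric (e.g. CIDEr) of the sampled captions
    greedy_reward:   (batch,) metric of the greedy-decoded captions, used as baseline
    mask:            (batch, seq_len) 1.0 for real tokens, 0.0 for padding
    """
    # Advantage: how much the sampled caption beats the greedy baseline.
    advantage = (sample_reward - greedy_reward).unsqueeze(1)  # (batch, 1)
    # REINFORCE update: raise the log-probability of captions that score
    # above the baseline, lower it for captions that score below.
    loss = -(advantage * sample_logprobs * mask).sum() / mask.sum()
    return loss
```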
Saved in:

| Main Author: | Qiao, Guanheng |
|---|---|
| Other Authors: | Yap Kim Hui |
| Format: | Final Year Project |
| Language: | English |
| Published: | Nanyang Technological University, 2020 |
| Subjects: | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
| Online Access: | https://hdl.handle.net/10356/140073 |
| Institution: | Nanyang Technological University |
| Language: | English |
| id | sg-ntu-dr.10356-140073 |
|---|---|
| record_format | dspace |
| spelling | sg-ntu-dr.10356-140073 2023-07-07T18:42:50Z Visual search using artificial intelligence (deep learning models for image caption) Qiao, Guanheng Yap Kim Hui School of Electrical and Electronic Engineering ekhyap@ntu.edu.sg Engineering::Electrical and electronic engineering::Computer hardware, software and systems Bachelor of Engineering (Electrical and Electronic Engineering) 2020-05-26T06:36:24Z 2020-05-26T06:36:24Z 2020 Final Year Project (FYP) https://hdl.handle.net/10356/140073 en A3285-191 application/pdf Nanyang Technological University |
| institution | Nanyang Technological University |
| building | NTU Library |
| continent | Asia |
| country | Singapore |
| content_provider | NTU Library |
| collection | DR-NTU |
| language | English |
| topic | Engineering::Electrical and electronic engineering::Computer hardware, software and systems |
| author2 | Yap Kim Hui |
| format | Final Year Project |
| author | Qiao, Guanheng |
| title | Visual search using artificial intelligence (deep learning models for image caption) |
| publisher | Nanyang Technological University |
| publishDate | 2020 |
| url | https://hdl.handle.net/10356/140073 |
| _version_ | 1772828587232067584 |