From an image to a text description of the image
This project presents an implementation of a search function that allows users to search for a particular object of interest using only textual information. The main idea is to train a very deep neural network architecture that generates a useful description for each video frame. The project also places a strong emphasis on exploring different types of image captioning models and their differences. The network used consists of a Convolutional Neural Network (CNN) that learns features from an image, and a Long Short-Term Memory (LSTM) unit that predicts the sequence of words from the features learnt by the CNN. This project does not implement live captioning of videos; instead, it pre-processes the video into frames and generates an appropriate caption for each frame before the user conducts the textual search.
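The abstract describes a standard encoder-decoder captioning setup: a CNN that learns image features and an LSTM that decodes those features into a word sequence. The report does not specify a framework or layer sizes, so the sketch below is only an illustration of that architecture; PyTorch, the ResNet-18 backbone, and all dimensions are assumptions rather than details taken from the project.

```python
# Minimal sketch of a CNN-encoder / LSTM-decoder captioning model.
# Framework, backbone, and sizes are assumptions, not the project's actual code.
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """CNN that learns a feature vector from an image (video frame)."""

    def __init__(self, embed_size: int):
        super().__init__()
        resnet = models.resnet18()  # randomly initialised; a pretrained backbone could be swapped in
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the classification head
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images).flatten(1)   # (batch, 512)
        return self.fc(feats)                      # (batch, embed_size)


class DecoderLSTM(nn.Module):
    """LSTM that predicts the word sequence from the CNN features."""

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # The image feature is fed as the first step of the sequence,
        # followed by the embedded caption tokens (teacher forcing).
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                     # (batch, seq_len + 1, vocab_size)


if __name__ == "__main__":
    encoder = EncoderCNN(embed_size=256)
    decoder = DecoderLSTM(embed_size=256, hidden_size=512, vocab_size=5000)
    frames = torch.randn(2, 3, 224, 224)           # two dummy video frames
    captions = torch.randint(0, 5000, (2, 10))     # dummy token ids
    scores = decoder(encoder(frames), captions)
    print(scores.shape)                            # torch.Size([2, 11, 5000])
```

In the offline pipeline described in the abstract, each pre-processed video frame would be passed through the encoder once and the decoder would then be sampled (greedily or with beam search) to produce the caption that is stored for the later textual search.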
Saved in:
Main Author: | Thian, Ronald Chuan Yan |
---|---|
Other Authors: | Chng Eng Siong, School of Computer Science and Engineering |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Degree: | Bachelor of Engineering (Computer Science) |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/72777 |
Physical Description: | 62 p., application/pdf |
Institution: | Nanyang Technological University |