From an image to a text description of the image
This project presents an implementation of a search function that allows users to search for a particular object of interest using only textual information. The main idea is to train a very deep neural network architecture that generates a useful description for each video frame. The project also places a strong emphasis on exploring different types of image captioning models and their differences. The network used consists of a Convolutional Neural Network (CNN) that learns features from an image, and a Long Short-Term Memory (LSTM) unit that predicts the sequence of words from the features learnt by the CNN. This project does not implement live captioning of videos; instead, it pre-processes the video into frames and generates an appropriate caption for each frame before the user conducts the textual search.
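The abstract describes a standard encoder-decoder captioning setup: a CNN that learns image features and an LSTM that decodes those features into a word sequence. The report does not specify a framework or layer sizes, so the sketch below is only an illustration of that architecture; PyTorch, the ResNet-18 backbone, and all dimensions are assumptions rather than details taken from the project.

```python
# Minimal sketch of a CNN-encoder / LSTM-decoder captioning model.
# Framework, backbone, and sizes are assumptions, not the project's actual code.
import torch
import torch.nn as nn
import torchvision.models as models


class EncoderCNN(nn.Module):
    """CNN that learns a feature vector from an image (video frame)."""

    def __init__(self, embed_size: int):
        super().__init__()
        resnet = models.resnet18()  # randomly initialised; a pretrained backbone could be swapped in
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])  # drop the classification head
        self.fc = nn.Linear(resnet.fc.in_features, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(images).flatten(1)   # (batch, 512)
        return self.fc(feats)                      # (batch, embed_size)


class DecoderLSTM(nn.Module):
    """LSTM that predicts the word sequence from the CNN features."""

    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # The image feature is fed as the first step of the sequence,
        # followed by the embedded caption tokens (teacher forcing).
        inputs = torch.cat([features.unsqueeze(1), self.embed(captions)], dim=1)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                     # (batch, seq_len + 1, vocab_size)


if __name__ == "__main__":
    encoder = EncoderCNN(embed_size=256)
    decoder = DecoderLSTM(embed_size=256, hidden_size=512, vocab_size=5000)
    frames = torch.randn(2, 3, 224, 224)           # two dummy video frames
    captions = torch.randint(0, 5000, (2, 10))     # dummy token ids
    scores = decoder(encoder(frames), captions)
    print(scores.shape)                            # torch.Size([2, 11, 5000])
```

In the offline pipeline described in the abstract, each pre-processed video frame would be passed through the encoder once and the decoder would then be sampled (greedily or with beam search) to produce the caption that is stored for the later textual search.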
Saved in:
Main Author: | Thian, Ronald Chuan Yan |
---|---|
Other Authors: | Chng Eng Siong, School of Computer Science and Engineering |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Degree: | Bachelor of Engineering (Computer Science) |
Subjects: | DRNTU::Engineering::Computer science and engineering |
Online Access: | http://hdl.handle.net/10356/72777 |
Physical Description: | 62 p., application/pdf |
Institution: | Nanyang Technological University |