From an image to a text description of an image
Main Author: | |
---|---|
Other Authors: | |
Format: | Final Year Project |
Language: | English |
Published: | 2017 |
Subjects: | |
Online Access: | http://hdl.handle.net/10356/70223 |
Institution: | Nanyang Technological University |
Summary: | This project implements a search feature that lets a user look for a particular object of interest in a video. The main idea is to train a very deep neural network that outputs a sequence of words describing an image. The network consists of a convolutional neural network (CNN), which learns features of an image, and a long short-term memory (LSTM) unit, which predicts the word sequence from those learnt features. The project does not address real-time object detection; instead, a video must be preprocessed before a user can search for an object that appears in it. |
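The preprocess-then-search workflow described in the summary can be illustrated with a minimal sketch. It assumes that a captioning model has already produced one caption per sampled frame; the captions, timestamps, and the function names `build_index` and `search` are hypothetical and are not taken from the project itself.

```python
from collections import defaultdict


def build_index(captions):
    """Map each lowercase word to the sorted timestamps whose caption contains it.

    `captions` is a list of (timestamp_seconds, caption_text) pairs, assumed to
    have been generated offline by a CNN + LSTM captioning model.
    """
    index = defaultdict(set)
    for timestamp, text in captions:
        for raw_word in text.lower().split():
            word = raw_word.strip(".,!?")
            if word:  # skip tokens that were only punctuation
                index[word].add(timestamp)
    return {word: sorted(times) for word, times in index.items()}


def search(index, query):
    """Return the timestamps at which the queried word appears in a caption."""
    return index.get(query.lower(), [])


# Hypothetical captions for frames sampled every two seconds.
captions = [
    (0.0, "a man riding a bicycle on a street"),
    (2.0, "a dog running on the grass"),
    (4.0, "a man walking a dog in a park"),
]

index = build_index(captions)
print(search(index, "dog"))  # [2.0, 4.0]
```

Because the index is built once during preprocessing, a query reduces to a dictionary lookup, which is consistent with the summary's point that the search is not real-time detection.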