IMAGE CAPTIONING WITH EMOTION USING ENCODER-DECODER FRAMEWORK LSTM AND FACTORED LSTM

Image captioning with emotion is the task of generating a meaningful word sequence that describes an image while adding a specific style to the sentence. Although several studies address emotional image captioning, no prior work targets the Indonesian language. In this final project, an encoder-decoder framework is used, with ResNet152 as the encoder and a Long Short-Term Memory (LSTM) network as the decoder, following the Neural Image Captioning (NIC) study. A variant of the LSTM, the factored LSTM introduced in the StyleNet work, is also used to generate captions with emotion. Attention mechanisms are added to both architectures to improve their evaluation scores. Training uses transfer learning and multitask learning. Two kinds of evaluation are applied: automatic evaluation with BLEU metrics, and manual evaluation through surveys that compare the attractiveness of factual and emotional sentences. Two types of datasets are used: a factual-sentence dataset of about 8,000 samples, and three emotion-sentence datasets covering happy, sad, and angry emotions. The emotion datasets were created by an annotator, with roughly 1,000 sentences collected per emotion. Experiments were conducted to reach the highest BLEU score on each factual and emotion dataset, and the best resulting models were then evaluated through surveys. All models are trained end-to-end.

The best result for factual sentences is achieved by the NIC architecture with the attention mechanism, with a BLEU-4 of 0.22. The best architecture for emotional sentences is StyleNet with the attention mechanism; its BLEU-4 scores for happy, sad, and angry emotions are 0.08, 0.09, and 0.10 respectively. In addition, the surveys indicate a high attractiveness level for the models that produce emotional sentences: 1.875% for factual sentences versus 83.75%, 92.5%, and 87.5% for happy, sad, and angry sentences.
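The factored LSTM from StyleNet, which the abstract builds on, shares most decoder parameters across styles and factors the input-to-hidden weight matrix so that only a small style-specific factor changes per emotion. A minimal sketch of that idea, assuming NumPy; all sizes and names here are illustrative and not taken from the thesis code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FactoredLSTMCell:
    """Minimal factored LSTM cell in the spirit of StyleNet.

    The input-to-hidden weights are factored as U @ diag(s) @ V, where
    only the style vector `s` differs per style (e.g. factual, happy,
    sad, angry); U, V, and the recurrent weights are shared by all
    styles. Illustrative sketch only, not the thesis implementation.
    """

    def __init__(self, input_size, hidden_size, factor_size, styles, seed=0):
        rng = np.random.default_rng(seed)
        g = 4 * hidden_size  # stacked gates: input, forget, cell, output
        self.U = rng.normal(0, 0.1, (g, factor_size))
        self.V = rng.normal(0, 0.1, (factor_size, input_size))
        # One diagonal style factor per style -- the only style-specific part.
        self.S = {name: rng.normal(0, 0.1, factor_size) for name in styles}
        self.W_h = rng.normal(0, 0.1, (g, hidden_size))
        self.b = np.zeros(g)
        self.hidden_size = hidden_size

    def step(self, x, h, c, style):
        # Recompose the factored input-to-hidden weights for this style.
        Wx = self.U @ (self.S[style][:, None] * self.V)
        z = Wx @ x + self.W_h @ h + self.b
        H = self.hidden_size
        i = sigmoid(z[:H])          # input gate
        f = sigmoid(z[H:2 * H])     # forget gate
        g = np.tanh(z[2 * H:3 * H]) # candidate cell state
        o = sigmoid(z[3 * H:])      # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new
```

Switching the style key (say, from "factual" to "happy") swaps only the small factor `s`, which is how this design lets one decoder produce both factual and emotional captions while multitask learning updates the shared weights.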


Bibliographic Details
Main Author: Rahman Ahaddienata, Dery
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/39912
Institution: Institut Teknologi Bandung
Subjects: encoder-decoder framework, neural image captioning, StyleNet, attention mechanism