IMAGE CAPTIONING WITH EMOTION USING ENCODER-DECODER FRAMEWORK LSTM AND FACTORED LSTM

Image captioning with emotion is the task of generating a meaningful word sequence that describes an image while adding a specific style to the sentence. Although several studies address emotional image captioning, no prior work targets the Indonesian language. In this final project, an encoder-decoder framework is used, with ResNet152 as the encoder and a Long Short-Term Memory (LSTM) network as the decoder, following the Neural Image Captioning (NIC) study. A variant of the LSTM, the factored LSTM introduced in the StyleNet work, is also used to generate captions with emotion. Attention mechanisms are added to both architectures to improve their evaluation scores. Training uses transfer learning and multitask learning. Two kinds of evaluation are applied: automatic evaluation with BLEU metrics, and manual evaluation through surveys that compare the attractiveness of factual and emotional sentences. Two types of datasets are used: a factual-sentence dataset of about 8,000 samples, and three emotion-sentence datasets covering happy, sad, and angry emotions. The emotion datasets were created by an annotator, with roughly 1,000 sentences collected per emotion. Experiments were conducted to reach the highest BLEU score on each factual and emotion dataset, and the best resulting models were then evaluated through surveys. All models are trained end-to-end.

The best result for factual sentences is achieved by the NIC architecture with the attention mechanism, with a BLEU-4 of 0.22. The best architecture for emotional sentences is StyleNet with the attention mechanism; its BLEU-4 scores for happy, sad, and angry emotions are 0.08, 0.09, and 0.10 respectively. In addition, the surveys indicate a high attractiveness level for the models that produce emotional sentences: 1.875% for factual sentences versus 83.75%, 92.5%, and 87.5% for happy, sad, and angry sentences.
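The factored LSTM from StyleNet, which the abstract builds on, shares most decoder parameters across styles and factors the input-to-hidden weight matrix so that only a small style-specific factor changes per emotion. A minimal sketch of that idea, assuming NumPy; all sizes and names here are illustrative and not taken from the thesis code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class FactoredLSTMCell:
    """Minimal factored LSTM cell in the spirit of StyleNet.

    The input-to-hidden weights are factored as U @ diag(s) @ V, where
    only the style vector `s` differs per style (e.g. factual, happy,
    sad, angry); U, V, and the recurrent weights are shared by all
    styles. Illustrative sketch only, not the thesis implementation.
    """

    def __init__(self, input_size, hidden_size, factor_size, styles, seed=0):
        rng = np.random.default_rng(seed)
        g = 4 * hidden_size  # stacked gates: input, forget, cell, output
        self.U = rng.normal(0, 0.1, (g, factor_size))
        self.V = rng.normal(0, 0.1, (factor_size, input_size))
        # One diagonal style factor per style -- the only style-specific part.
        self.S = {name: rng.normal(0, 0.1, factor_size) for name in styles}
        self.W_h = rng.normal(0, 0.1, (g, hidden_size))
        self.b = np.zeros(g)
        self.hidden_size = hidden_size

    def step(self, x, h, c, style):
        # Recompose the factored input-to-hidden weights for this style.
        Wx = self.U @ (self.S[style][:, None] * self.V)
        z = Wx @ x + self.W_h @ h + self.b
        H = self.hidden_size
        i = sigmoid(z[:H])          # input gate
        f = sigmoid(z[H:2 * H])     # forget gate
        g = np.tanh(z[2 * H:3 * H]) # candidate cell state
        o = sigmoid(z[3 * H:])      # output gate
        c_new = f * c + i * g
        h_new = o * np.tanh(c_new)
        return h_new, c_new
```

Switching the style key (say, from "factual" to "happy") swaps only the small factor `s`, which is how this design lets one decoder produce both factual and emotional captions while multitask learning updates the shared weights.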


Bibliographic Details
Main Author: Rahman Ahaddienata, Dery
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/39912
Institution: Institut Teknologi Bandung
Subjects: encoder-decoder framework, neural image captioning, StyleNet, attention mechanism