IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN

Image Captioning is a branch of Natural Language Processing (NLP) and Computer Vision that aims to generate accurate natural language descriptions of images. More complex descriptions can enhance the user experience in identifying images and understanding their context. However, most research in...

Bibliographic Details
Main Author:	Khumaeni
Format:	Theses
Language:	Indonesia
Online Access:	https://digilib.itb.ac.id/gdl/view/86168
Institution:	Institut Teknologi Bandung

id	id-itb.:86168
spelling	id-itb.:86168 2024-09-15T05:33:20Z IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN Khumaeni Indonesia Theses image captioning, pretrained model, transformer INSTITUT TEKNOLOGI BANDUNG https://digilib.itb.ac.id/gdl/view/86168 text
institution	Institut Teknologi Bandung
building	Institut Teknologi Bandung Library
continent	Asia
country	Indonesia
content_provider	Institut Teknologi Bandung
collection	Digital ITB
language	Indonesia
description	Image captioning lies at the intersection of Natural Language Processing (NLP) and Computer Vision and aims to generate accurate natural-language descriptions of images. Richer descriptions can help users identify images and understand their context. However, most research in this field has yet to consider sentiment, which is crucial for understanding the context and value of an image. This study develops an Indonesian image captioning system that incorporates sentiment, using a dataset that has been translated and enriched with sentiment information. It introduces an approach in which a pretrained image encoder extracts visual features from the image; these features are combined with the vectors produced by a transformer encoder that serves as the text encoder. The combined vector is then fed into a transformer decoder, which uses multi-head attention to generate a text description that matches the sentiment present in the image. At inference time, the image is preprocessed and embedded into a vector representation; unlike in training, the text vectors originate from the start token. The decoder output is then fed back as model input to predict the next word iteratively until the entire caption is formed. Evaluation uses the BLEU and ROUGE metrics and also considers how accurately the caption depicts the sentiment in the image. Experimental results show that the Inception-Transformer model outperforms the other models, with the highest BLEU score of 0.366 and ROUGE score of 0.244 for positive sentiment, and a BLEU score of 0.323 and ROUGE score of 0.229 for negative sentiment. This research could be applied in fields that require sentiment understanding in the context of images, such as product reviews on e-commerce platforms. Further development can focus on improving accuracy, the diversity of text descriptions, and more complex sentiment modeling for the Indonesian language.
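The inference procedure summarized in the description (begin with a start token, feed the decoder output back in as input, and repeat until the caption is complete) can be sketched as a greedy decoding loop. This is only an illustration under stated assumptions, not the thesis implementation: the names `greedy_decode` and `toy_next_token_scores`, the special tokens `<start>` and `<end>`, the `max_len` limit, and the example caption are all hypothetical. In a real system the toy scorer would be replaced by the transformer decoder conditioned on the image features.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical special tokens; the record does not name the actual ones.
START, END = "<start>", "<end>"

# Stand-in type for the decoder: given image features and the tokens
# generated so far, return a score for each candidate next token.
Scorer = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_decode(image_features: Sequence[float],
                  score_next: Scorer,
                  max_len: int = 20) -> List[str]:
    """Generate a caption one token at a time, always taking the top score."""
    tokens = [START]  # at inference the text input starts from the start token
    for _ in range(max_len):
        scores = score_next(image_features, tokens)
        next_token = max(scores, key=scores.get)  # greedy choice
        if next_token == END:
            break
        tokens.append(next_token)  # output is fed back as the next input
    return tokens[1:]  # drop the start token


def toy_next_token_scores(image_features: Sequence[float],
                          tokens: List[str]) -> Dict[str, float]:
    """Toy scorer (assumption): replays a fixed caption and ignores the image."""
    caption = ["anjing", "lucu", "bermain", "di", "taman", END]
    position = len(tokens) - 1  # number of words generated so far
    target = caption[min(position, len(caption) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in caption}


if __name__ == "__main__":
    words = greedy_decode([0.0], toy_next_token_scores)
    print(" ".join(words))  # prints "anjing lucu bermain di taman"
```

Greedy selection is the simplest choice; beam search or sampling would slot into the same loop by changing how `next_token` is picked. The record does not state which decoding strategy the thesis uses.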
format	Theses
author	Khumaeni
title	IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN
url	https://digilib.itb.ac.id/gdl/view/86168
_version_	1823657728702152704


Similar Items