IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN
Main Author: | Khumaeni |
---|---|
Format: | Theses |
Language: | Indonesia |
Online Access: | https://digilib.itb.ac.id/gdl/view/86168 |
Institution: | Institut Teknologi Bandung |
id |
id-itb.:86168 |
---|---|
keywords |
image captioning, pretrained model, transformer |
timestamp |
2024-09-15T05:33:20Z |
building |
Institut Teknologi Bandung Library |
continent |
Asia |
country |
Indonesia |
content_provider |
Institut Teknologi Bandung |
collection |
Digital ITB |
description |
Image captioning is a field at the intersection of Natural Language Processing
(NLP) and Computer Vision that aims to generate accurate natural-language
descriptions of images. Richer descriptions can help users identify images and
understand their context. However, most research in this field has not yet
considered sentiment, which is crucial for understanding the context and value
of an image.
This study develops a sentiment-aware image captioning system for Indonesian,
using a translated dataset enriched with sentiment information. The study
introduces a new approach in which a pretrained image encoder extracts visual
features from each image. These features are combined with the output vectors of
a Transformer encoder that serves as the text encoder, and the combined vectors
are fed into a Transformer decoder, built on multi-head attention, that generates
captions matching the sentiment present in the image.
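The encoder-decoder combination described above can be illustrated with a small pure-Python sketch. It assumes that image features have already been extracted by the pretrained encoder and projected to the same dimension as the text vectors, and that "combined" means concatenating the two sequences so the decoder attends over both; the abstract does not specify the fusion method. The learned query/key/value projections of real multi-head attention are replaced with identity maps, and the decoder's masked self-attention is omitted, so only attention over the fused sequence is shown. `fuse` and `multi_head_attention` are hypothetical names, not the thesis code, and all numbers are made up.

```python
import math
from typing import List

Matrix = List[List[float]]  # each row is one token or image-feature vector


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = softmax(scores)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def multi_head_attention(q: Matrix, kv: Matrix, num_heads: int) -> Matrix:
    """Split the model dimension into heads, attend per head, then concatenate.

    Simplification: the learned Q/K/V and output projections are identity maps.
    """
    d_model = len(q[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        start, end = h * d_head, (h + 1) * d_head
        q_h = [row[start:end] for row in q]
        kv_h = [row[start:end] for row in kv]
        heads.append(attention(q_h, kv_h, kv_h))
    # Concatenate the per-head outputs back to d_model for each query position.
    return [sum((head[i] for head in heads), []) for i in range(len(q))]


def fuse(image_features: Matrix, text_vectors: Matrix) -> Matrix:
    """Hypothetical fusion: prepend the image feature vectors to the text sequence."""
    return image_features + text_vectors


# Toy example with d_model = 4 and made-up numbers.
image_feats = [[1.0, 0.0, 0.0, 1.0]]  # one pooled image feature, already projected
text_vecs = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]  # text encoder outputs
memory = fuse(image_feats, text_vecs)  # 3 x 4 sequence the decoder attends over
decoder_queries = [[0.1, 0.2, 0.3, 0.4]]
out = multi_head_attention(decoder_queries, memory, num_heads=2)
print(len(out), len(out[0]))  # 1 4
```

Because each head's output is a weighted average of the (sliced) memory rows, every output coordinate stays within the range of the corresponding memory column; a trained model adds learned projections, masking, residual connections, and feed-forward layers on top of this core.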
At inference time, images are preprocessed and embedded into vector
representations. Unlike in training, the text input at inference starts from the
start token alone. Each decoder output is fed back as model input to predict the
next word, and this repeats until the full caption is formed. Evaluation uses the
BLEU and ROUGE metrics and also considers how accurately the captions convey the
sentiment of the image.
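The inference loop described above can be sketched as autoregressive decoding that begins from the start token. The abstract does not say whether greedy or beam search is used, nor exactly how the loop terminates; this sketch assumes greedy selection and stops at a hypothetical `<end>` token or after `max_len` steps. `next_token_scores` is a hypothetical stand-in for the trained Inception-Transformer model, and `toy_model` is a scripted fake used only to show the control flow.

```python
from typing import Callable, Dict, List

START, END = "<start>", "<end>"  # hypothetical special tokens


def greedy_decode(
    image_vector: List[float],
    next_token_scores: Callable[[List[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> List[str]:
    """Greedy autoregressive decoding that starts from the start token.

    `next_token_scores` stands in for the trained encoder-decoder: given the
    image representation and the tokens generated so far, it returns a score
    for every candidate next token.
    """
    tokens = [START]
    for _ in range(max_len):
        scores = next_token_scores(image_vector, tokens)
        best = max(scores, key=scores.get)  # pick the highest-scoring token
        if best == END:
            break
        tokens.append(best)  # feed the prediction back in as the next input
    return tokens[1:]  # drop the start token


def toy_model(image_vector: List[float], tokens: List[str]) -> Dict[str, float]:
    """Scripted fake model that always prefers the next word of a fixed caption."""
    script = ["produk", "ini", "sangat", "bagus", END]
    step = len(tokens) - 1
    nxt = script[step] if step < len(script) else END
    return {tok: (1.0 if tok == nxt else 0.0) for tok in set(script)}


print(greedy_decode([0.1, 0.2], toy_model))  # ['produk', 'ini', 'sangat', 'bagus']
```

Beam search would keep several partial captions per step instead of one, usually at higher cost; greedy decoding is shown only because it is the simplest loop that matches the description.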
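The BLEU and ROUGE evaluation mentioned above can be illustrated with a minimal sketch. The abstract does not state the BLEU n-gram order, the smoothing, whether scores are corpus- or sentence-level, or which ROUGE variant is reported; this sketch assumes unsmoothed sentence-level BLEU with uniform weights and ROUGE-L F1, so its numbers are not directly comparable to the reported scores. The example captions are invented for illustration.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed sentence BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in references:
            for g, c in ngrams(ref, n).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU = 0
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: List[str], reference: List[str]) -> float:
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    precision = lcs / len(candidate)
    recall = lcs / len(reference)
    return 2 * precision * recall / (precision + recall)


cand = "seorang pria tersenyum bahagia di taman".split()
ref = "seorang pria tersenyum di taman".split()
print(round(sentence_bleu(cand, [ref], max_n=1), 3))  # 0.833
print(sentence_bleu(cand, [ref]))  # 0.0 (no matching 4-gram)
print(round(rouge_l_f1(cand, ref), 3))  # 0.909
```

With a single short reference, unsmoothed BLEU-4 is often zero, which is one reason captioning results are usually computed at corpus level or with smoothing; NLTK's `nltk.translate.bleu_score` module provides both `corpus_bleu` and smoothing functions.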
Experimental results show that the Inception-Transformer model outperforms the
other models evaluated, with the highest BLEU score (0.366) and ROUGE score
(0.244) for positive sentiment, and a BLEU score of 0.323 and a ROUGE score of
0.229 for negative sentiment. The approach could be applied in fields that
require sentiment understanding of images, such as product reviews on
e-commerce platforms. Future work could focus on improving accuracy and caption
diversity, and on more complex sentiment modeling for Indonesian. |
author |
Khumaeni |