IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN

Bibliographic Details
Main Author: Khumaeni
Format: Theses
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/86168
Institution: Institut Teknologi Bandung
Description
Summary: Image Captioning is a branch of Natural Language Processing (NLP) and Computer Vision that aims to generate accurate natural language descriptions of images. Richer descriptions can help users identify images and understand their context. However, most research in this field has yet to consider sentiment, which is crucial for understanding the context and value of an image. This study develops a sentiment-aware image captioning system for Indonesian, using a dataset that has been translated and enriched with sentiment information. It introduces an approach in which a pretrained image encoder extracts visual features from the image. These features are combined with the vectors produced by a transformer encoder that serves as the text encoder. The combined vector is then fed into a transformer decoder, which uses multi-head attention to generate a text description that matches the sentiment present in the image. At inference time, images are preprocessed and embedded to produce vector representations; unlike in training, the text vectors originate from the start token. The decoder output is then fed back as model input to predict the next word iteratively until the entire caption is formed. Evaluation uses the BLEU and ROUGE metrics and also considers how accurately the caption depicts the sentiment in the image. Experimental results show that the Inception-Transformer model outperforms the other models, with the highest BLEU score of 0.366 and ROUGE score of 0.244 for positive sentiment, and a BLEU score of 0.323 and ROUGE score of 0.229 for negative sentiment. This research could be applied in fields that require sentiment understanding in the context of images, such as product reviews on e-commerce platforms. Further development can focus on improving accuracy, the diversity of generated descriptions, and more complex sentiment modeling in Indonesian.
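
The iterative inference procedure described in the summary (start from the start token, predict the next word, feed it back in) can be sketched roughly as a greedy decoding loop. This is only an illustration under stated assumptions, not the thesis's implementation: `greedy_decode`, `toy_step`, the `<start>`/`<end>` token strings, and the sample Indonesian words are all hypothetical, and `toy_step` is a canned stand-in for the real image encoder plus transformer decoder. The summary also does not say whether greedy selection or another strategy such as beam search was used.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical special tokens; the thesis's actual vocabulary is not given.
START, END = "<start>", "<end>"

# A step function maps (image features, tokens so far) to a score per word.
StepFn = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_decode(image_features: Sequence[float],
                  step_fn: StepFn,
                  max_len: int = 20) -> List[str]:
    """Build a caption one word at a time, starting from the start token."""
    tokens = [START]
    for _ in range(max_len):
        scores = step_fn(image_features, tokens)
        # Greedy choice: take the highest-scoring next word.
        next_token = max(scores, key=scores.get)
        if next_token == END:
            break
        # Feed the prediction back in as input for the next step.
        tokens.append(next_token)
    return tokens[1:]  # drop the start token


def toy_step(image_features: Sequence[float],
             tokens: List[str]) -> Dict[str, float]:
    """Stand-in for the decoder: emits a fixed, made-up caption."""
    canned = ["anak", "tersenyum", "bahagia", END]
    target = canned[min(len(tokens) - 1, len(canned) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in canned}


caption = greedy_decode([0.1, 0.2], toy_step)
print(" ".join(caption))
```

In a real system, `step_fn` would wrap the trained model and return scores over the full vocabulary; `max_len` guards against captions that never emit the end token.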