IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN
Main Author: | |
---|---|
Format: | Theses |
Language: | Indonesian |
Online Access: | https://digilib.itb.ac.id/gdl/view/86168 |
Institution: | Institut Teknologi Bandung |
Summary: | Image Captioning is a branch of Natural Language Processing (NLP) and
Computer Vision that aims to generate accurate natural language descriptions of
images. More complex descriptions can enhance the user experience in identifying
images and understanding their context. However, most research in this field has
yet to consider sentiment factors, which are crucial for understanding the context
and value of an image.
This study develops an Indonesian-language image captioning system that
incorporates sentiment, using a dataset that has been translated and enriched with
sentiment information. It introduces an approach in which a pretrained image
encoder extracts visual features from each image. These features are combined
with the vectors produced by a transformer encoder that serves as the text encoder.
The combined input vector is then fed into a transformer decoder, which uses a
multi-head attention mechanism to generate a text description that matches the
sentiment present in the image.
During the inference stage, the images undergo preprocessing and embedding to
produce vector representations. Unlike in the training stage, the text vectors at
inference originate from the start token. The decoder output is then fed back as
model input to iteratively predict the next word until the entire caption is formed.
Evaluation uses the BLEU and ROUGE metrics and also considers how accurately
the caption depicts the sentiment in the image.
Experimental results show that the Inception-Transformer model outperforms the
other models, with the highest BLEU score of 0.366 and ROUGE score of 0.244 for
positive sentiment, and a BLEU score of 0.323 and ROUGE score of 0.229 for
negative sentiment. This research has the potential to be applied in fields that
require an understanding of sentiment in images, such as product reviews on
e-commerce platforms. Further development can focus on improving accuracy,
the diversity of text descriptions, and more complex sentiment modeling in the
Indonesian language. |
---|
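
The inference procedure described in the summary (start from the start token, then feed the decoder output back in until the caption is complete) can be illustrated with a minimal sketch. This is not the thesis implementation. The names `greedy_decode`, `next_token_scores`, and `toy_scores` are hypothetical, the token strings `<start>` and `<end>` are assumed, and picking the highest-scoring word at each step (greedy decoding) is an assumption, since the record does not state which decoding strategy was used. The toy scorer stands in for the trained decoder so the loop can run without any model.

```python
from typing import Callable, Dict, List, Sequence

# Assumed special tokens; the record does not name the ones actually used.
START, END = "<start>", "<end>"


def greedy_decode(
    next_token_scores: Callable[[Sequence[float], List[str]], Dict[str, float]],
    image_features: Sequence[float],
    max_len: int = 20,
) -> List[str]:
    """Build a caption word by word, beginning from the start token."""
    tokens = [START]
    for _ in range(max_len):
        # The words generated so far are fed back in as the text input.
        scores = next_token_scores(image_features, tokens)
        best = max(scores, key=scores.get)  # greedy choice (an assumption)
        if best == END:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start token


def toy_scores(image_features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a trained decoder: replays a fixed caption."""
    fixed = ["anjing", "lucu", "bermain", "di", "taman", END]
    position = len(tokens) - 1  # number of words generated so far
    target = fixed[min(position, len(fixed) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in set(fixed)}


caption = greedy_decode(toy_scores, image_features=[0.1, 0.2, 0.3])
print(" ".join(caption))
```

In a real system, `next_token_scores` would wrap the image encoder, text encoder, and transformer decoder, and `max_len` would cap the caption length in case the end token is never predicted.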