IMAGE CAPTIONING WITH SENTIMENT FOR INDONESIAN



Bibliographic Details
Main Author:	Khumaeni
Format:	Theses
Language:	Indonesian
Online Access:	https://digilib.itb.ac.id/gdl/view/86168
Institution:	Institut Teknologi Bandung

Description
Summary:	Image captioning lies at the intersection of Natural Language Processing (NLP) and Computer Vision and aims to generate accurate natural language descriptions of images. More complex descriptions can help users identify images and understand their context. However, most research in this field has yet to consider sentiment, which is crucial for understanding the context and value of an image. This study develops an image captioning system with sentiment for Indonesian, using a dataset that has been translated and enriched with sentiment information. The proposed architecture uses a pretrained image encoder to extract visual features from images. These features are combined with the vectors produced by a transformer encoder acting as the text encoder, and the combined vector is fed into a transformer decoder, which uses multi-head attention to generate descriptions that match the sentiment present in the image. At inference time, images are preprocessed and embedded into vector representations as in training, but the text vectors start from the start token alone. The decoder output is then fed back as model input to predict the next word iteratively until the entire caption is formed. Evaluation uses the BLEU and ROUGE metrics and also considers how accurately the caption depicts the sentiment in the image. Experimental results show that the Inception-Transformer model outperforms the other models, with the highest BLEU score of 0.366 and ROUGE score of 0.244 for positive sentiment, and a BLEU score of 0.323 and ROUGE score of 0.229 for negative sentiment. The approach could be applied in fields that require sentiment understanding in the context of images, such as product reviews on e-commerce platforms. Further development can focus on improving accuracy, the diversity of generated descriptions, and more complex sentiment modeling in Indonesian.
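
The inference stage described in the summary (start from the start token, feed the decoder output back in, repeat until the caption is complete) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: the names `greedy_caption`, `next_token_scores`, and `toy_scores`, the `<start>`/`<end>` token strings, and the choice of greedy (highest-score) selection are all assumptions, and the toy scoring function stands in for the real image encoder and transformer decoder.

```python
from typing import Callable, Dict, List, Sequence

START, END = "<start>", "<end>"  # assumed special tokens, not taken from the thesis


def greedy_caption(
    image_features: Sequence[float],
    next_token_scores: Callable[[Sequence[float], List[str]], Dict[str, float]],
    max_len: int = 20,
) -> List[str]:
    """Build a caption one word at a time, starting from the start token."""
    tokens = [START]
    for _ in range(max_len):
        # The (hypothetical) model scores every candidate next word given
        # the image features and the words generated so far.
        scores = next_token_scores(image_features, tokens)
        best = max(scores, key=scores.get)
        if best == END:
            break
        tokens.append(best)  # the prediction becomes part of the next input
    return tokens[1:]  # drop the start token


def toy_scores(image_features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Stand-in for a trained decoder: always emits a fixed Indonesian caption."""
    canned = ["anjing", "lucu", "bermain", END]
    step = len(tokens) - 1
    target = canned[step] if step < len(canned) else END
    return {word: (1.0 if word == target else 0.0) for word in canned}


caption = greedy_caption([0.1, 0.2], toy_scores)
print(" ".join(caption))
```

In a real system `next_token_scores` would wrap the pretrained image encoder and the transformer decoder, and a beam search could replace the greedy choice; the summary does not say which decoding strategy the thesis uses.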
