INDONESIAN IMAGE CAPTIONING USING SEMANTIC COMPOSITIONAL NETWORKS

Bibliographic Details
Main Author: Andrew Obaja Sinurat, Ray
Format: Final Project
Language: Indonesian
Online Access:https://digilib.itb.ac.id/gdl/view/40093
Institution: Institut Teknologi Bandung
Description
Summary: Automatic generation of image descriptions is a rapidly growing challenge at the intersection of computer vision and natural language processing. The absence of prior research on Indonesian image captioning motivates this study to implement image captioning for the Indonesian language. The captioning model is built on the Semantic Compositional Network (SCN) topology. In this final project, the SCN model is extended with an attention network to improve the quality of the generated image descriptions. Image descriptions are generated in two languages, English and Indonesian. The English experiments are intended to determine the better-performing model between the pure SCN and the attention-based SCN. The English captioning models are trained on the COCO and Flickr30k datasets and evaluated with the BLEU, CIDEr-D, METEOR, and ROUGE metrics. Based on these experiments, the attention-based SCN outperforms the pure SCN and is therefore chosen to generate image descriptions in Indonesian. The Indonesian image captioning dataset is constructed by translating the COCO and Flickr8k datasets with Google Translate, with 3000 image descriptions corrected manually. The Indonesian models are evaluated with the BLEU and ROUGE metrics, yielding BLEU-4 scores of 0.2403 on COCO and 0.2276 on Flickr8k, and ROUGE-L scores of 0.4689 on COCO and 0.5361 on Flickr8k.
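
The record does not include the evaluation code itself, but the two metrics quoted for the Indonesian models (BLEU-4 and ROUGE-L) can be illustrated with a short Python sketch. The snippet below is an assumption-laden illustration, not the author's implementation: it assumes plain whitespace tokenization, NLTK's corpus_bleu with method1 smoothing, and a hand-rolled LCS-based ROUGE-L F-measure, and the two Indonesian captions are invented toy examples rather than items from the translated COCO or Flickr8k data.

# Minimal sketch (not the thesis's actual evaluation code): scoring generated
# Indonesian captions with corpus-level BLEU-4 and averaged ROUGE-L, the two
# metrics reported above. Tokenization is a plain whitespace split; the thesis
# does not specify its preprocessing.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction


def rouge_l_f1(reference, hypothesis, beta=1.2):
    """ROUGE-L F-measure based on the longest common subsequence (LCS)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for LCS length.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i, r in enumerate(ref, 1):
        for j, h in enumerate(hyp, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if r == h else max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(ref)][len(hyp)]
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(ref), lcs / len(hyp)
    return ((1 + beta ** 2) * precision * recall) / (recall + beta ** 2 * precision)


# Toy data: each image has one or more reference captions and one generated caption.
references = [["seorang pria sedang mengendarai sepeda di jalan"],
              ["dua anjing bermain di atas rumput hijau"]]
hypotheses = ["seorang pria mengendarai sepeda di jalan",
              "dua anjing bermain di rumput"]

refs_tok = [[r.split() for r in refs] for refs in references]
hyps_tok = [h.split() for h in hypotheses]

# Corpus-level BLEU-4 (uniform weights over 1- to 4-grams).
bleu4 = corpus_bleu(refs_tok, hyps_tok,
                    weights=(0.25, 0.25, 0.25, 0.25),
                    smoothing_function=SmoothingFunction().method1)
# ROUGE-L computed per caption against the first reference, then averaged.
rouge_l = sum(rouge_l_f1(refs[0], hyp) for refs, hyp in zip(references, hypotheses)) / len(hypotheses)

print(f"BLEU-4:  {bleu4:.4f}")
print(f"ROUGE-L: {rouge_l:.4f}")

Corpus-level BLEU-4 aggregates n-gram matches over the whole test set, while ROUGE-L is computed per caption and averaged here; because actual scores depend on tokenization, smoothing, and how multiple references are handled, this sketch is not expected to reproduce the exact figures reported above.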