INDONESIAN IMAGE CAPTIONING USING SEMANTIC COMPOSITIONAL NETWORKS
| Field | Value |
|---|---|
| Main Author | |
| Format | Final Project |
| Language | Indonesian |
| Online Access | https://digilib.itb.ac.id/gdl/view/40093 |
| Institution | Institut Teknologi Bandung |
Summary:

The automatic generation of image descriptions is a popular challenge that is growing rapidly at the intersection of computer vision and natural language processing. The absence of prior research on Indonesian image captioning motivated this study to implement it. The Indonesian image captioning model is built using the Semantic Compositional Network (SCN) topology. In this final project, the SCN model is modified by adding an attention network to improve the quality of the generated descriptions.
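As background for the architecture named above, the following is a minimal PyTorch sketch of one decoding step that combines SCN-style weight factorization (Gan et al., 2017) with soft attention over spatial image features. All module names and dimensions are illustrative assumptions; this record does not specify the thesis's exact design.

```python
# A minimal sketch of an SCN decoding step with soft attention.
# Dimensions, names, and the attention placement are illustrative
# assumptions, not the thesis's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNAttentionCell(nn.Module):
    """One decoding step: SCN-factorized input transform + soft attention.

    SCN idea (Gan et al., 2017): the input-to-hidden weights are composed
    from the semantic-tag probability vector s as  W(s) x = Wa(Wb s * Wc x),
    so the recurrent transform depends on the detected image concepts.
    """
    def __init__(self, embed_dim, hidden_dim, tag_dim, feat_dim, factor_dim):
        super().__init__()
        # Factorized input-to-hidden transform, conditioned on tags.
        self.Wa = nn.Linear(factor_dim, 4 * hidden_dim, bias=False)
        self.Wb = nn.Linear(tag_dim, factor_dim, bias=False)
        self.Wc = nn.Linear(embed_dim, factor_dim, bias=False)
        # Plain hidden-to-hidden transform (could also be factorized).
        self.U = nn.Linear(hidden_dim, 4 * hidden_dim)
        # Soft attention over spatial image features (Xu et al., 2015).
        self.att_feat = nn.Linear(feat_dim, hidden_dim)
        self.att_h = nn.Linear(hidden_dim, hidden_dim)
        self.att_v = nn.Linear(hidden_dim, 1)
        self.ctx = nn.Linear(feat_dim, 4 * hidden_dim, bias=False)

    def forward(self, x, s, feats, h, c):
        # x: (B, embed_dim) word embedding; s: (B, tag_dim) tag probs
        # feats: (B, L, feat_dim) spatial image features; h, c: LSTM states
        e = self.att_v(torch.tanh(self.att_feat(feats)
                                  + self.att_h(h).unsqueeze(1)))  # (B, L, 1)
        alpha = F.softmax(e, dim=1)                # attention weights over L
        context = (alpha * feats).sum(dim=1)       # (B, feat_dim)
        gates = (self.Wa(self.Wb(s) * self.Wc(x))  # tag-conditioned transform
                 + self.U(h) + self.ctx(context))
        i, f, o, g = gates.chunk(4, dim=-1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c
```

The factorization `Wa(Wb(s) * Wc(x))` is what makes the step "semantic compositional": the effective input transform changes with the tag probabilities `s`, so detected concepts steer generation, while the attention term lets each step focus on different image regions.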
Image descriptions are generated in two languages: English and Indonesian. The English experiments are intended to identify the better-performing model between the pure SCN and the attention-based SCN. The English image captioning models are trained on the COCO and Flickr30k datasets and then evaluated with the BLEU, CIDEr-D, METEOR, and ROUGE metrics. In these experiments, the attention-based SCN outperforms the pure SCN, and it was therefore chosen to generate the Indonesian image descriptions.
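To make the reported scores concrete, here is a hedged sketch of a corpus-level BLEU-4 computation with NLTK; the record does not state which implementation the thesis used, and the example captions are invented for illustration.

```python
# Corpus-level BLEU-4 with NLTK (one possible implementation; the thesis
# does not specify which BLEU tooling it used).
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Each hypothesis is scored against all reference captions for its image;
# tokens here are plain whitespace splits. Captions are invented examples.
references = [
    [
        "seorang pria mengendarai sepeda di jalan".split(),
        "pria bersepeda di jalan raya".split(),
    ],
]
hypotheses = [
    "seorang pria mengendarai sepeda di jalan raya".split(),
]

# BLEU-4 uses uniform weights over 1- to 4-gram precisions (NLTK's default).
smooth = SmoothingFunction().method1  # avoids zero scores on short corpora
bleu4 = corpus_bleu(
    references, hypotheses,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=smooth,
)
print(f"BLEU-4: {bleu4:.4f}")
```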
The Indonesian image captioning dataset is constructed by translating the COCO and Flickr8k datasets with Google Translate; in addition, 3,000 of the translated image descriptions were corrected manually. The performance of the Indonesian model is evaluated with the BLEU and ROUGE metrics: BLEU-4 reaches 0.2403 on the COCO dataset and 0.2276 on Flickr8k, while ROUGE-L reaches 0.4689 on COCO and 0.5361 on Flickr8k.
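The dataset-construction step can be pictured as below: a minimal sketch that machine-translates COCO-style caption annotations. `translate_to_indonesian` is a hypothetical placeholder, since the record only says Google Translate was the translation service, not which client or API the thesis used; the manual correction of 3,000 descriptions would be a separate pass over the output.

```python
# Sketch: translate COCO-style caption annotations into Indonesian.
# `translate_to_indonesian` is a hypothetical stub, not a real API call.
import json

def translate_to_indonesian(text: str) -> str:
    """Hypothetical wrapper around a Google Translate client (en -> id)."""
    raise NotImplementedError("plug in an actual translation client here")

def translate_coco_captions(in_path: str, out_path: str) -> None:
    # COCO caption files store one caption string per annotation entry.
    with open(in_path, encoding="utf-8") as f:
        data = json.load(f)
    for ann in data["annotations"]:
        ann["caption"] = translate_to_indonesian(ann["caption"])
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

# Example (hypothetical file names):
# translate_coco_captions("captions_train2014.json",
#                         "captions_train2014_id.json")
```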