SEMANTIC COMPOSITIONAL NETWORK WITH TOP-DOWN ATTENTION FOR PRODUCT TITLE GENERATION FROM IMAGE

Bibliographic Details
Main Author: Wijaya, Nicholas
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/50032
Institution: Institut Teknologi Bandung
Description
Summary: E-commerce is currently one of the most widely used forms of transaction, and many e-commerce applications exist today. One type of e-commerce is customer-to-customer (C2C), in which the seller can be anyone, so product data is entered manually. This manual process can lead to problems such as inconsistencies and typos in product titles. A system that generates product titles from product images would therefore be very useful for sellers in C2C e-commerce. In this work, an image captioning approach to generating product titles from images is proposed by combining two recent works: Semantic Compositional Networks by Gan et al. (2017) and Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering by Anderson et al. (2018). Specifically, this work uses a Semantic Compositional Network combined with the top-down attention mechanism from Anderson et al.'s work. With this approach, the system generates product titles of good quality from images. The combined architecture achieves ROUGE-L, BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.8313, 0.7911, 0.6784, 0.5240, and 0.4179, respectively. It outperforms the two reference works on the same dataset, which score 0.8183, 0.7463, 0.6445, 0.4859, and 0.3812 for Gan et al.'s method and 0.7922, 0.7867, 0.6816, 0.5159, and 0.3989 for Anderson et al.'s method without bottom-up attention.
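A minimal sketch of how the two components might fit together in a single decoding step is given below (PyTorch). This is an illustration under assumptions, not the author's released code: the SCN-style weight factorization W(s) = Wa · diag(Wb·s) · Wc follows Gan et al. (2017), the soft top-down attention follows Anderson et al. (2018), and all class names, dimensions, and the simplified single-gate recurrent cell (used here in place of a full LSTM for brevity) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SCNCell(nn.Module):
    """Tag-conditioned recurrent cell (simplified to one gate for brevity).

    SCN factorizes the input weight matrix as W(s) = W_a . diag(W_b s) . W_c,
    so the effective weights depend on the vector s of detected semantic
    concept (tag) probabilities.
    """
    def __init__(self, input_dim, tag_dim, hidden_dim, factor_dim=256):
        super().__init__()
        self.Wa = nn.Linear(factor_dim, hidden_dim, bias=False)
        self.Wb = nn.Linear(tag_dim, factor_dim, bias=False)
        self.Wc = nn.Linear(input_dim, factor_dim, bias=False)
        self.Uh = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x, h, tags):
        # Tag-dependent input transform: W_a ((W_b s) * (W_c x))
        modulated = self.Wa(self.Wb(tags) * self.Wc(x))
        return torch.tanh(modulated + self.Uh(h))


class TopDownAttention(nn.Module):
    """Soft attention over spatial CNN features, driven by the decoder state."""
    def __init__(self, feat_dim, hidden_dim, attn_dim=512):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.state_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, h):
        # feats: (batch, regions, feat_dim); h: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.state_proj(h).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)         # attention weights over regions
        return (alpha * feats).sum(dim=1)   # attended visual context vector


class TitleDecoderStep(nn.Module):
    """One decoding step: attend to the image, then update the SCN cell."""
    def __init__(self, vocab_size, emb_dim=300, feat_dim=2048,
                 tag_dim=1000, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attend = TopDownAttention(feat_dim, hidden_dim)
        self.cell = SCNCell(emb_dim + feat_dim, tag_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, h, feats, tags):
        ctx = self.attend(feats, h)                        # top-down attention
        x = torch.cat([self.embed(word_ids), ctx], dim=-1)
        h = self.cell(x, h, tags)                          # SCN-modulated update
        return self.out(h), h                              # next-word logits
```

In a system like the one described, feats would come from the spatial grid of a CNN over the product image (since the bottom-up region detector is omitted) and tags from a multi-label concept classifier; a greedy or beam-search loop would call this step once per generated word.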