SEMANTIC COMPOSITIONAL NETWORK WITH TOP-DOWN ATTENTION FOR PRODUCT TITLE GENERATION FROM IMAGE
Main Author:
Format: Final Project
Language: Indonesian
Online Access: https://digilib.itb.ac.id/gdl/view/50032
Institution: Institut Teknologi Bandung
Summary: E-commerce is currently one of the most widely used forms of transaction, and there are many e-commerce applications today. One of its several types is customer-to-customer (C2C) e-commerce, in which the seller can be anyone, so product data is entered manually. This manual process can lead to problems such as inconsistencies and typos in product titles. A system that generates product titles from images would therefore be very useful for sellers in C2C e-commerce.
In this work, an image captioning method for generating a product title from its image is proposed by combining two recent works: Semantic Compositional Networks by Gan et al. (2017) and Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering by Anderson et al. (2018). In practice, this work uses a Semantic Compositional Network combined with the top-down attention mechanism from Anderson et al.'s work.
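To make the combined architecture concrete, the sketch below shows a single decoder step that composes the word embedding with a semantic tag vector in the style of Gan et al. (2017) and attends over image region features with the top-down mechanism of Anderson et al. (2018). It is a minimal PyTorch illustration under assumed names and dimensions, not the thesis's actual code; the full SCN, for instance, factorizes all of the LSTM weight matrices rather than only the input projection shown here.

```python
# Minimal sketch of the combined decoder step; all names and sizes are
# illustrative assumptions, not the thesis implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCNTopDownDecoderStep(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512,
                 feat_dim=2048, num_tags=1000, factor_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # SCN-style factorization: the input projection is composed from
        # the semantic tag vector s, i.e. W(s) = W_a(W_b(s) * W_c(x)).
        self.W_a = nn.Linear(factor_dim, hidden_dim, bias=False)
        self.W_b = nn.Linear(num_tags, factor_dim, bias=False)
        self.W_c = nn.Linear(embed_dim, factor_dim, bias=False)
        # Top-down attention over image region features.
        self.att_v = nn.Linear(feat_dim, hidden_dim)
        self.att_h = nn.Linear(hidden_dim, hidden_dim)
        self.att_w = nn.Linear(hidden_dim, 1)
        self.lstm = nn.LSTMCell(hidden_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, tag_probs, regions, state):
        # word_ids: (B,), tag_probs: (B, num_tags), regions: (B, R, feat_dim)
        x = self.embed(word_ids)                           # (B, embed_dim)
        # Semantic composition: modulate the word embedding by the tags.
        x = self.W_a(self.W_b(tag_probs) * self.W_c(x))    # (B, hidden_dim)
        h, c = state
        # Top-down attention: score each region against the hidden state.
        scores = self.att_w(torch.tanh(
            self.att_v(regions) + self.att_h(h).unsqueeze(1)))  # (B, R, 1)
        alpha = F.softmax(scores, dim=1)
        context = (alpha * regions).sum(dim=1)             # (B, feat_dim)
        h, c = self.lstm(torch.cat([x, context], dim=1), (h, c))
        return self.out(h), (h, c)                         # next-word logits
```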
With this approach, the system can generate reasonably good product titles from images. The combined architecture achieves ROUGE-L, BLEU-1, BLEU-2, BLEU-3, and BLEU-4 scores of 0.8313, 0.7911, 0.6784, 0.5240, and 0.4179, respectively. On the same dataset it outperforms Gan et al.'s model on all metrics and Anderson et al.'s model (without bottom-up attention) on all metrics except BLEU-2: the former scores 0.8183, 0.7463, 0.6445, 0.4859, and 0.3812, and the latter 0.7922, 0.7867, 0.6816, 0.5159, and 0.3989.
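For reference, scores like those above can be computed roughly as in the sketch below: the BLEU-1 through BLEU-4 values come from NLTK's sentence_bleu, while rouge_l is a plain longest-common-subsequence implementation written here for illustration. The two example titles are made up, not taken from the dataset.

```python
# Sketch of the evaluation metrics for one generated title.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def rouge_l(reference, candidate):
    """ROUGE-L F1 from the longest common subsequence of two token lists."""
    m, n = len(reference), len(candidate)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            lcs[i + 1][j + 1] = (lcs[i][j] + 1 if reference[i] == candidate[j]
                                 else max(lcs[i][j + 1], lcs[i + 1][j]))
    if lcs[m][n] == 0:
        return 0.0
    precision, recall = lcs[m][n] / n, lcs[m][n] / m
    return 2 * precision * recall / (precision + recall)

reference = "red cotton t-shirt for men size m".split()   # ground-truth title
candidate = "red t-shirt for men size m".split()          # generated title
smooth = SmoothingFunction().method1
for k in range(1, 5):  # BLEU-1 .. BLEU-4 with uniform n-gram weights
    weights = tuple(1.0 / k for _ in range(k))
    print(f"BLEU-{k}:", sentence_bleu([reference], candidate,
                                      weights=weights,
                                      smoothing_function=smooth))
print("ROUGE-L:", rouge_l(reference, candidate))
```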