Product image matching based on natural language processing

Bibliographic Details
Main Author:	Wu, Tianxing
Other Authors:	Tan Yap Peng
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering
Online Access:	https://hdl.handle.net/10356/155842
Institution:	Nanyang Technological University

Description
Summary:	With a rapidly growing number of retailers selling similar, competing products on online platforms, product matching has become an important topic in e-commerce. The task can be framed as a classic machine learning problem in a retrieval, clustering, or binary classification setting. With the rapid development of the computer vision community in recent years, much work has been done on related topics such as image retrieval, image clustering, and image classification. However, image-based solutions can face severe problems in an e-commerce environment, because images posted on online platforms usually lack key information about attributes that cannot be inferred from appearance. In addition, some fine-grained features of fashion products are extremely difficult to extract from images. These attributes are, however, usually included in product titles. As a result, developing an algorithm based on Natural Language Processing (NLP) that uses text information to solve product matching problems has become a practical direction. Recently, large pre-trained language models such as BERT have demonstrated powerful capabilities on a variety of NLP tasks, but because their training objectives are not directly related to e-commerce, using them directly for this task may not give good results. In view of these problems, this project aims to find an appropriate way of adapting BERT-like models to the e-commerce domain to solve product matching problems. Specifically, three fine-tuning schemes for the chosen pre-trained model are explored, and a two-stage text-based product matching pipeline is proposed. Furthermore, a novel loss function is proposed to assist the fine-tuning process.
Through extensive experiments on a public dataset, the effectiveness of the proposed pipeline is verified, and the new loss function is shown to learn better text representations for this task than the other conventional methods examined.
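
As a rough illustration of how product matching on titles can be framed as retrieval or as binary classification, the sketch below ranks catalog titles against a query title and then makes a yes/no decision per pair. It is not the pipeline or loss function from the thesis: the bag-of-words `embed` function is a toy stand-in for a fine-tuned BERT-like encoder, and the names `embed`, `cosine`, `retrieve`, `is_match` and the 0.6 threshold are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter


def embed(title):
    """Toy stand-in for a learned text encoder: lowercase token counts.

    Assumption: a real system would use embeddings from a fine-tuned
    language model here instead of a bag of words.
    """
    return Counter(re.findall(r"[a-z0-9]+", title.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, catalog, top_k=2):
    """Retrieval setting: rank catalog titles by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(title)), title) for title in catalog]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def is_match(query, candidate, threshold=0.6):
    """Binary classification setting: accept a pair above a fixed threshold.

    The threshold value is illustrative, not taken from the thesis.
    """
    return cosine(embed(query), embed(candidate)) >= threshold
```

Calling `retrieve(query_title, catalog_titles)` returns the highest-scoring `(score, title)` pairs, and `is_match` can then be applied to each retrieved pair. Swapping `embed` for a learned encoder leaves the rest of the sketch unchanged, which is why the encoder is isolated in its own function.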
