Product image matching based on natural language processing
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis-Master by Coursework |
Language: | English |
Published: | Nanyang Technological University, 2022 |
Subjects: | |
Online Access: | https://hdl.handle.net/10356/155842 |
Institution: | Nanyang Technological University |
Summary: | With a rapidly growing number of retailers selling similar, competing products on online platforms, product matching has become an important topic in e-commerce. The task can be formulated as a classic machine learning problem in a retrieval, clustering, or binary classification setting. With the rapid development of the computer vision community in recent years, a great deal of work has been done on related topics such as image retrieval, image clustering, and image classification. However, image-based solutions can face severe problems in an e-commerce environment, because images posted on online platforms usually lack key information about attributes that cannot be inferred from appearance. In addition, some fine-grained features of fashion products are extremely difficult to extract from images. These attributes, on the other hand, are usually included in product titles. As a result, developing a Natural Language Processing (NLP) algorithm that uses text information to solve product matching problems has become a practical direction. Recently, large pre-trained language models such as BERT have demonstrated powerful capabilities on a variety of NLP tasks, but because their training objectives are not directly related to e-commerce, using them directly for this task may not lead to promising results. In view of these problems, this project aims to find an appropriate way of adapting BERT-like models to the e-commerce domain to solve product matching problems. Specifically, three fine-tuning schemes for the chosen pre-trained model are explored, and a two-stage text-based product matching pipeline is proposed. Furthermore, a novel loss function is proposed to assist the fine-tuning process.
Extensive experiments on a public dataset verify the effectiveness of the proposed pipeline and show that the new loss function learns better text representations for this task than the other conventional methods examined. |