Product image matching based on natural language processing

With a rapidly growing number of retailers selling similar, competing products on online platforms, product matching has become an important topic in e-commerce. The task can be formulated as a classic machine learning problem in a retrieval, clustering, or binary classification setting...

Bibliographic Details
Main Author: Wu, Tianxing
Other Authors: Tan Yap Peng
Format: Thesis-Master by Coursework
Language: English
Published: Nanyang Technological University 2022
Subjects: Engineering::Computer science and engineering
Online Access: https://hdl.handle.net/10356/155842
Institution: Nanyang Technological University
id sg-ntu-dr.10356-155842
record_format dspace
spelling sg-ntu-dr.10356-155842
2023-07-04T17:43:27Z
Product image matching based on natural language processing
Wu, Tianxing
Tan Yap Peng
School of Electrical and Electronic Engineering
Lazada
EYPTan@ntu.edu.sg
Engineering::Computer science and engineering
Master of Science (Computer Control and Automation)
2022-03-23T23:36:35Z
2022-03-23T23:36:35Z
2022
Thesis-Master by Coursework
Wu, T. (2022). Product image matching based on natural language processing. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/155842
https://hdl.handle.net/10356/155842
en
ISM-DISS-02502
application/pdf
Nanyang Technological University
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Engineering::Computer science and engineering
spellingShingle Engineering::Computer science and engineering
Wu, Tianxing
Product image matching based on natural language processing
description With a rapidly growing number of retailers selling similar, competing products on online platforms, product matching has become an important topic in e-commerce. The task can be formulated as a classic machine learning problem in a retrieval, clustering, or binary classification setting. With the rapid progress of the computer vision community in recent years, a large body of work has addressed related topics such as image retrieval, image clustering, and image classification. However, image-based solutions face serious problems in the e-commerce environment, because images posted on online platforms usually lack key information about attributes that cannot be inferred from appearance. In addition, some fine-grained features of fashion products are extremely difficult to extract from images. These attributes, by contrast, are usually included in product titles. As a result, developing a Natural Language Processing (NLP) algorithm that uses text information to solve product matching problems has become a practical direction. Recently, large pre-trained language models such as BERT have demonstrated powerful capabilities on a variety of NLP tasks, but because their training objectives are not directly related to e-commerce, using them directly for this task may not give good results. To address these problems, this project aims to find an appropriate way to adapt BERT-like models to the e-commerce domain for product matching. Specifically, three fine-tuning schemes for the chosen pre-trained model are explored, and a two-stage text-based product matching pipeline is proposed. Furthermore, a novel loss function is proposed to assist the fine-tuning process. Extensive experiments on a public dataset verify the effectiveness of the proposed pipeline and show that the new loss function learns better text representations for this task than the conventional methods examined.
author2 Tan Yap Peng
author_facet Tan Yap Peng
Wu, Tianxing
format Thesis-Master by Coursework
author Wu, Tianxing
author_sort Wu, Tianxing
title Product image matching based on natural language processing
title_sort product image matching based on natural language processing
publisher Nanyang Technological University
publishDate 2022
url https://hdl.handle.net/10356/155842
_version_ 1772828058361790464