Deepshoe : an improved Multi-Task View-invariant CNN for street-to-shop shoe retrieval
The difficulty of describing in text a shoe item seen on the street calls for an image-based retrieval solution for online shopping. We call this problem street-to-shop shoe retrieval; its goal is to find exactly the same shoe in an online shop image (shop scenario), given a daily shoe image (stree...
Main Authors: | Zhan, Huijing; Shi, Boxin; Duan, Ling-Yu; Kot, Alex Chichung |
---|---|
Other Authors: | School of Electrical and Electronic Engineering |
Format: | Article |
Language: | English |
Published: | 2021 |
Subjects: | Engineering::Electrical and electronic engineering; Multi-task; Shoe Retrieval |
Online Access: | https://hdl.handle.net/10356/150181 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-150181 |
---|---|
record_format |
dspace |
spelling |
sg-ntu-dr.10356-150181; 2021-06-04T04:22:22Z; National Research Foundation (NRF); This research was carried out at the Rapid-Rich Object Search (ROSE) Lab at the Nanyang Technological University, Singapore, supported by the National Research Foundation, Prime Minister's Office, Singapore, under NRF-NSFC grant NRF2016NRF-NSFC001-098. This project was also supported in part by the National Natural Science Foundation of China under Grant 61661146005 as well as National Science Foundation of China under Grant No. 61872012 and No. 61876007; 2021-06-04T04:22:21Z; 2021-06-04T04:22:21Z; 2019; Journal Article; Zhan, H., Shi, B., Duan, L. & Kot, A. C. (2019). Deepshoe : an improved Multi-Task View-invariant CNN for street-to-shop shoe retrieval. Computer Vision and Image Understanding, 180, 23-33. https://dx.doi.org/10.1016/j.cviu.2019.01.001; 1077-3142; https://hdl.handle.net/10356/150181; 10.1016/j.cviu.2019.01.001; 2-s2.0-85060327925; 180; 23; 33; en; NRF2016NRF-NSFC001-098; Computer Vision and Image Understanding; © 2019 Elsevier Inc. All rights reserved. |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Engineering::Electrical and electronic engineering; Multi-task; Shoe Retrieval |
description |
The difficulty of describing in text a shoe item seen on the street calls for an image-based retrieval solution for online shopping. We call this problem street-to-shop shoe retrieval; its goal is to find exactly the same shoe in an online shop image (shop scenario), given a daily shoe image (street scenario) as the query. We propose an improved Multi-Task View-invariant Convolutional Neural Network (MTV-CNN+) to handle the large visual discrepancy between images of the same shoe in different scenarios. Shoe style is newly defined as a combination of part-aware semantic shoe attributes, and a corresponding style identification loss is developed. Furthermore, a new loss function is proposed to minimize the distances between images of the same shoe captured from different viewpoints. To train MTV-CNN+ efficiently, we apply an attribute-based weighting scheme to the conventional triplet loss function to put more emphasis on hard triplets; a three-stage process is incorporated to progressively select the hard negative examples and anchor images. To validate the proposed method, we build a multi-view shoe dataset with semantic attributes (MVShoe) from daily life and online shopping websites, and investigate how different triplet loss functions affect performance. Experimental results show the advantage of MTV-CNN+ over existing approaches. |
author2 |
School of Electrical and Electronic Engineering |
author_facet |
School of Electrical and Electronic Engineering; Zhan, Huijing; Shi, Boxin; Duan, Ling-Yu; Kot, Alex Chichung |
format |
Article |
author |
Zhan, Huijing; Shi, Boxin; Duan, Ling-Yu; Kot, Alex Chichung |
author_sort |
Zhan, Huijing |
title |
Deepshoe : an improved Multi-Task View-invariant CNN for street-to-shop shoe retrieval |
publishDate |
2021 |
url |
https://hdl.handle.net/10356/150181 |
_version_ |
1702431209418653696 |
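
The description above mentions an attribute-based weighting scheme on the conventional triplet loss that emphasizes hard triplets. The record does not give the formula, so the following is only a minimal sketch under stated assumptions: the function names, the margin value, and the weight (1 plus the fraction of binary attributes the anchor shares with the negative) are hypothetical illustrations, not the authors' definition.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def weighted_triplet_loss(anchor, positive, negative,
                          attrs_anchor, attrs_negative, margin=0.2):
    """Hinge triplet loss scaled by an attribute-overlap weight.

    ASSUMPTION: the weight below is a stand-in for the paper's scheme.
    A negative that shares more attributes with the anchor is treated
    as harder, so its triplet receives a larger weight.
    """
    shared = sum(1 for a, b in zip(attrs_anchor, attrs_negative) if a == b)
    weight = 1.0 + shared / len(attrs_anchor)  # ranges from 1.0 to 2.0
    # Conventional triplet hinge: pull the positive closer than the negative.
    hinge = max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
    return weight * hinge


# Street-image anchor, matching shop image, and a look-alike negative.
loss = weighted_triplet_loss(
    anchor=[0.0, 0.0], positive=[1.0, 0.0], negative=[0.0, 1.0],
    attrs_anchor=[1, 0, 1], attrs_negative=[1, 0, 1],
)
print(loss)
```

In this sketch a negative with identical attributes doubles the penalty of a violated triplet, while a triplet that already satisfies the margin contributes zero regardless of its weight.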