On handbag recognition and recommendation

Bibliographic Details
Main Author: Wang, Yan
Other Authors: Kot Chichung, Alex
Format: Theses and Dissertations
Language: English
Published: 2016
Online Access: https://hdl.handle.net/10356/69405
Institution: Nanyang Technological University
Description
Summary: From Google to Pinterest, multimedia search engines such as Google Goggles deliver a wealth of visual information related to the search query. Goggles recognizes what a mobile phone camera is pointed at, such as a business card, a book, a painting, a famous landmark, or a barcode, and provides useful information about it. Vision-based techniques try to perceive and understand images by drawing on the abilities of human vision. Developing such techniques remains an open challenge.
Multimedia systems for online advertising and commerce are in high market demand. In recent years, the computer vision and multimedia communities have devoted considerable effort to applications such as fashion retrieval and recommendation for clothing, shoes, and similar items. The handbag has become a desirable fashion accessory, with six in ten consumers having purchased at least one new handbag in 2014. This demand motivates vision products for handbag recognition, yet such products remain limited. As Google says, Goggles does not yet work well on things like food, plants, animals, and some fashion items such as handbags. To develop reliable recognition engines of this kind, we study handbag recognition and recommendation, which are key steps in building a multimedia search system. The work in this thesis is summarized below.
First, we propose a style-to-color discriminative representation framework for handbag recognition. Motivated by the visual characteristics of handbags, we identify the handbag model by performing style-based recognition followed by color-based recognition. Experiments are conducted on our newly constructed handbag datasets. The results show that our method improves handbag recognition accuracy by over 10% compared with existing fine-grained or generic object recognition methods.
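The sequential style-then-color idea can be illustrated with a toy sketch. This is not the thesis's representation framework, which the abstract does not specify; it only shows the two-stage order of decisions. The catalog, the two-number shape descriptors, the mean RGB colors, and the function name `recognize` are all hypothetical, and nearest-neighbor matching is an assumption chosen for brevity.

```python
from math import dist

# Hypothetical catalog: every entry and every number below is made up for
# illustration. "shape" stands in for a style descriptor, "color" for a mean RGB.
CATALOG = {
    "tote-red":    {"style": "tote",   "shape": (0.9, 0.2), "color": (200, 30, 40)},
    "tote-black":  {"style": "tote",   "shape": (0.9, 0.2), "color": (20, 20, 20)},
    "clutch-red":  {"style": "clutch", "shape": (0.3, 0.8), "color": (190, 35, 45)},
    "clutch-gold": {"style": "clutch", "shape": (0.3, 0.8), "color": (210, 170, 60)},
}

def recognize(shape, color):
    """Return (style, model) by deciding style first, then color within that style."""
    # Stage 1: choose the style whose shape descriptor is nearest to the query.
    styles = {entry["style"]: entry["shape"] for entry in CATALOG.values()}
    style = min(styles, key=lambda s: dist(styles[s], shape))

    # Stage 2: compare colors only among models that share the chosen style.
    candidates = {name: e for name, e in CATALOG.items() if e["style"] == style}
    model = min(candidates, key=lambda n: dist(candidates[n]["color"], color))
    return style, model

style, model = recognize(shape=(0.85, 0.25), color=(25, 25, 30))
print(style, model)
```

In this toy catalog the red tote and the red clutch have nearly the same color, so color alone would confuse them; deciding style first restricts the color comparison to models that can actually be told apart by color.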
Convolutional Neural Networks (CNNs) have recently shown promise in many image recognition tasks, which motivates us to design a CNN-based handbag recognition algorithm. However, after studying various CNN architectures for training the classifier, we find that previous CNN models do not provide discriminative color information during training. Moreover, CNN models usually use only the hard label (i.e., the ground truth class label) to train a multiclass classifier, which is insufficient for visually similar classes. To train a better CNN for classification, we present a Feature Selective joint Classification-Regression CNN (FSCR-CNN) model. It helps recognize color-sensitive objects and facilitates classifier modeling for visually similar classes.
We also propose an end-to-end handbag recognition framework with three components: (1) symmetry-based proposal localization, (2) CNN detection and FSCR-CNN classification, and (3) combination of detection scores and classification scores using a conditional probability model. The experimental results verify the advantage of each component of the framework for handbag recognition.
Finally, we propose a handbag recommendation system for e-commerce and shops. It helps shoppers find desirable fashion items, which facilitates online interaction and product promotion. Given images of the shopper's preferred handbags, recommendation is performed by joint learning of attribute projection and one-class SVM classification. A weighted AutoEncoder method is further proposed to refine the recommended results. The results show that this scheme performs favorably in initial subject testing.
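As a rough illustration of component (3) of the recognition framework, detection and classification scores can be combined with the product rule: P(model k, handbag | proposal) = P(handbag | proposal) × P(model k | handbag, proposal). The abstract does not give the thesis's actual conditional probability model, so this factorization is an assumption, and the function name `combine`, the input layout, and the scores are hypothetical.

```python
def combine(proposals):
    """Pick the (proposal, class) pair with the highest joint score.

    proposals: list of (detection_score, {class_name: class_probability}),
    where detection_score is read as P(handbag | proposal) and each class
    probability as P(class | handbag, proposal). This reading is an assumption.
    Returns (joint_score, proposal_index, class_name).
    """
    best = None
    for index, (p_detect, class_probs) in enumerate(proposals):
        for name, p_class in class_probs.items():
            joint = p_detect * p_class  # product rule for the joint probability
            if best is None or joint > best[0]:
                best = (joint, index, name)
    return best

# Made-up scores: the first proposal has a confident class but weak detection.
proposals = [
    (0.3, {"model_A": 0.9, "model_B": 0.1}),
    (0.8, {"model_A": 0.4, "model_B": 0.6}),
]
score, index, name = combine(proposals)
print(index, name, round(score, 2))
```

With these scores the second proposal wins even though its top class probability is lower, because a confident class prediction on a region that is unlikely to contain a handbag is discounted by the low detection score.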