Learning to name faces: A multimodal learning scheme for search-based face annotation

Automated face annotation aims to automatically detect human faces from a photo and further name the faces with the corresponding human names. In this paper, we tackle this open problem by investigating a search-based face annotation (SBFA) paradigm for mining large amounts of web facial images free...

Full description

Saved in:
Bibliographic Details
Main Authors: WANG, Dayong, HOI, Steven C. H., WU, Pengcheng, ZHU, Jianke, HE, Ying, MIAO, Chunyan
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2013
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/2336
https://ink.library.smu.edu.sg/context/sis_research/article/3336/viewcontent/Learniing_to_Name_Faces_A_Multimodal_Learning_Scheme_for_Search_Based_Face_Annotation.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:Automated face annotation aims to automatically detect human faces from a photo and further name the faces with the corresponding human names. In this paper, we tackle this open problem by investigating a search-based face annotation (SBFA) paradigm for mining large amounts of web facial images freely available on the WWW. Given a query facial image for annotation, the idea of SBFA is to first search for top-n similar facial images from a web facial image database and then exploit these top-ranked similar facial images and their weak labels for naming the query facial image. To fully mine those information, this paper proposes a novel framework of Learning to Name Faces (L2NF) – a unified multimodal learning approach for search-based face annotation, which consists of the following major components: (i) we enhance the weak labels of top-ranked similar images by exploiting the “label smoothness" assumption; (ii) we construct the multimodal representations of a facial image by extracting different types of features; (iii) we optimize the distance measure for each type of features using distance metric learning techniques; and finally (iv) we learn the optimal combination of multiple modalities for annotation through a learning to rank scheme. We conduct a set of extensive empirical studies on two real-world facial image databases, in which encouraging results show that the proposed algorithms significantly boost the naming accuracy of search-based face annotation task.