Heterogeneous embedding propagation for large-scale e-commerce user alignment


Bibliographic Details
Main Authors: ZHENG, Vincent W., SHA, Mo, LI, Yuchen, YANG, Hongxia, FANG, Yuan, ZHANG, Zhenjie, TAN, Kian-Lee, CHANG, Kevin Chen-Chuan
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2018
Online Access: https://ink.library.smu.edu.sg/sis_research/4231
https://ink.library.smu.edu.sg/context/sis_research/article/5234/viewcontent/hep.pdf
Institution: Singapore Management University
Description
Summary: We study the important problem of user alignment in e-commerce: to predict whether two online user identities that access an e-commerce site from different devices belong to one real-world person. As input, we have a set of user activity logs from Taobao and some labeled user identity linkages. User activity logs can be modeled using a heterogeneous interaction graph (HIG), and subsequently the user alignment task can be formulated as a semi-supervised HIG embedding problem. HIG embedding is challenging for two reasons: its heterogeneous nature and the presence of edge features. To address the challenges, we propose a novel Heterogeneous Embedding Propagation (HEP) model. The core idea is to iteratively reconstruct a node's embedding from its heterogeneous neighbors in a weighted manner, and meanwhile propagate its embedding updates from reconstruction loss and/or classification loss to its neighbors. We conduct extensive experiments on large-scale datasets from Taobao, demonstrating that HEP significantly outperforms state-of-the-art baselines, often by more than 10% in F-scores.
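
Illustration (not part of the original record): the core idea in the summary, reconstructing a node's embedding from its weighted neighbors and pushing the resulting updates back to those neighbors, can be sketched on a toy graph. This is a simplified stand-in, not the authors' HEP model. The node names, the scalar edge features, the fixed `TYPE_WEIGHT` table, the initial embeddings, and the learning rate are all assumptions made for the example. The classification loss mentioned in the summary is left out, and heterogeneity is reduced to a fixed weight per (node type, neighbor type) pair rather than anything learned.

```python
# Toy sketch only: NOT the HEP model from the paper.
# Every name and number below is hypothetical.

NODE_TYPE = {"user_a": "user", "user_b": "user",
             "device_1": "device", "item_x": "item"}

# Assumed fixed importance of each (node type, neighbor type) pair.
TYPE_WEIGHT = {("user", "device"): 1.0, ("user", "item"): 0.5,
               ("device", "user"): 1.0, ("item", "user"): 1.0}

# Undirected edges with an assumed scalar edge feature (e.g. an interaction count).
EDGES = [("user_a", "item_x", 2.0), ("user_b", "item_x", 1.0),
         ("user_a", "device_1", 1.0), ("user_b", "device_1", 1.0)]


def toy_graph():
    """Return initial embeddings and normalized neighbor weights."""
    emb = {"user_a": [1.0, 0.0], "user_b": [0.0, 1.0],
           "device_1": [0.5, 0.5], "item_x": [1.0, 1.0]}
    raw = {n: [] for n in NODE_TYPE}
    for u, v, feat in EDGES:
        raw[u].append((v, feat * TYPE_WEIGHT[(NODE_TYPE[u], NODE_TYPE[v])]))
        raw[v].append((u, feat * TYPE_WEIGHT[(NODE_TYPE[v], NODE_TYPE[u])]))
    # Normalize so each node's neighbor weights sum to 1.
    neighbors = {}
    for n, nbrs in raw.items():
        total = sum(w for _, w in nbrs)
        neighbors[n] = [(m, w / total) for m, w in nbrs]
    return emb, neighbors


def reconstruct(node, emb, neighbors):
    """Weighted average of the node's neighbor embeddings."""
    dim = len(emb[node])
    return [sum(a * emb[m][k] for m, a in neighbors[node]) for k in range(dim)]


def reconstruction_loss(emb, neighbors):
    """Sum over nodes of squared distance between embedding and reconstruction."""
    loss = 0.0
    for node in neighbors:
        rec = reconstruct(node, emb, neighbors)
        loss += sum((e - r) ** 2 for e, r in zip(emb[node], rec))
    return loss


def propagate_step(emb, neighbors, lr=0.05):
    """One gradient step; each node's error also updates its neighbors."""
    grads = {n: [0.0] * len(vec) for n, vec in emb.items()}
    for node in neighbors:
        rec = reconstruct(node, emb, neighbors)
        diff = [e - r for e, r in zip(emb[node], rec)]
        for k, d in enumerate(diff):
            grads[node][k] += 2.0 * d            # gradient w.r.t. the node itself
            for m, a in neighbors[node]:
                grads[m][k] -= 2.0 * a * d       # gradient propagated to neighbors
    for n in emb:
        emb[n] = [e - lr * g for e, g in zip(emb[n], grads[n])]


if __name__ == "__main__":
    emb, neighbors = toy_graph()
    print("loss before:", reconstruction_loss(emb, neighbors))
    for _ in range(50):
        propagate_step(emb, neighbors)
    print("loss after: ", reconstruction_loss(emb, neighbors))
```

With reconstruction loss alone, the embeddings drift toward one another. In a semi-supervised setting such as the one the summary describes, a supervised term on labeled identity pairs would presumably counteract that, but this sketch does not attempt to model it.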