HYDRA: Large-scale Social Identity Linkage via Heterogeneous Behavior Modeling

We study the problem of large-scale social identity linkage across different social media platforms, which is of critical importance to business intelligence by gaining from social data a deeper understanding and more accurate profiling of users. This paper proposes HYDRA, a solution framework which...

Full description

Saved in:
Bibliographic Details
Main Authors: Liu, Siyuan, Wang, Shuhui, ZHU, Feida, Zhang, Jinbo, Krishnan, Ramayya
Format: text
Language:English
Published: Institutional Knowledge at Singapore Management University 2014
Subjects:
Online Access:https://ink.library.smu.edu.sg/sis_research/2650
https://ink.library.smu.edu.sg/context/sis_research/article/3650/viewcontent/C105___HYDRA_Large_scale_Social_Identity_Linkage_via_Heterogeneous_Behavior_Modeling__SIGMOD2014_.pdf
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Singapore Management University
Language: English
Description
Summary:We study the problem of large-scale social identity linkage across different social media platforms, which is of critical importance to business intelligence by gaining from social data a deeper understanding and more accurate profiling of users. This paper proposes HYDRA, a solution framework which consists of three key steps: (I) modeling heterogeneous behavior by long-term behavior distribution analysis and multi-resolution temporal information matching; (II) constructing structural consistency graph to measure the high-order structure consistency on users' core social structures across different platforms; and (III) learning the mapping function by multi-objective optimization composed of both the supervised learning on pair-wise ID linkage information and the cross-platform structure consistency maximization. Extensive experiments on 10 million users across seven popular social network platforms demonstrate that HYDRA correctly identifies real user linkage across different platforms, and outperforms existing state-of-the-art algorithms by at least 20% under different settings, and 4 times better in most settings.