Traffic-optimized data placement for social media

Bibliographic Details
Main Authors: Tang, Jing, Tang, Xueyan, Yuan, Junsong
Other Authors: School of Computer Science and Engineering
Format: Article
Language: English
Published: 2020
Online Access:https://hdl.handle.net/10356/140032
Institution: Nanyang Technological University
Description
Summary: Social media users are generating data on an unprecedented scale. Distributed storage systems are often used to cope with explosive data growth. Data partitioning and replication are two interrelated data placement issues affecting the interserver traffic caused by user-initiated read and write operations in distributed storage systems. This paper investigates how to minimize the interserver traffic among a cluster of social media servers through joint optimization of data partitioning and replication. We formally define the problem and study its hardness. We then propose a traffic-optimized partitioning and replication (TOPR) method that continuously adapts data placement to various dynamics. Evaluations with real Twitter and LiveJournal social graphs show that TOPR not only reduces the interserver traffic significantly but also substantially reduces the storage cost of replication compared with state-of-the-art methods. We also benchmark TOPR against the offline optimum obtained from a binary linear program.
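To make the partitioning/replication trade-off in the summary concrete, here is a minimal sketch of one simplified traffic cost model. It is an illustrative assumption, not the paper's formal problem definition or the TOPR method: each user is served by the server holding its master copy, a read of a friend's data costs one unit of interserver traffic (scaled by the reader's read rate) when that friend has no master or replica on the reader's server, and each write costs one unit per replica located off the writer's master server. The function name interserver_traffic, its parameters, and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def interserver_traffic(
    friends: Dict[str, List[str]],
    master: Dict[str, int],
    replicas: Dict[str, Set[int]],
    read_rate: Dict[str, float],
    write_rate: Dict[str, float],
) -> float:
    """Interserver traffic of one placement under a simplified, hypothetical model.

    friends:    social graph as user -> users whose data that user reads
    master:     server holding each user's master copy (the partition)
    replicas:   extra servers holding a copy of each user's data (the replication)
    read_rate:  how often each user reads its friends' data (assumed)
    write_rate: how often each user updates its own data (assumed)
    """
    total = 0.0
    # Read traffic: a user reads every friend's data from its own master server;
    # the read is remote when the friend has neither master nor replica there.
    for u, nbrs in friends.items():
        home = master[u]
        for v in nbrs:
            if master[v] != home and home not in replicas.get(v, set()):
                total += read_rate[u]
    # Write traffic: each write is pushed to every replica not on the master server.
    for v, servers in replicas.items():
        total += write_rate[v] * len(servers - {master[v]})
    return total


# Toy placement: a and b share server 0, c lives on server 1; a reads 3x as often.
friends = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
master = {"a": 0, "b": 0, "c": 1}
read_rate = {"a": 3.0, "b": 1.0, "c": 1.0}
write_rate = {"a": 1.0, "b": 1.0, "c": 1.0}

no_replica = interserver_traffic(friends, master, {}, read_rate, write_rate)
with_replica = interserver_traffic(friends, master, {"c": {0}}, read_rate, write_rate)
print(no_replica)    # 4.0
print(with_replica)  # 2.0
```

In this toy case, replicating c on server 0 trades one unit of write-propagation traffic (plus extra storage) for three units of remote reads, which is the kind of interplay between partitioning and replication that a joint optimization has to weigh.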