Differentially private distributed frequency estimation

In order to remain competitive, Internet companies collect and analyse user data for the purpose of the improvement of user experiences. Frequency estimation is a widely used statistical tool, which could potentially conflict with the relevant privacy regulations. Privacy preserving analytic methods...

Full description

Saved in:
Bibliographic Details
Main Authors: Yang, Mengmeng, Tjuawinata, Ivan, Lam, Kwok-Yan, Zhu, Tianqing, Zhao, Jun
Other Authors: School of Computer Science and Engineering
Format: Article
Language:English
Published: 2023
Subjects:
Online Access:https://hdl.handle.net/10356/168026
Tags: Add Tag
No Tags, Be the first to tag this record!
Institution: Nanyang Technological University
Language: English
Description
Summary:In order to remain competitive, Internet companies collect and analyse user data for the purpose of the improvement of user experiences. Frequency estimation is a widely used statistical tool, which could potentially conflict with the relevant privacy regulations. Privacy preserving analytic methods based on differential privacy have been proposed, which require either a large user base or a trusted server. Although the requirements for such solutions may not be a problem for larger companies, they may be unattainable for smaller organizations. To address this issue, we propose a distributed privacy-preserving sampling-based frequency estimation method which has high accuracy even in the scenario with a small number of users while not requiring any trusted server. This is achieved by combining multi-party computation and sampling techniques. We also provide a relation between its privacy guarantee, output accuracy, and the number of participants. Distinct from most existing methods, our methods achieve <italic>centralized</italic> differential privacy guarantee without the need of any trusted server. We established that, even for a small number of participants, our mechanisms can produce estimates with high accuracy and hence they provide smaller companies with more opportunity for growth through privacy-preserving statistical analysis. We further propose an architectural model to support weighted aggregation in order to achieve a higher accuracy estimate to cater for users with varying privacy requirements. Compared to the unweighted aggregation, our method provides a more accurate estimate. Extensive experiments are conducted to show the effectiveness of the proposed methods.