Differentially private distributed frequency estimation
In order to remain competitive, Internet companies collect and analyse user data for the purpose of the improvement of user experiences. Frequency estimation is a widely used statistical tool, which could potentially conflict with the relevant privacy regulations. Privacy preserving analytic methods...
Saved in:
Main Authors: | , , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | English |
Published: |
2023
|
Subjects: | |
Online Access: | https://hdl.handle.net/10356/168026 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Institution: | Nanyang Technological University |
Language: | English |
Summary: | In order to remain competitive, Internet companies collect and analyse user data for the purpose of the improvement of user experiences. Frequency estimation is a widely used statistical tool, which could potentially conflict with the relevant privacy regulations. Privacy preserving analytic methods based on differential privacy have been proposed, which require either a large user base or a trusted server. Although the requirements for such solutions may not be a problem for larger companies, they may be unattainable for smaller organizations. To address this issue, we propose a distributed privacy-preserving sampling-based frequency estimation method which has high accuracy even in the scenario with a small number of users while not requiring any trusted server. This is achieved by combining multi-party computation and sampling techniques. We also provide a relation between its privacy guarantee, output accuracy, and the number of participants. Distinct from most existing methods, our methods achieve <italic>centralized</italic> differential privacy guarantee without the need of any trusted server. We established that, even for a small number of participants, our mechanisms can produce estimates with high accuracy and hence they provide smaller companies with more opportunity for growth through privacy-preserving statistical analysis. We further propose an architectural model to support weighted aggregation in order to achieve a higher accuracy estimate to cater for users with varying privacy requirements. Compared to the unweighted aggregation, our method provides a more accurate estimate. Extensive experiments are conducted to show the effectiveness of the proposed methods. |
---|