Differentially private distributed frequency estimation

In order to remain competitive, Internet companies collect and analyse user data for the purpose of the improvement of user experiences. Frequency estimation is a widely used statistical tool, which could potentially conflict with the relevant privacy regulations. Privacy preserving analytic methods...

وصف كامل

محفوظ في:
التفاصيل البيبلوغرافية
المؤلفون الرئيسيون: Yang, Mengmeng, Tjuawinata, Ivan, Lam, Kwok-Yan, Zhu, Tianqing, Zhao, Jun
مؤلفون آخرون: School of Computer Science and Engineering
التنسيق: مقال
اللغة:English
منشور في: 2023
الموضوعات:
الوصول للمادة أونلاين:https://hdl.handle.net/10356/168026
الوسوم: إضافة وسم
لا توجد وسوم, كن أول من يضع وسما على هذه التسجيلة!
المؤسسة: Nanyang Technological University
اللغة: English
الوصف
الملخص:In order to remain competitive, Internet companies collect and analyse user data for the purpose of the improvement of user experiences. Frequency estimation is a widely used statistical tool, which could potentially conflict with the relevant privacy regulations. Privacy preserving analytic methods based on differential privacy have been proposed, which require either a large user base or a trusted server. Although the requirements for such solutions may not be a problem for larger companies, they may be unattainable for smaller organizations. To address this issue, we propose a distributed privacy-preserving sampling-based frequency estimation method which has high accuracy even in the scenario with a small number of users while not requiring any trusted server. This is achieved by combining multi-party computation and sampling techniques. We also provide a relation between its privacy guarantee, output accuracy, and the number of participants. Distinct from most existing methods, our methods achieve <italic>centralized</italic> differential privacy guarantee without the need of any trusted server. We established that, even for a small number of participants, our mechanisms can produce estimates with high accuracy and hence they provide smaller companies with more opportunity for growth through privacy-preserving statistical analysis. We further propose an architectural model to support weighted aggregation in order to achieve a higher accuracy estimate to cater for users with varying privacy requirements. Compared to the unweighted aggregation, our method provides a more accurate estimate. Extensive experiments are conducted to show the effectiveness of the proposed methods.