Development of Efficient Privacy-Preservation Algorithms

Bibliographic Details
Main Author: Bowosak Srisungsittisunti
Other Authors: Assoc. Prof. Dr. Juggapong Natwichai
Format: Theses and Dissertations
Language: English
Published: Chiang Mai : Graduate School, Chiang Mai University 2020
Online Access: http://cmuir.cmu.ac.th/jspui/handle/6653943832/69306
Institution: Chiang Mai University
Description
Summary: Privacy preservation is an important issue that receives considerable attention in society. When partners collaborate to obtain useful knowledge for a sound strategic decision, privacy preservation is necessary to prevent privacy breaches. Several privacy preservation models currently exist. This research addresses the problem of data privacy preservation based on a prominent privacy model, (k, e)-Anonymous. The target data processing for data released under this model is aggregate querying, a fundamental operation in many data analysis and data mining algorithms. However, when a new dataset is to be released, other versions of it may already have been released elsewhere. A problem arises because attackers might obtain multiple versions of the same dataset and compare them with the newly released one. Although the privacy of each dataset has been well preserved individually, such a comparison can lead to a privacy breach, a so-called “incremental privacy breach”. To address this problem effectively, we first study the effects of multiple dataset releases theoretically. We find that an incremental privacy breach occurs when any part of the new dataset overlaps with any part of an existing dataset. Based on this study, a polynomial-time algorithm is proposed. The algorithm needs to consider only one previous version of the dataset, and it can also skip computing the overlapping partitions. Thus, the computational complexity is reduced from O(n^m) to only O(pn^3), where p is the number of partitions, n is the number of tuples, and m is the number of released datasets. At the same time, the privacy of all released datasets, as well as the optimality of the solution, is always guaranteed.
In addition, experimental results that illustrate the efficiency of the algorithm on real-world datasets are presented.
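
The incremental breach described in the summary can be illustrated with a minimal Python sketch. It is not the thesis's algorithm. It assumes the common reading of (k, e)-anonymity (each partition holds at least k distinct sensitive values whose range is at least e) and a simplified attacker who intersects the sensitive values of overlapping partitions from two releases. The data layout, the function names, and the sample salaries are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: a partition maps tuple id -> numeric sensitive value,
# and a release is a list of partitions.
Partition = Dict[int, int]
Release = List[Partition]


def satisfies_ke(values, k: int, e: int) -> bool:
    """Assumed (k, e) condition: >= k distinct values with range >= e."""
    distinct = set(values)
    if not distinct:
        return False
    return len(distinct) >= k and (max(distinct) - min(distinct)) >= e


def release_is_ke_anonymous(release: Release, k: int, e: int) -> bool:
    """Check every partition of a single release on its own."""
    return all(satisfies_ke(p.values(), k, e) for p in release)


def incremental_breaches(old: Release, new: Release,
                         k: int, e: int) -> List[Tuple[int, int]]:
    """Return (old_index, new_index) pairs of overlapping partitions whose
    common sensitive values no longer satisfy the (k, e) condition.

    Simplified attacker model (an assumption): a person who appears in both
    partitions must have a value that occurs in both of them.
    """
    breaches = []
    for i, p_old in enumerate(old):
        for j, p_new in enumerate(new):
            if not (p_old.keys() & p_new.keys()):
                continue  # no shared tuples, so nothing to compare
            candidates = set(p_old.values()) & set(p_new.values())
            if not satisfies_ke(candidates, k, e):
                breaches.append((i, j))
    return breaches


# Hypothetical salaries (in thousands), keyed by tuple id.
old_release = [{1: 30, 2: 40, 3: 50}, {4: 60, 5: 70, 6: 80}]
new_release = [{1: 30, 2: 40, 7: 90}, {3: 50, 4: 60, 5: 70, 6: 80}]

print(release_is_ke_anonymous(old_release, k=2, e=10))
print(release_is_ke_anonymous(new_release, k=2, e=10))
print(incremental_breaches(old_release, new_release, k=2, e=10))
```

In this sample, each release passes the check on its own, yet the first old partition and the second new partition share only tuple 3, so comparing them narrows that person's value to a single candidate. The sketch compares every pair of partitions across two releases. The algorithm in the thesis is stated to need only one previous version and to reach O(pn^3), which this sketch does not attempt to reproduce.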