Balancing data utility versus information loss in data-privacy protection using k-Anonymity


Bibliographic details
Main Authors: Esmeel, Thamer Khalil, Hasan, Md Munirul, Kabir, Muhammad Nomani, Ahmad, Firdaus
Format: Conference or Workshop Item
Language: English
Published: IEEE
Online access: http://umpir.ump.edu.my/id/eprint/31545/1/Balancing%20Data%20Utility%20versus%20Information%20Loss%20in.pdf
http://umpir.ump.edu.my/id/eprint/31545/
http://10.1109/ICSPC50992.2020.9305776
Institution: Universiti Malaysia Pahang Al-Sultan Abdullah
Description
Summary: Data privacy has been an important area of research in recent years. Datasets often contain sensitive data fields, exposure of which may jeopardize the interests of the individuals associated with the data. To address this issue, privacy techniques can be used to hinder the identification of a person by anonymizing the sensitive data in the dataset, while the anonymized dataset can still be used by third parties for analysis. In this research, we investigated a privacy technique, k-anonymity, for different values of k and different numbers c of columns of the dataset. Next, the information loss due to k-anonymity is computed. The anonymized files are then classified by several machine-learning algorithms, namely Naive Bayes, J48 and a neural network, in order to check the balance between data anonymity and data utility. Based on the classification accuracy, the optimal values of k and c are obtained, and these values can then be used by the k-anonymity algorithm to anonymize the optimal number of columns of the dataset.
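
The k-anonymity condition and the idea of information loss mentioned in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not state which generalization scheme or loss metric the paper uses, so the bucket-based generalization of a single column, the normalized-width loss measure, the column names (`age`, `zip`, `diagnosis`) and all function names here are assumptions made purely for illustration.

```python
from collections import Counter


def generalize_age(age, width):
    """Replace an exact age with a (low, high) bucket of the given width."""
    low = (age // width) * width
    return (low, low + width - 1)


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(n >= k for n in counts.values())


def anonymize(rows, k, widths=(1, 5, 10, 20, 50, 100)):
    """Coarsen the hypothetical 'age' column until the table is k-anonymous.

    Tries each bucket width in order and returns (anonymized_rows, width).
    Raises ValueError if none of the candidate widths is sufficient.
    """
    for width in widths:
        candidate = [dict(row, age=generalize_age(row["age"], width)) for row in rows]
        if is_k_anonymous(candidate, ("age", "zip"), k):
            return candidate, width
    raise ValueError("no candidate width achieves k-anonymity")


def information_loss(width, age_range):
    """Assumed loss measure: bucket width relative to the observed age range.

    0.0 means exact values were kept; 1.0 means the column is fully generalized.
    """
    return min(1.0, (width - 1) / age_range)


# Toy records, invented for this example only.
rows = [
    {"age": 23, "zip": "26600", "diagnosis": "flu"},
    {"age": 27, "zip": "26600", "diagnosis": "asthma"},
    {"age": 21, "zip": "26600", "diagnosis": "flu"},
    {"age": 45, "zip": "26600", "diagnosis": "diabetes"},
    {"age": 48, "zip": "26600", "diagnosis": "flu"},
    {"age": 41, "zip": "26600", "diagnosis": "asthma"},
]
age_range = max(r["age"] for r in rows) - min(r["age"] for r in rows)

for k in (2, 4):
    anonymized, width = anonymize(rows, k)
    print(k, width, round(information_loss(width, age_range), 3))
```

In this toy setup a larger k forces wider age buckets and therefore a higher loss value, which is the trade-off the summary describes. Measuring utility as in the paper would additionally require training classifiers on the anonymized output, which this sketch does not attempt.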