Privacy preserving association rule mining

With the growing advancement in technology, the amount of data generated is constantly increasing, leading to the need for data mining technologies to mine valid patterns and relationships in large data sets. In connection with this dramatic increase in data and the popularity of data mining, issues...

Bibliographic Details
Main Author: Suruchi Sharma.
Other Authors: Ng Wee Keong
Format: Final Year Project
Language: English
Published: 2009
Subjects: DRNTU::Engineering::Computer science and engineering::Information systems
Online Access: http://hdl.handle.net/10356/16919
Institution: Nanyang Technological University
id sg-ntu-dr.10356-16919
record_format dspace
spelling sg-ntu-dr.10356-16919 2023-03-03T20:28:50Z Privacy preserving association rule mining Suruchi Sharma. Ng Wee Keong School of Computer Engineering Centre for Advanced Information Systems DRNTU::Engineering::Computer science and engineering::Information systems
Bachelor of Engineering (Computer Engineering) 2009-05-29T01:43:47Z 2009-05-29T01:43:47Z 2009 2009 Final Year Project (FYP) http://hdl.handle.net/10356/16919 en Nanyang Technological University 63 pages 62 p. application/pdf
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic DRNTU::Engineering::Computer science and engineering::Information systems
spellingShingle DRNTU::Engineering::Computer science and engineering::Information systems
Suruchi Sharma.
Privacy preserving association rule mining
description With the growing advancement in technology, the amount of data generated is constantly increasing, leading to the need for data mining technologies to mine valid patterns and relationships in large data sets. In connection with this dramatic increase in data and the popularity of data mining, privacy preservation has become a great concern. Through this report, I intend to understand privacy preserving mining of association rules and to compare and contrast two randomization approaches to privacy preservation, namely cut‐and‐paste randomization and MASK. First, I looked at the process of data mining and its various classes, such as clustering, classification, prediction and association rule mining. I then looked at association rule mining in greater detail and described the Apriori algorithm for finding frequent itemsets. Following this, I looked at the techniques used by the cut‐and‐paste randomization operator and the MASK scheme to ensure privacy of the data being used while accurately mining frequent itemsets from a set of randomized transactions. I implemented cut‐and‐paste and MASK in Java, using a client‐server architecture for communication, in order to investigate their accuracy while maintaining privacy. I conducted several experiments on the two schemes and found that at 50% privacy levels, cut‐and‐paste randomization performed slightly better than MASK. However, since the difference in the results was not large, I concluded that both schemes performed equally well. I then pointed out certain limitations of the two schemes and explained the conditions under which these schemes were able to perform well.
author2 Ng Wee Keong
author_facet Ng Wee Keong
Suruchi Sharma.
format Final Year Project
author Suruchi Sharma.
author_sort Suruchi Sharma.
title Privacy preserving association rule mining
title_short Privacy preserving association rule mining
title_full Privacy preserving association rule mining
title_fullStr Privacy preserving association rule mining
title_full_unstemmed Privacy preserving association rule mining
title_sort privacy preserving association rule mining
publishDate 2009
url http://hdl.handle.net/10356/16919
_version_ 1759853363361480704
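
The description above names MASK as one of the two randomization schemes compared. This record does not include the report's code, so the following is only a rough Java sketch of the idea as it is commonly described: each bit of a boolean transaction is kept with probability p and flipped otherwise, and the miner later corrects the observed counts. The class name `MaskSketch`, the method names, and the parameter `p` are assumptions made for illustration, and the sketch covers single items only, not full itemsets.

```java
import java.util.Random;

/** Hypothetical illustration of MASK-style bit flipping; not the report's implementation. */
final class MaskSketch {

    /** Keeps each bit with probability p and flips it with probability 1 - p. */
    static boolean[] distort(boolean[] row, double p, Random rng) {
        boolean[] out = new boolean[row.length];
        for (int i = 0; i < row.length; i++) {
            boolean keep = rng.nextDouble() < p;
            out[i] = keep ? row[i] : !row[i];
        }
        return out;
    }

    /**
     * Estimates how many of the n original rows contained an item, given how
     * many distorted rows contain it. Follows from
     * E[observed] = p * t + (1 - p) * (n - t), solved for t.
     */
    static double estimateTrueCount(double observedOnes, int n, double p) {
        if (p == 0.5) {
            throw new IllegalArgumentException("p = 0.5 leaves nothing to reconstruct");
        }
        return (observedOnes - (1.0 - p) * n) / (2.0 * p - 1.0);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);   // fixed seed, only to make runs repeatable
        double p = 0.9;                // assumed retention probability
        int n = 10_000;
        int trueOnes = 2_000;          // rows that really contain the item
        int observed = 0;
        for (int r = 0; r < n; r++) {
            boolean[] row = { r < trueOnes };
            if (distort(row, p, rng)[0]) {
                observed++;
            }
        }
        System.out.println("observed count:  " + observed);
        System.out.println("estimated count: " + estimateTrueCount(observed, n, p));
    }
}
```

Values of p close to 0.5 hide more about each individual bit but make the estimate noisier, which is the privacy versus accuracy trade-off the description refers to.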