Toward highly secure yet efficient KNN classification scheme on outsourced cloud data

Bibliographic Details
Main Authors: LIU, Lin, SU, Jinshu, LIU, Ximeng, CHEN, Rongmao, HUANG, Kai, DENG, Robert H., WANG, Xiaofeng
Format: text
Language: English
Published: Institutional Knowledge at Singapore Management University 2019
Online Access: https://ink.library.smu.edu.sg/sis_research/4672
https://doi.org/10.1109/JIOT.2019.2932444
Institution: Singapore Management University
Description
Summary: Nowadays, outsourcing data and machine learning tasks, e.g., $k$-nearest neighbor (KNN) classification, to clouds has become a scalable and cost-effective way to store, manage, and process large-scale data. However, data security and privacy have been a serious concern in outsourcing data to clouds. In this article, we propose a privacy-preserving KNN classification scheme on cloud data in a twin-cloud model based on an additively homomorphic cryptosystem and secret sharing. Compared with existing works, we redesign a set of lightweight building blocks, such as secure squared Euclidean distance, secure comparison, secure sorting, secure minimum and maximum number finding, and secure frequency calculation, which achieve the same security level but with higher efficiency. In our scheme, data owners stay offline, unlike secure multiparty computation-based solutions, which require data owners to stay online during computation. In addition, query users do not interact with the cloud except to send query data and receive the query results. Our security analysis shows that the scheme protects outsourced data security and query privacy, and hides access patterns. Experiments on a real-world dataset indicate that our scheme is significantly more efficient than existing schemes.
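
The summary names two primitives, an additively homomorphic cryptosystem and secret sharing, without giving details. The following is a minimal Python sketch of what those primitives allow. It is not the authors' protocol. It assumes a toy Paillier cryptosystem with tiny, insecure primes and a simplified setting in which the query vector is known in plaintext (the paper also hides the query). All function names (`encrypt`, `he_add`, `he_scale`, `enc_squared_distance`, `share`, `reconstruct`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Toy Paillier parameters: tiny primes for illustration only, NOT secure.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # modular inverse; valid for the choice G = N + 1


def encrypt(m):
    """Encrypt an integer m (reduced mod N) with fresh randomness."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover the plaintext (an integer in [0, N)) from ciphertext c."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def he_add(c1, c2):
    """Ciphertext of a + b, given ciphertexts of a and b."""
    return (c1 * c2) % N2


def he_scale(c, k):
    """Ciphertext of k * a, given a ciphertext of a and a plaintext k."""
    return pow(c, k % N, N2)


def enc_squared_distance(enc_x, enc_x_sq, q):
    """Ciphertext of sum((x_i - q_i)^2) via x_i^2 - 2*q_i*x_i + q_i^2.

    Assumption: the data owner supplies Enc(x_i) and Enc(x_i^2), and q is
    plaintext. This is a simplification, not the scheme in the paper.
    """
    acc = encrypt(sum(v * v for v in q))
    for cx, cx_sq, qi in zip(enc_x, enc_x_sq, q):
        acc = he_add(acc, cx_sq)
        acc = he_add(acc, he_scale(cx, -2 * qi))
    return acc


def share(value, modulus=N):
    """Split value into two additive shares that sum to value mod modulus."""
    s1 = random.randrange(modulus)
    return s1, (value - s1) % modulus


def reconstruct(s1, s2, modulus=N):
    """Combine two additive shares back into the original value."""
    return (s1 + s2) % modulus


if __name__ == "__main__":
    x, q = [3, 7, 2], [1, 4, 6]
    enc_x = [encrypt(v) for v in x]
    enc_x_sq = [encrypt(v * v) for v in x]
    print(decrypt(enc_squared_distance(enc_x, enc_x_sq, q)))  # 29
    a1, a2 = share(20)
    b1, b2 = share(22)
    # Each party adds its own shares locally; only the sum is reconstructed.
    print(reconstruct(a1 + b1, a2 + b2))  # 42
```

In this sketch the party holding the ciphertexts never sees `x`, and each secret share on its own is a uniformly random value. A secure comparison or sorting step, as listed in the summary, would need additional interaction between the two clouds and is not shown here.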