Efficient and privacy-preserving feature importance-based vertical federated learning

Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners’ local features affects the performance of the VFL model, which makes featu...



Bibliographic Details
Main Authors:	Li, Anran; Huang, Jiahui; Jia, Ju; Peng, Hongyi; Zhang, Lan; Tuan, Luu Anh; Yu, Han; Li, Xiang-Yang
Other Authors:	College of Computing and Data Science
Format:	Article
Language:	English
Published:	2024
Subjects:	Computer and Information Science; Artificial intelligence; Federated learning
Online Access:	https://hdl.handle.net/10356/179064
Institution:	Nanyang Technological University

id	sg-ntu-dr.10356-179064
record_format	dspace
institution	Nanyang Technological University
building	NTU Library
continent	Asia
country	Singapore
content_provider	NTU Library
collection	DR-NTU
language	English
topic	Computer and Information Science Artificial intelligence Federated learning
spellingShingle	Computer and Information Science Artificial intelligence Federated learning Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang Efficient and privacy-preserving feature importance-based vertical federated learning
description	Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners’ local features affects the performance of the VFL model, which makes feature selection vitally important. However, existing feature selection methods for VFL assume prior knowledge of either the number of noisy features or the post-training threshold of useful features to be selected, making them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It consists of a Gaussian stochastic dual-gate to efficiently approximate the probability of a feature being selected. FedSDG-FS further designs a local embedding perturbation approach to achieve differential privacy for local training data. To reduce overhead, we propose a feature importance initialization method based on Gini impurity, which can accomplish its goals with only two parameter transmissions between the server and the clients. The enhanced version, FedSDG-FS++, protects the privacy of both the clients’ training data and the server's labels through Partially Homomorphic Encryption (PHE) without relying on a trusted third party. Theoretically, we analyze the convergence rate, privacy guarantees, and security of our methods. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS and FedSDG-FS++ significantly outperform existing approaches in terms of achieving more accurate selection of high-quality features as well as improving VFL performance in a privacy-preserving manner.
author2	College of Computing and Data Science
author_facet	College of Computing and Data Science Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang
format	Article
author	Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang
author_sort	Li, Anran
title	Efficient and privacy-preserving feature importance-based vertical federated learning
title_short	Efficient and privacy-preserving feature importance-based vertical federated learning
title_full	Efficient and privacy-preserving feature importance-based vertical federated learning
title_fullStr	Efficient and privacy-preserving feature importance-based vertical federated learning
title_full_unstemmed	Efficient and privacy-preserving feature importance-based vertical federated learning
title_sort	efficient and privacy-preserving feature importance-based vertical federated learning
publishDate	2024
url	https://hdl.handle.net/10356/179064
_version_	1814047261487792128
spelling	sg-ntu-dr.10356-179064 2024-07-18T02:16:49Z Efficient and privacy-preserving feature importance-based vertical federated learning Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang College of Computing and Data Science School of Computer Science and Engineering Computer and Information Science Artificial intelligence Federated learning
Agency for Science, Technology and Research (A*STAR) AI Singapore Nanyang Technological University National Research Foundation (NRF) Submitted/Accepted version
This work was supported in part by Nanyang Technological University (NTU), under Grant 020724-00001, in part by the National Research Foundation, Prime Minister's Office, National Cybersecurity R&D Program under Grant NRF2018NCR-NCR005-0001, in part by NRF Investigatorship under Grant NRF-NRFI06-2020-0001, in part by the National Research Foundation, Singapore and DSO National Laboratories under the AI Singapore Programme AISG under Grant AISG2-RP-2020-019, in part by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI) under Grant Alibaba-NTU-AIR2019B1, NTU, Singapore, in part by the RIE 2020 Advanced Manufacturing and Engineering Programmatic Fund under Grant A20G8b0102, Singapore, in part by the National Key R&D Program of China under Grant 2021YFB2900103, in part by China National Natural Science Foundation under Grant 61932016, and in part by “the Fundamental Research Funds for the Central Universities” under Grant WK2150110024.
2024-07-18T02:16:49Z 2024-07-18T02:16:49Z 2023 Journal Article
Li, A., Huang, J., Jia, J., Peng, H., Zhang, L., Tuan, L. A., Yu, H. & Li, X. (2023). Efficient and privacy-preserving feature importance-based vertical federated learning. IEEE Transactions On Mobile Computing, 23(6), 7238-7255. https://dx.doi.org/10.1109/TMC.2023.3333879
1536-1233 https://hdl.handle.net/10356/179064 10.1109/TMC.2023.3333879 6 23 7238 7255 en AISG2-RP-2020-019 A20G8b0102 020724-00001 NRF2018NCR-NCR005-0001 NRF-NRFI06-2020-0001 Alibaba-NTU-AIR2019B1 IEEE Transactions on Mobile Computing
© 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/TMC.2023.3333879. application/pdf
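
The abstract mentions a feature importance initialization based on Gini impurity. As a rough illustration of the underlying idea only, the sketch below scores a single feature by how much a median split reduces the Gini impurity of the labels. It is a centralized toy example, not the authors' federated protocol: the function names `gini_impurity` and `gini_importance`, the median split rule, and the sample data are all assumptions made for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_importance(feature, labels):
    """Impurity reduction from splitting one feature at its median.

    Hypothetical scoring rule for illustration; the record does not
    specify how the paper chooses split points.
    """
    threshold = sorted(feature)[len(feature) // 2]
    left = [y for x, y in zip(feature, labels) if x < threshold]
    right = [y for x, y in zip(feature, labels) if x >= threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Made-up data: one feature that separates the classes, one that does not.
labels = [0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
noisy = [0.5, 0.9, 0.1, 0.4, 0.8, 0.2]

scores = {
    "informative": gini_importance(informative, labels),
    "noisy": gini_importance(noisy, labels),
}
```

In this toy setup the feature that separates the two classes receives a higher score than the shuffled one, which is the kind of signal an importance initialization could start from. How such scores are computed across parties without exposing features or labels is the subject of the paper itself and is not reproduced here.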

