Efficient and privacy-preserving feature importance-based vertical federated learning
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners’ local features affects the performance of the VFL model, which makes featu...
Main Authors: | Li, Anran; Huang, Jiahui; Jia, Ju; Peng, Hongyi; Zhang, Lan; Tuan, Luu Anh; Yu, Han; Li, Xiang-Yang |
---|---|
Other Authors: | College of Computing and Data Science |
Format: | Article |
Language: | English |
Published: | 2024 |
Subjects: | Computer and Information Science; Artificial intelligence; Federated learning |
Online Access: | https://hdl.handle.net/10356/179064 |
Institution: | Nanyang Technological University |
id |
sg-ntu-dr.10356-179064 |
---|---|
record_format |
dspace |
institution |
Nanyang Technological University |
building |
NTU Library |
continent |
Asia |
country |
Singapore |
content_provider |
NTU Library |
collection |
DR-NTU |
language |
English |
topic |
Computer and Information Science Artificial intelligence Federated learning |
spellingShingle |
Computer and Information Science Artificial intelligence Federated learning Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang Efficient and privacy-preserving feature importance-based vertical federated learning |
description |
Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners’ local features affects the performance of the VFL model, which makes feature selection vitally important. However, existing feature selection methods for VFL assume prior knowledge of either the number of noisy features or the post-training threshold of useful features to be selected, making them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It consists of a Gaussian stochastic dual-gate that efficiently approximates the probability of a feature being selected. FedSDG-FS further designs a local embedding perturbation approach to achieve differential privacy for local training data. To reduce overhead, we propose a feature importance initialization method based on Gini impurity, which requires only two parameter transmissions between the server and the clients. The enhanced version, FedSDG-FS++, protects the privacy of both the clients’ training data and the server’s labels through Partially Homomorphic Encryption (PHE) without relying on a trusted third party. Theoretically, we analyze the convergence rate, privacy guarantees, and security of our methods. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS and FedSDG-FS++ significantly outperform existing approaches in terms of achieving more accurate selection of high-quality features as well as improving VFL performance in a privacy-preserving manner. |
author2 |
College of Computing and Data Science |
author_facet |
College of Computing and Data Science Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang |
format |
Article |
author |
Li, Anran Huang, Jiahui Jia, Ju Peng, Hongyi Zhang, Lan Tuan, Luu Anh Yu, Han Li, Xiang-Yang |
author_sort |
Li, Anran |
title |
Efficient and privacy-preserving feature importance-based vertical federated learning |
title_short |
Efficient and privacy-preserving feature importance-based vertical federated learning |
title_full |
Efficient and privacy-preserving feature importance-based vertical federated learning |
title_fullStr |
Efficient and privacy-preserving feature importance-based vertical federated learning |
title_full_unstemmed |
Efficient and privacy-preserving feature importance-based vertical federated learning |
title_sort |
efficient and privacy-preserving feature importance-based vertical federated learning |
publishDate |
2024 |
url |
https://hdl.handle.net/10356/179064 |
_version_ |
1814047261487792128 |
spelling |
sg-ntu-dr.10356-179064 2024-07-18T02:16:49Z Efficient and privacy-preserving feature importance-based vertical federated learning
Li, Anran; Huang, Jiahui; Jia, Ju; Peng, Hongyi; Zhang, Lan; Tuan, Luu Anh; Yu, Han; Li, Xiang-Yang
College of Computing and Data Science; School of Computer Science and Engineering
Computer and Information Science; Artificial intelligence; Federated learning
Agency for Science, Technology and Research (A*STAR); AI Singapore; Nanyang Technological University; National Research Foundation (NRF)
Submitted/Accepted version
This work was supported in part by Nanyang Technological University (NTU), under Grant 020724-00001, in part by the National Research Foundation, Prime Minister's Office, National Cybersecurity R&D Program under Grant NRF2018NCR-NCR005-0001, in part by NRF Investigatorship under Grant NRF-NRFI06-2020-0001, in part by the National Research Foundation, Singapore and DSO National Laboratories under the AI Singapore Programme (AISG) under Grant AISG2-RP-2020-019, in part by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI) under Grant Alibaba-NTU-AIR2019B1, NTU, Singapore, in part by the RIE 2020 Advanced Manufacturing and Engineering Programmatic Fund under Grant A20G8b0102, Singapore, in part by the National Key R&D Program of China under Grant 2021YFB2900103, in part by China National Natural Science Foundation under Grant 61932016, and in part by the Fundamental Research Funds for the Central Universities under Grant WK2150110024.
2024-07-18T02:16:49Z 2024-07-18T02:16:49Z 2023 Journal Article
Li, A., Huang, J., Jia, J., Peng, H., Zhang, L., Tuan, L. A., Yu, H. & Li, X. (2023). Efficient and privacy-preserving feature importance-based vertical federated learning. IEEE Transactions on Mobile Computing, 23(6), 7238-7255.
https://dx.doi.org/10.1109/TMC.2023.3333879 1536-1233 https://hdl.handle.net/10356/179064 10.1109/TMC.2023.3333879 6 23 7238 7255 en AISG2-RP-2020-019 A20G8b0102 020724-00001 NRF2018NCR-NCR005-0001 NRF-NRFI06-2020-0001 Alibaba-NTU-AIR2019B1 IEEE Transactions on Mobile Computing © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/TMC.2023.3333879. application/pdf |
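
The abstract mentions a feature importance initialization based on Gini impurity but does not spell out the computation. As a rough illustration only, the sketch below scores a single feature by the drop in Gini impurity when the labels are split at a threshold. It is a plaintext, single-party example, not the authors' federated protocol: the function names, the fixed threshold, and the toy data are all assumptions made for illustration, and no encryption or client-server exchange is modeled.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a label list (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_importance(feature, labels, threshold):
    """Hypothetical importance score: impurity decrease from the split
    `feature <= threshold` versus `feature > threshold`."""
    n = len(labels)
    if n == 0:
        return 0.0
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


# Toy data (assumed): one feature that separates the classes, one that does not.
labels = [0, 0, 1, 1]
informative = [0.1, 0.2, 0.8, 0.9]
noisy = [0.1, 0.9, 0.2, 0.8]

score_informative = gini_importance(informative, labels, threshold=0.5)
score_noisy = gini_importance(noisy, labels, threshold=0.5)
print(score_informative, score_noisy)
```

A higher score means the split leaves purer label groups, so the informative feature scores above the noisy one here. In a vertical setting the features and labels sit with different parties, which is presumably where the PHE-based protection described in the abstract comes in; that part is not shown.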