Efficient and privacy-preserving feature importance-based vertical federated learning

Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners’ local features affects the performance of the VFL model, which makes featu...

Bibliographic Details
Main Authors: Li, Anran, Huang, Jiahui, Jia, Ju, Peng, Hongyi, Zhang, Lan, Tuan, Luu Anh, Yu, Han, Li, Xiang-Yang
Other Authors: College of Computing and Data Science
Format: Article
Language: English
Published: 2024
Subjects: Computer and Information Science; Artificial intelligence; Federated learning
Online Access: https://hdl.handle.net/10356/179064
Institution: Nanyang Technological University
id sg-ntu-dr.10356-179064
record_format dspace
institution Nanyang Technological University
building NTU Library
continent Asia
country Singapore
content_provider NTU Library
collection DR-NTU
language English
topic Computer and Information Science
Artificial intelligence
Federated learning
description Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of data owners’ local features affects the performance of the VFL model, which makes feature selection vitally important. However, existing feature selection methods for VFL assume prior knowledge of either the number of noisy features or the post-training threshold for selecting useful features, making them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It uses a Gaussian stochastic dual-gate to efficiently approximate the probability of a feature being selected. FedSDG-FS also includes a local embedding perturbation approach to achieve differential privacy for local training data. To reduce overhead, we propose a feature importance initialization method based on Gini impurity, which requires only two parameter transmissions between the server and the clients. The enhanced version, FedSDG-FS++, protects the privacy of both the clients’ training data and the server’s labels through Partially Homomorphic Encryption (PHE) without relying on a trusted third party. Theoretically, we analyze the convergence rate, privacy guarantees, and security of our methods. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS and FedSDG-FS++ significantly outperform existing approaches, selecting high-quality features more accurately and improving VFL performance in a privacy-preserving manner.
author2 College of Computing and Data Science
format Article
author Li, Anran
Huang, Jiahui
Jia, Ju
Peng, Hongyi
Zhang, Lan
Tuan, Luu Anh
Yu, Han
Li, Xiang-Yang
author_sort Li, Anran
title Efficient and privacy-preserving feature importance-based vertical federated learning
publishDate 2024
url https://hdl.handle.net/10356/179064
_version_ 1814047261487792128
spelling sg-ntu-dr.10356-179064 2024-07-18T02:16:49Z
Affiliations: College of Computing and Data Science; School of Computer Science and Engineering
Funders: Agency for Science, Technology and Research (A*STAR); AI Singapore; Nanyang Technological University; National Research Foundation (NRF)
Version: Submitted/Accepted version
Funding: This work was supported in part by Nanyang Technological University (NTU), under Grant 020724-00001, in part by the National Research Foundation, Prime Minister's Office, National Cybersecurity R&D Program under Grant NRF2018NCR-NCR005-0001, in part by NRF Investigatorship under Grant NRF-NRFI06-2020-0001, in part by the National Research Foundation, Singapore and DSO National Laboratories under the AI Singapore Programme AISG under Grant AISG2-RP-2020-019, in part by Alibaba Group through Alibaba Innovative Research (AIR) Program and Alibaba-NTU Singapore Joint Research Institute (JRI) under Grant Alibaba-NTU-AIR2019B1, NTU, Singapore, in part by the RIE 2020 Advanced Manufacturing and Engineering Programmatic Fund under Grant A20G8b0102, Singapore, in part by the National Key R&D Program of China under Grant 2021YFB2900103, in part by China National Natural Science Foundation under Grant 61932016, and in part by “the Fundamental Research Funds for the Central Universities” under Grant WK2150110024.
Dates: 2024-07-18T02:16:49Z; 2023
Type: Journal Article
Citation: Li, A., Huang, J., Jia, J., Peng, H., Zhang, L., Tuan, L. A., Yu, H. & Li, X. (2023). Efficient and privacy-preserving feature importance-based vertical federated learning. IEEE Transactions on Mobile Computing, 23(6), 7238-7255. https://dx.doi.org/10.1109/TMC.2023.3333879
ISSN: 1536-1233
Handle: https://hdl.handle.net/10356/179064
DOI: 10.1109/TMC.2023.3333879
Volume 23, issue 6, pages 7238-7255
Language: en
Grants: AISG2-RP-2020-019; A20G8b0102; 020724-00001; NRF2018NCR-NCR005-0001; NRF-NRFI06-2020-0001; Alibaba-NTU-AIR2019B1
Journal: IEEE Transactions on Mobile Computing
Rights: © 2023 IEEE. All rights reserved. This article may be downloaded for personal use only. Any other use requires prior permission of the copyright holder. The Version of Record is available online at http://doi.org/10.1109/TMC.2023.3333879.
File format: application/pdf