FedCIO: efficient exact federated unlearning with clustering, isolation, and one-shot aggregation

Bibliographic Details
Main Authors: Qiu, Hongyu, Wang, Yongwei, Xu, Yonghui, Cui, Lizhen, Shen, Zhiqi
Other Authors: School of Computer Science and Engineering
Format: Conference or Workshop Item
Language: English
Published: 2024
Online Access: https://hdl.handle.net/10356/173926
Institution: Nanyang Technological University
Description
Summary: Data are invaluable in machine learning (ML), yet they raise significant privacy concerns. In the real world, data are often distributed across isolated silos, which challenges conventional ML methods that centralize data. Federated learning (FL) offers a privacy-preserving solution that enables learning without direct data transfer. Meanwhile, the 'right to be forgotten' motivates privacy preservation from another angle, machine unlearning, which enables data owners to erase the contributions of specific data from ML models. However, the invisibility of data in FL scenarios complicates the effective removal of local data, necessitating unlearning algorithms tailored to FL. Existing federated unlearning methods are approximate: they leave residual memorization of the target data, which diminishes user trust. To bridge this gap, we propose FedCIO, a novel framework for exact federated unlearning, designed to efficiently handle precise data removal requests in FL scenarios. Specifically, the framework involves client clustering, isolation among clusters, and one-shot aggregation of cluster models. This design makes unlearning efficient because only the relevant subset of cluster models is retrained, rather than the whole model from scratch. To improve the handling of non-independent and identically distributed (Non-IID) data, we further introduce an advanced spectral clustering implementation based on model similarity for better cluster partitioning. Comprehensive evaluation across common FL datasets with varied distributions demonstrates the superior performance of our proposed framework.
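
The cluster, isolate, and aggregate idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the closed-form least-squares fit standing in for federated training inside a cluster, the sample-weighted average used as the one-shot aggregation rule, and the fixed cluster assignment used in place of the spectral clustering on model similarity that the summary mentions.

```python
import numpy as np

def train_cluster(client_data, members):
    """Stand-in for isolated intra-cluster training (assumption):
    closed-form least squares on the pooled data of the cluster's members."""
    X = np.vstack([client_data[c][0] for c in members])
    y = np.concatenate([client_data[c][1] for c in members])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def one_shot_aggregate(client_data, clusters, cluster_models):
    """Combine the cluster models once, weighted by sample count
    (an assumed aggregation rule, not necessarily the paper's)."""
    ids = sorted(clusters)
    weights = np.array(
        [sum(len(client_data[c][1]) for c in clusters[k]) for k in ids],
        dtype=float,
    )
    stacked = np.stack([cluster_models[k] for k in ids])
    return weights @ stacked / weights.sum()

def unlearn_client(client_data, clusters, cluster_models, client_id):
    """Exact unlearning in this sketch: drop the client from its cluster and
    retrain only that cluster; the other cluster models are left untouched."""
    k = next(k for k, members in clusters.items() if client_id in members)
    clusters[k] = [c for c in clusters[k] if c != client_id]
    cluster_models[k] = train_cluster(client_data, clusters[k])
    return k

# Synthetic data for six hypothetical clients sharing one linear target.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
client_data = {}
for c in range(6):
    X = rng.normal(size=(20, 3))
    client_data[c] = (X, X @ true_w + 0.1 * rng.normal(size=20))

# Fixed, hypothetical cluster assignment (stands in for spectral clustering).
clusters = {0: [0, 1, 2], 1: [3, 4, 5]}
cluster_models = {k: train_cluster(client_data, m) for k, m in clusters.items()}
global_before = one_shot_aggregate(client_data, clusters, cluster_models)

# Client 4 asks to be forgotten: only cluster 1 is retrained.
retrained = unlearn_client(client_data, clusters, cluster_models, client_id=4)
global_after = one_shot_aggregate(client_data, clusters, cluster_models)
print("retrained cluster:", retrained)
```

Because clusters never exchange updates during training in this sketch, the retrained cluster model matches one trained from scratch on the remaining members, and the cost of a removal request is bounded by the size of one cluster rather than the whole federation.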