Federated learning study


Full description

Bibliographic Details
Main Author: Tan, Jun Wei
Other Authors: Jun Zhao
Format: Final Year Project
Language: English
Published: Nanyang Technological University, 2024
Subjects:
Online Access: https://hdl.handle.net/10356/175325
Institution: Nanyang Technological University
Description
Summary: With the rise of data-driven applications and services, concerns surrounding data privacy, particularly for sensitive information such as personal opinions and sentiments in textual data, have become increasingly prevalent. Traditional Machine Learning methods often necessitate centralising data from various sources for model training, posing significant privacy risks as raw data must be shared or pooled into a single repository. Federated Learning (FL) emerges as a promising solution to this privacy challenge by facilitating collaborative model training across decentralised data sources. FL enables multiple parties to train a shared Machine Learning model without exchanging raw data, thereby preserving data privacy while harnessing the collective intelligence inherent in diverse datasets. This decentralised approach not only enhances privacy but also provides scalability and robustness by distributing computation and storage burdens. This work examines the concept of Federated Learning, highlighting its significance in addressing data privacy concerns while fostering collaborative model training across decentralised environments. In this project, the efficacy of Federated Learning is demonstrated using three diverse datasets sourced from Kaggle: Amazon reviews, IMDB reviews, and Spotify reviews. Initially, all datasets are aggregated into a unified dataset to train and evaluate a text sentiment classification model. Subsequently, employing a Federated Learning approach, the three datasets are distributed across separate clients for model training. The performance of various FL algorithms is evaluated to assess their effectiveness in preserving privacy while maintaining model performance.
Comparing the performance of models trained on decentralised data sources against the centralised baseline yields insights into the potential of Federated Learning to preserve privacy while achieving robust model performance across heterogeneous datasets.
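The setup the summary describes — three datasets kept on separate clients, with a server combining their local updates — can be sketched with a minimal FedAvg-style loop. This is an illustrative sketch only, not the project's actual code: the three synthetic datasets stand in for the Amazon, IMDB, and Spotify review features, and the local model is a plain logistic regression rather than a text classifier.

```python
# Minimal FedAvg sketch: three clients each hold a private dataset,
# train locally, and a server averages their weights. All names and
# data here are hypothetical stand-ins for the project's setup.
import numpy as np


def local_train(w, X, y, lr=0.1, epochs=50):
    """One client's local update: logistic regression via gradient descent."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)          # gradient step
    return w


def fed_avg(client_weights, client_sizes):
    """Server step: dataset-size-weighted average of client weights (FedAvg)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))


rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])

# Three synthetic client datasets (stand-ins for Amazon / IMDB / Spotify).
clients = []
for n in (120, 80, 100):
    X = rng.normal(size=(n, 3))
    y = (X @ true_w > 0).astype(float)            # linearly separable labels
    clients.append((X, y))

# Communication rounds: local training, then server-side averaging.
# Raw data never leaves a client; only model weights are exchanged.
w_global = np.zeros(3)
for _ in range(20):
    updates = [local_train(w_global, X, y) for X, y in clients]
    w_global = fed_avg(updates, [len(y) for _, y in clients])

# Evaluate the global model on all clients' data pooled together.
X_all = np.vstack([X for X, _ in clients])
y_all = np.concatenate([y for _, y in clients])
acc = ((X_all @ w_global > 0).astype(float) == y_all).mean()
```

The weighting by client dataset size is the defining choice of FedAvg: larger clients contribute proportionally more to the global model. The project's evaluation of "various FL algorithms" would swap out this averaging step (and the local update rule) while keeping the same client/server loop.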