Federated learning study

Bibliographic Details
Main Author: Tan, Jun Wei
Other Authors: Jun Zhao
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2024
Online Access: https://hdl.handle.net/10356/175325
Institution: Nanyang Technological University
Description
Summary: With the rise of data-driven applications and services, concerns about data privacy have become increasingly prevalent, particularly for sensitive information such as personal opinions and sentiments in textual data. Traditional Machine Learning methods often require centralising data from various sources for model training, which poses significant privacy risks because raw data must be shared or pooled into a single repository. Federated Learning (FL) is a promising solution to this privacy challenge: it enables multiple parties to train a shared Machine Learning model collaboratively across decentralised data sources without exchanging raw data, preserving data privacy while drawing on the collective intelligence inherent in diverse datasets. This decentralised approach not only enhances privacy but also provides scalability and robustness by distributing the computation and storage burden. This abstract introduces the concept of Federated Learning and highlights its significance in addressing data privacy concerns while supporting collaborative model training across decentralised environments.
In this project, the efficacy of Federated Learning is demonstrated using three datasets sourced from Kaggle: Amazon reviews, IMDB reviews, and Spotify reviews. First, all three datasets are aggregated into a single dataset, which is used to train and evaluate a text sentiment classification model. Next, following a Federated Learning approach, the three datasets are distributed across separate clients for model training. Several FL algorithms are evaluated to assess how well they preserve privacy while maintaining model performance. Comparing the models trained on these decentralised data sources gives insight into the potential of Federated Learning to preserve privacy and achieve robust model performance across heterogeneous datasets.
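The federated setup described in the summary (separate clients train locally and only model parameters are combined) can be illustrated with a minimal sketch of federated averaging (FedAvg), one widely used FL algorithm. This record does not state which FL algorithms the project evaluated, so FedAvg is an assumption chosen for illustration. To keep the example small, a toy linear model trained on synthetic numbers stands in for the text sentiment classifier, and every name here (`local_sgd`, `fed_avg`, `make_client`, `run_federated`) and every parameter value is hypothetical rather than taken from the project.

```python
"""Minimal FedAvg sketch: three clients, one shared linear model (illustrative only)."""
import random


def local_sgd(weights, data, lr=0.1, epochs=5):
    """Run a few epochs of least-squares SGD on one client's private data."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def fed_avg(client_weights, client_sizes):
    """Average client models, weighting each by its number of local examples."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def make_client(n, rng, true_w=(2.0, -1.0)):
    """Create a synthetic, noise-free local dataset (assumption: stands in for one review corpus)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = sum(t * xi for t, xi in zip(true_w, x))
        data.append((x, y))
    return data


def run_federated(rounds=20, seed=0):
    """Server loop: broadcast the global model, train locally, aggregate the updates."""
    rng = random.Random(seed)
    # Three clients of different sizes; their raw data is never pooled.
    clients = [make_client(n, rng) for n in (30, 50, 20)]
    global_w = [0.0, 0.0]
    for _ in range(rounds):
        # Each client starts from the current global model and sees only its own data.
        updates = [local_sgd(global_w, data) for data in clients]
        # Only the parameters are sent back and combined.
        global_w = fed_avg(updates, [len(d) for d in clients])
    return global_w


if __name__ == "__main__":
    print(run_federated())
```

In this sketch the server only ever receives weight vectors, which is the property the summary relies on for privacy. The centralised setup described in the summary would correspond to concatenating the three client datasets and calling `local_sgd` once on the result.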