Machine learning for industrial IoT


Bibliographic Details
Main Author: Yeow, Brandon Wei Liang
Other Authors: Anupam Chattopadhyay
Format: Final Year Project
Language: English
Published: Nanyang Technological University 2023
Subjects:
Online Access:https://hdl.handle.net/10356/166064
Institution: Nanyang Technological University
Description
Summary: Within trusted silos, data sharing may be permitted, yet the trade-off between running FedAvg and sharing data remains largely unexplored across contexts. This paper's goal is to perform an exhaustive search over such contexts to identify the best option for improving communication efficiency, i.e., maximizing inference accuracy while reducing communication cost. Various dataset distributions across clients are induced through data augmentation techniques and sharding by labels. The contexts tested include client skew, client count, i.i.d. clients, pathological non-i.i.d. clients, non-pathological non-i.i.d. clients, different stages of training via pre-trained backbones, network size via varied backbones, and synthetic data generation. We conclude that running successive rounds of FedAvg is key, but sharing data yields higher accuracy at each epoch in almost all contexts, at the cost of higher bandwidth usage and longer local training times. Quantity skew affects models only when there is too little or too much data. This paper also explores data-level privacy techniques using generative models such as the Variational Autoencoder: the visual quality of the generated images has little impact on the accuracy gain, and synthetic data, whether regenerated or sampled, shows a significant improvement over simply sharing raw data. The insights are consolidated into a table that prescribes the best decision for each scenario. The paper concludes by proposing an algorithm for peer-to-peer Federated Learning in which clients search for peers up to the n-th degree and perform the best action with each peer under bandwidth constraints. The algorithm preferentially runs FedAvg to reduce bandwidth cost, unless that peer has historically provided little improvement; data is shared only when FedAvg has little improvement left to offer and sufficient bandwidth remains.
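For context, the FedAvg baseline compared throughout the summary can be illustrated with a minimal sketch of one aggregation round, assuming each client's model is a list of NumPy weight arrays; the names (fedavg_round, client_weights, client_sizes) are illustrative and not taken from the thesis.

import numpy as np

def fedavg_round(client_weights, client_sizes):
    # Aggregate client models with a data-size-weighted average (FedAvg).
    total = float(sum(client_sizes))
    coeffs = [n / total for n in client_sizes]
    global_weights = []
    for layer_idx in range(len(client_weights[0])):
        # Average this layer across clients, weighted by local dataset size.
        layer = sum(c * w[layer_idx] for c, w in zip(coeffs, client_weights))
        global_weights.append(layer)
    return global_weights

# Example: two clients, each holding a single 2x2 weight matrix.
w_a = [np.ones((2, 2))]
w_b = [np.zeros((2, 2))]
print(fedavg_round([w_a, w_b], client_sizes=[300, 100]))  # 0.75 in every entry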
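The peer-to-peer decision rule described at the end of the summary (prefer FedAvg unless a peer has stopped yielding improvement, and share data only when bandwidth allows) can likewise be sketched in a few lines. The class and function names, the three-round gain window, and the min_gain threshold below are assumptions made for illustration, not the thesis's actual implementation.

from dataclasses import dataclass, field

@dataclass
class Peer:
    fedavg_gain_history: list = field(default_factory=list)  # accuracy gains from past FedAvg rounds with this peer
    fedavg_cost: float = 1.0       # bandwidth cost of one FedAvg weight exchange
    data_share_cost: float = 10.0  # bandwidth cost of sharing (synthetic) data

def choose_action(peer, bandwidth_left, min_gain=0.005):
    # Prefer FedAvg because it is cheaper; fall back to sharing data only when
    # FedAvg with this peer has plateaued and the bandwidth budget allows it.
    history = peer.fedavg_gain_history[-3:]
    recent_gain = sum(history) / len(history) if history else float("inf")
    if recent_gain > min_gain and bandwidth_left >= peer.fedavg_cost:
        return "fedavg"
    if bandwidth_left >= peer.data_share_cost:
        return "share_data"
    return "skip"

# Example: a peer whose last few FedAvg rounds gave almost no gain.
stale_peer = Peer(fedavg_gain_history=[0.001, 0.0, 0.002])
print(choose_action(stale_peer, bandwidth_left=20.0))  # -> "share_data"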