Beginner's Guide to Federated Learning
What is Federated Learning?
Federated learning is a machine learning technique that collaboratively trains a model across multiple decentralized edge devices (desktops, mobiles, tablets, IoT devices, etc.) or servers (data silos) that hold local data samples, without sharing the data or aggregating it centrally.
This approach is in contrast to classical Centralized ML techniques where all the datasets residing on edge devices are uploaded to one central server. Federated Learning also differs from Distributed ML, where the expectation is that the local datasets will be identically distributed with more or less constant availability of the nodes.
Now let's look at the steps involved in federated learning in detail:
- A model instance runs on each device and is trained on that device's local data.
- Once the training is complete on the local data, the device connects to the server and the model weights/parameters are sent to the central server.
- The central server aggregates the weights from all devices and sends the updated weights back to each device. This ensures that the raw data never leaves the device, while every device still benefits from training on the data held by all the others.
- Once the devices receive the updated parameters, they can now use the updated model parameters for their individual tasks.
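The steps above can be sketched as a small simulation. This is a minimal illustration, not a production setup: it assumes a linear regression model, three simulated devices with synthetic data, and a plain element-wise mean as the aggregation rule. The names `train_locally`, `aggregate`, and `run_round` are hypothetical helpers invented for this example.

```python
import numpy as np

def train_locally(weights, X, y, lr=0.1, steps=20):
    """Device side: a few gradient-descent steps of linear regression on local data."""
    w = weights.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w

def aggregate(client_weights):
    """Server side: combine the received weights (assumption: plain element-wise mean)."""
    return np.mean(client_weights, axis=0)

def run_round(global_weights, device_data):
    """One communication round; only weights travel, never the (X, y) data."""
    # Each device trains on its own data and reports its weights to the server.
    updates = [train_locally(global_weights, X, y) for X, y in device_data]
    # The server aggregates; the result is what gets sent back to every device.
    return aggregate(updates)

# Synthetic setup (assumption): three devices whose data follow the same true model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
device_data = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    device_data.append((X, X @ true_w))

global_w = np.zeros(2)
for _ in range(10):
    global_w = run_round(global_w, device_data)

print(global_w)  # should end up close to true_w
```

In a real deployment the list comprehension in `run_round` would be replaced by network communication with the devices, and only a subset of devices would typically be available in any given round.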
Popular federated learning frameworks that you can use to build your own federated learning system include:
- PySyft by OpenMined
Limitations of Federated Learning
Federated Learning enables us to keep the data at its source during model training, which resolves many of the concerns around sharing private data and preserves privacy from the outset. In essence, the Federated Averaging algorithm involves the following steps:
- The client devices run multiple steps of Stochastic Gradient Descent (SGD) on their local data to compute an update.
- The server computes an overall update using a simple weighted average of the client updates.
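The server-side averaging step can be sketched as follows. The text above does not say what the weights are; this sketch assumes each client is weighted by the number of local samples it trained on, which is a common choice for Federated Averaging. The function name `federated_average` and the example values are made up for illustration.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors.

    Assumption: each client's weight is proportional to its local sample count.
    """
    stacked = np.stack(client_weights)            # shape: (num_clients, num_params)
    sizes = np.asarray(client_sizes, dtype=float)
    return np.average(stacked, axis=0, weights=sizes)

# Two hypothetical clients: one with 10 local samples, one with 30.
avg = federated_average(
    [np.array([1.0, 2.0]), np.array([3.0, 4.0])],
    [10, 30],
)
print(avg)  # 0.25 * [1, 2] + 0.75 * [3, 4] = [2.5, 3.5]
```

Weighting by sample count means a client with more data pulls the global model further towards its own solution, which is also one reason the fairness concern discussed in this article matters.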
There have been many advances in the area of federated learning, but there are some challenges that need to be addressed.
- Local model updates can be reverse-engineered by an eavesdropper, or by the aggregating server itself, via Model-Inversion or Membership-Inference attacks. Since Federated Learning deals with individual model updates sent by end-users, inverting the difference between the original model and the updated model can potentially reveal a lot of sensitive information.
- Federated Learning doesn’t guarantee the resultant aggregated model will be “better” than the individual models.
- Federated Learning, by design, is not impervious to Data Poisoning attacks, where a compromised end-user sends corrupted model updates.
- Popular approaches to Federated Learning involve only a fraction of the User Pool in each round of training, so utmost care needs to be taken to ensure that Fairness is achieved.
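To make the first limitation above more concrete, here is a deliberately simple toy sketch of how the difference between the original and updated model can leak data. It assumes a linear model with a bias term, a squared-error loss, and a client that performs exactly one SGD step on a single private sample. Under those assumptions the weight update is a scaled copy of the input, so dividing it by the bias update recovers the input exactly. The names `local_update`, `x_private`, and `y_private` are hypothetical, and real attacks on deep models with batched updates are considerably harder and usually only approximate.

```python
import numpy as np

def local_update(w, b, x, y, lr=0.1):
    """One SGD step on a single sample for the loss 0.5 * (w @ x + b - y) ** 2."""
    err = (w @ x + b) - y
    return w - lr * err * x, b - lr * err

# Private data that is supposed to stay on the device.
x_private = np.array([0.7, -1.2, 3.4])
y_private = 2.0

# Model the server sent out, and the model the device sends back.
w0, b0 = np.array([0.1, -0.2, 0.3]), 0.5
w1, b1 = local_update(w0, b0, x_private, y_private)

# What the server (or an eavesdropper) can compute from the two models alone.
dw, db = w1 - w0, b1 - b0
x_recovered = dw / db  # (-lr * err * x) / (-lr * err) = x, as long as err != 0

print(x_recovered)
```

The private label and the learning rate are never needed for the recovery, which is why sending "only the weights" is not by itself a privacy guarantee.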