Chapter 4. Open Problems and Future Directions

Federated learning is an active area of research, and its ever-changing landscape continues to push the frontiers of what is possible for privacy-preserving machine learning. In this chapter, we describe some of the open research problems, providing a glimpse of where the field might head in the future. Finally, we cover software that is currently available for experimenting with federated learning.

Open Research Problems

Federated learning is still a nascent technology. As such, there are many open questions for researchers and practitioners to explore. Because federated learning builds on advances in many different disciplines, solving these open problems will require collaboration among contributors with a wide array of expertise. Many of the ideas in this section are drawn from the survey paper “Advances and Open Problems in Federated Learning” (Kairouz et al., 2021), written by a large group of researchers to survey the state of the federated learning discipline.

Heterogeneity

One of the classic assumptions in machine learning is that training examples are randomly shuffled throughout the dataset; or, more formally, that the data is “independent and identically distributed” (i.i.d.). In federated learning, however, this assumption no longer holds. When clients are devices belonging to end users, for example, the training examples are not randomly shuffled; each resides on an individual user’s device. Users differ in ways that can cause the local models trained on their devices to diverge. For example, with emoji prediction, one user might express congratulations using the party hat emoji, while another might prefer the popping bottle of champagne.
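To experiment with this kind of heterogeneity in simulation, one common approach is a label-skewed split: sort the examples by label, cut them into shards, and deal each client a couple of shards. The following sketch assumes a toy labeled dataset; the function name `shard_partition` and its parameters are our own illustrative choices, not part of any library.

```python
import random
from collections import Counter


def shard_partition(labels, num_clients, shards_per_client=2, seed=0):
    """Split example indices across clients so each client sees only a few labels."""
    # Sort indices by label so each contiguous shard holds (mostly) a single label.
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(order) // num_shards  # any leftover examples are dropped
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(num_shards)]
    # Deal the shards out at random; each client receives `shards_per_client` of them.
    random.Random(seed).shuffle(shards)
    return {
        k: [i for shard in shards[k * shards_per_client:(k + 1) * shards_per_client]
            for i in shard]
        for k in range(num_clients)
    }


# Toy dataset: 1,000 examples spread evenly over 10 labels (think: 10 emoji).
labels = [i % 10 for i in range(1000)]
clients = shard_partition(labels, num_clients=10)
for k in range(3):
    # Each client holds at most two distinct labels, unlike a shuffled i.i.d. split.
    print(k, Counter(labels[i] for i in clients[k]))
```

Training on such a split, rather than on a uniformly shuffled one, is a quick way to see how local models drift apart when each client’s data looks nothing like the population as a whole.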

Such differences can stem from personal preference or from regional variation, and the latter introduces another set of challenges. To ensure that training doesn’t interfere with a user’s ability to use their device, devices must meet a set of training conditions to participate. These typically include being idle, charging, and connected to Wi-Fi, conditions most often met at night, when users charge their devices while they sleep. As a result, the set of available examples varies not only across devices but also over time, as users in different time zones plug in their phones and go to bed. There may also be correlations between the types of examples a particular device holds and whether that device can participate in training at all. For example, some devices might not have enough memory, or their software may be too out of date to participate.
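The effect of training conditions on who participates can be illustrated with a toy simulation. This sketch assumes a hypothetical `DeviceState` record, a hypothetical memory threshold, and a crude behavioral model in which users are idle, charging, and on Wi-Fi between 23:00 and 07:00 local time; real platforms define their own eligibility criteria.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    # Hypothetical snapshot of the signals a device might report.
    is_idle: bool
    is_charging: bool
    on_unmetered_wifi: bool
    free_memory_mb: int


MIN_MEMORY_MB = 512  # hypothetical threshold for this sketch


def is_eligible(state: DeviceState) -> bool:
    """Return True only if the device meets every (assumed) training condition."""
    return (state.is_idle and state.is_charging and state.on_unmetered_wifi
            and state.free_memory_mb >= MIN_MEMORY_MB)


def nightly_state(utc_hour: int, utc_offset: int, free_memory_mb: int) -> DeviceState:
    """Toy model: the user sleeps (and charges on Wi-Fi) from 23:00 to 06:59 local time."""
    local_hour = (utc_hour + utc_offset) % 24
    asleep = local_hour >= 23 or local_hour < 7
    return DeviceState(asleep, asleep, asleep, free_memory_mb)


# Three hypothetical users: (UTC offset, free memory). The last has an older phone.
users = {"tokyo": (9, 2048), "berlin": (1, 4096), "new_york": (-5, 256)}
for utc_hour in (0, 8, 16):
    eligible = [name for name, (offset, mem) in users.items()
                if is_eligible(nightly_state(utc_hour, offset, mem))]
    print(utc_hour, eligible)
# Prints:
# 0 ['berlin']
# 8 []
# 16 ['tokyo']
```

The eligible population shifts with the hour, and the low-memory device never participates even while its owner sleeps, so any patterns specific to that user never reach the model.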

Looking beyond the heterogeneity of data, there is heterogeneity in other characteristics of clients, such as computational speed and availability to participate. All of these types of heterogeneity need to be understood when designing training algorithms.

The heterogeneity inherent in the federated learning setting leads to open questions in multiple areas of machine learning. Optimization, a subfield of machine learning research, studies whether and how quickly the training process will converge to an optimal set of model parameters. Studies of optimization have classically relied on the i.i.d. assumption to prove convergence, so optimization in the federated setting, where this assumption does not hold, was not well understood. Recently, there has been progress in placing the optimization algorithms used in federated learning, such as Federated Averaging, on a firmer theoretical foundation (Kairouz et al., 2021), rather than relying only on the observation that they perform well empirically. By bringing the analysis of algorithms more in line with the real-world data distributions and training processes of federated learning, optimization researchers may be able to develop algorithms that achieve higher accuracy and train faster in the real world. For analysis and practical recommendations on federated optimization algorithms, see “A Field Guide to Federated Optimization” (Wang et al., 2021).
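To make Federated Averaging concrete, here is a minimal, dependency-free sketch on a one-parameter linear model, y = w * x, with simulated non-i.i.d. clients. The learning rate, number of local epochs, client sampling, and data are arbitrary assumptions chosen for illustration; real implementations operate on full model weight tensors.

```python
import random


def local_update(w, data, lr=0.05, epochs=5):
    """Client step: a few epochs of SGD on local data for the model y = w * x."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)**2 with respect to w
            w -= lr * grad
    return w


def fed_avg(client_data, rounds=20, clients_per_round=3, seed=0):
    """Server loop: broadcast the global model, train locally, then average."""
    rng = random.Random(seed)
    w = 0.0  # initial global model
    for _ in range(rounds):
        sampled = rng.sample(client_data, clients_per_round)
        # Every sampled client starts from the same global model.
        results = [(local_update(w, data), len(data)) for data in sampled]
        total = sum(n for _, n in results)
        # New global model: average of client models, weighted by local example count.
        w = sum(w_k * n for w_k, n in results) / total
    return w


# Non-i.i.d. clients: each covers a different range of x and has a slightly different slope.
data_rng = random.Random(1)
client_data = []
for k in range(5):
    slope = 2.5 + 0.25 * k
    xs = [data_rng.uniform(0.2 * k, 0.2 * (k + 1)) for _ in range(20)]
    client_data.append([(x, slope * x) for x in xs])

global_w = fed_avg(client_data)
print(round(global_w, 2))  # lands between the clients' slopes of 2.5 and 3.5
```

Because each client pulls the model toward its own slope, and clients with larger inputs pull harder, the averaged model settles on a compromise that depends on which clients were sampled. Effects like this are what make convergence analysis under non-i.i.d. data harder than in the classical setting.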

A related area of research is personalization. The idea of personalization is that it may be possible to turn the fact that each user has a slightly different data distribution into an advantage—the model can be fine-tuned on each user’s device to perform better for that particular user. However, there are many open questions in this space. For example, when is a personalized model better than a global model? Are there ways to train the global model using federated learning that make that model more amenable to personalization?
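A minimal sketch of personalization by local fine-tuning, using the same kind of toy one-parameter model. The global weight `global_w = 3.0` stands in for the output of federated training and, like the user’s data distribution, is a hypothetical value chosen for illustration.

```python
import random


def mse(w, data):
    """Mean squared error of the model y = w * x on a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, data, lr=0.05, epochs=3):
    """Personalize a copy of the global model with a few epochs of local SGD."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x
    return w


rng = random.Random(0)
global_w = 3.0    # assumed result of federated training across all users
user_slope = 3.5  # this particular user's data differs from the population


def user_examples(n):
    """Draw n noisy (x, y) examples from this user's local distribution."""
    xs = [rng.uniform(0.5, 1.0) for _ in range(n)]
    return [(x, user_slope * x + rng.gauss(0.0, 0.1)) for x in xs]


train, held_out = user_examples(20), user_examples(20)
personal_w = fine_tune(global_w, train)
print(f"global model loss on this user:   {mse(global_w, held_out):.3f}")
print(f"personal model loss on this user: {mse(personal_w, held_out):.3f}")
```

Here the user’s data differs clearly from the population and there are enough local examples, so fine-tuning helps. For a user whose data matches the population, or who has only a handful of noisy examples, the gain could vanish or reverse, which is exactly the “when is personalization better?” question.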

Bias and Fairness

The fact that some devices can participate in training more easily than others may also bias the training data in ways that harm the fairness of the model. In other words, the final model may end up performing better for users who have the newest, most powerful phones and the most reliable internet access. Many existing strategies for measuring and correcting unfairness rely on knowledge of demographic data about the participating users, which conflicts with the privacy principles of federated learning. A better understanding of the trade-offs between privacy and fairness could enable new measurements and techniques, leading to models that are trained with good privacy properties and also perform fairly for all users. For example, fairness could be defined in terms of differences in model performance across users, rather than by measurements that require demographic data. In parallel, more work is needed on designing federated training platforms that allow as wide a variety of devices as possible to participate. For a good survey of fairness in federated learning, see section 7.3 of “A Field Guide to Federated Optimization” (Wang et al., 2021).
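The demographics-free idea of measuring fairness as differences in performance across users can be sketched as a summary of per-client evaluation results. The particular statistics (mean, worst-client accuracy, standard deviation, best-to-worst gap) and the client results are illustrative assumptions, not a standard definition.

```python
import statistics


def performance_disparity(per_client_accuracy):
    """Summarize how evenly a model performs across clients, without demographics.

    `per_client_accuracy` maps an opaque client ID to that client's accuracy
    on its own local evaluation data.
    """
    accs = sorted(per_client_accuracy.values())
    return {
        "mean": statistics.mean(accs),
        "worst": accs[0],                  # how well the worst-served client does
        "stdev": statistics.pstdev(accs),  # overall spread across clients
        "gap": accs[-1] - accs[0],         # best-served minus worst-served
    }


# Hypothetical evaluation results reported by five clients.
report = performance_disparity({"a": 0.91, "b": 0.88, "c": 0.90, "d": 0.62, "e": 0.89})
print(report)  # mean 0.84, worst 0.62, gap of about 0.29
```

A high mean can hide a badly served client; tracking the worst case and the spread surfaces it without asking who the user is. In a real deployment, the per-client metrics themselves might need privacy protection before they are reported.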

Privacy and Trust

How can we further ensure user privacy is protected when models are trained on user data with federated learning, and how do we determine that the models produced by federated learning can be trusted? Part of this research involves understanding what types of protections we want to offer, as the right defenses differ depending on the adversary and the data the adversary can access.

Data minimization

Recall from Chapter 2 the overarching principle of data minimization: collecting as little data as possible, aggregating it as soon as possible, and discarding it as soon as possible. Federated learning is a technology that can minimize the data needed to perform a computation, and the use of secure aggregation can further minimize the amount of meaningful data seen by actors in the system. Still, there is room to keep improving data minimization.

Secure aggregation can be used to make the messages that the server receives indistinguishable from random noise until they are aggregated. Recall, though, that user devices participating in federated learning have no way to communicate with each other directly; their communication must be mediated by the server. Once each client has the public key of every other client, it can communicate with the other clients securely even though the messages pass through the server, because the server cannot decrypt these messages. However, since the key exchange itself must be mediated by the server, a malicious server has an opportunity to execute a man-in-the-middle attack, providing its own public key to clients in place of the public key of the client they want to communicate with. This means that secure aggregation as currently defined can protect against an “honest-but-curious” server that follows the protocol but may try to inspect updates, but not against a “malicious” server that deviates from the agreed-upon protocol. Thus, an open direction for research is how to further limit the trust that must be placed in the server.
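The core masking idea behind secure aggregation can be sketched in a few lines: each pair of clients shares a random mask that one adds and the other subtracts, so each individual message looks random while the masks cancel in the sum. This is a simplification under stated assumptions: a seeded random generator stands in for the pairwise key agreement (the very step a malicious server could attack by substituting keys), updates are assumed already quantized to integers, and the real protocol’s handling of client dropouts is omitted.

```python
import random

MOD = 2 ** 32  # arithmetic on quantized values modulo a fixed ring size


def pairwise_masks(client_ids, dim, seed=0):
    """Create one shared random mask per pair of clients.

    Stand-in assumption: in the real protocol each pair derives its mask from a
    secret agreed via key exchange; here a seeded RNG plays that role.
    """
    rng = random.Random(seed)
    return {(i, j): [rng.randrange(MOD) for _ in range(dim)]
            for i in client_ids for j in client_ids if i < j}


def mask_update(cid, update, client_ids, masks):
    """Add (or subtract) every pairwise mask this client shares with the others."""
    masked = list(update)
    for other in client_ids:
        if other == cid:
            continue
        pair = (min(cid, other), max(cid, other))
        sign = 1 if cid < other else -1  # lower ID adds the mask, higher ID subtracts it
        masked = [(m + sign * r) % MOD for m, r in zip(masked, masks[pair])]
    return masked


client_ids = [0, 1, 2]
updates = {0: [1, 2, 3], 1: [4, 5, 6], 2: [7, 8, 9]}  # toy quantized model updates
masks = pairwise_masks(client_ids, dim=3)
masked = {c: mask_update(c, u, client_ids, masks) for c, u in updates.items()}

# The server sees only masked vectors, which look uniformly random on their own...
print(masked[0])
# ...but the masks cancel in the sum, revealing only the aggregate.
aggregate = [sum(col) % MOD for col in zip(*masked.values())]
print(aggregate)  # [12, 15, 18]
```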

One promising idea is to use advances in hardware that make it possible to run code in a secure yet verifiable manner. A trusted execution environment might enable code to run in such a way that its state cannot be inspected or altered once the program begins. At the same time, the environment can prove to an outside observer that it is running a particular piece of software with a particular initial state. One could imagine the code for secure aggregation running in a trusted execution environment. There are many challenges to making this work in practice, however, such as preventing side-channel attacks and designing and implementing a fully featured federated learning system in which one component is a trusted execution environment.

Data anonymization

In Chapter 2 we discussed how the aim of data anonymization technologies, such as differential privacy, is to prevent user data from being reconstructed from the final model.

One line of research in this area, known as empirical model auditing, concerns experimentally measuring how much user data is memorized during federated training. There has been recent success in quantifying model memorization by injecting canary examples into real training data during federated training and measuring their impact on the final model (Thakkar et al., 2020). An open area of research is extending such empirical techniques to measure how well federated models resist different deanonymization techniques.
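One way to turn a canary experiment into a number is an exposure-style metric: rank the canary’s likelihood under the trained model among many random candidate sequences of the same format that were never trained on. We assume that metric here for illustration; the scores below are simulated rather than produced by a real model, and the original study’s exact measurement may differ.

```python
import math
import random


def exposure(canary_score, candidate_scores):
    """Exposure of an inserted canary, in bits.

    Scores are model log-perplexities (lower = the model finds the sequence more
    likely). `candidate_scores` belong to random sequences that were never inserted.
    """
    # Rank 1 means the model prefers the canary over every candidate.
    rank = 1 + sum(1 for s in candidate_scores if s < canary_score)
    total = len(candidate_scores) + 1  # candidate space including the canary
    return math.log2(total) - math.log2(rank)


rng = random.Random(0)
# Simulated log-perplexities for 1,023 never-seen candidate sequences.
candidates = [rng.gauss(10.0, 1.0) for _ in range(1023)]
print(exposure(6.0, candidates))   # strongly memorized canary: near the 10-bit maximum
print(exposure(10.0, candidates))  # unremarkable canary: about 1 bit
```

Exposure near its maximum means the model singles out the inserted canary, a sign of memorization; exposure near zero means the canary looks like any random sequence.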

Recall that adding noise to the model can obscure the data of individuals in the final model. Currently, most strategies for adding noise to a model require the noise to be added on the server side. It would be even better if clients could add noise themselves and thus place less trust in the server. The natural idea would be for each client to add enough noise to protect its own update; in practice, however, this greatly decreases the accuracy of the final model. Thus, researchers are exploring models of distributed differential privacy (Kairouz, Liu, and Steinke, 2021), in which each client adds a smaller amount of noise, in combination with multi-party computation protocols such as secure aggregation or secure shuffling that amplify the privacy guarantees.
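The arithmetic behind distributed differential privacy can be checked with a quick simulation. Suppose the sum of n client updates should carry Gaussian noise with standard deviation σ. If every client adds roughly the full σ on its own, the noise variances add up and the sum’s noise grows to σ√n. If each client adds only σ/√n, and secure aggregation ensures the server sees only the sum, the sum carries exactly σ. This sketch uses continuous Gaussian noise and scalar updates as simplifying assumptions; practical distributed schemes use discrete noise compatible with secure aggregation’s integer arithmetic.

```python
import math
import random
import statistics


def noisy_sum(updates, per_client_std, rng):
    """Sum scalar client updates after each client adds its own Gaussian noise."""
    return sum(u + rng.gauss(0.0, per_client_std) for u in updates)


n, sigma = 100, 1.0   # number of clients, target noise std on the sum
updates = [0.5] * n   # toy scalar updates; the true sum is 50.0
rng = random.Random(0)

trials = 2000
# Each client protects itself alone: full sigma per client.
local = [noisy_sum(updates, sigma, rng) for _ in range(trials)]
# Distributed: sigma / sqrt(n) per client, relying on secure aggregation to hide individuals.
distributed = [noisy_sum(updates, sigma / math.sqrt(n), rng) for _ in range(trials)]
print(f"local:       std of sum ~ {statistics.stdev(local):.1f}")        # about sigma*sqrt(n) = 10
print(f"distributed: std of sum ~ {statistics.stdev(distributed):.1f}")  # about sigma = 1
```

The tenfold reduction in noise on the aggregate is why distributed approaches can retain far more accuracy. Note that each individually noised update in the distributed case would not be adequately protected on its own; the guarantee depends on the server seeing only the sum.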

Robustness

In order to trust the final model, we would also like the training process to be robust, that is, resilient to corrupt inputs. An attacker providing corrupt inputs may aim to lower the model’s overall accuracy or to change its behavior on a particular class of inputs. Defending against such attacks on model performance is also important in traditional distributed datacenter machine learning, but in the federated setting adversaries may have additional capabilities, such as the ability to inspect intermediate model updates and adaptively modify their future contributions. This is because clients, especially in the cross-silo setting, participate in multiple rounds of training: in each round, clients receive the updated global model and can thus evaluate the effect of their previous contributions. Further complicating matters, many defenses rely on inspecting individual model updates, which privacy technologies such as secure aggregation are designed to prevent. Thus, additional strategies are needed to improve robustness in the federated setting.
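One widely studied robust aggregation rule is the coordinate-wise median, which replaces the server’s average with a per-parameter median. In the following sketch, which uses made-up two-parameter updates, a single corrupt client drags the mean far off but barely moves the median.

```python
import statistics


def mean_aggregate(updates):
    """Plain averaging: each parameter is the mean across clients."""
    return [statistics.fmean(col) for col in zip(*updates)]


def median_aggregate(updates):
    """Coordinate-wise median: each parameter is the median across clients."""
    return [statistics.median(col) for col in zip(*updates)]


honest = [[0.9, -0.2], [1.1, -0.1], [1.0, -0.3], [0.95, -0.2]]
attack = [[100.0, 50.0]]  # one corrupt client sends a huge update
print(mean_aggregate(honest + attack))    # dragged far off by a single client
print(median_aggregate(honest + attack))  # [1.0, -0.2]
```

The median only works because the server sees every individual update, which is exactly the kind of inspection that secure aggregation rules out. Reconciling robust aggregation with such privacy protections is part of what remains open.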

Practical Considerations

How can we make it easier for machine learning engineers to develop models using federated learning? The modeler’s workflow involves trying out different models with different settings (also known as hyperparameters) for how the model should be trained, iteratively approaching the best settings. This process can become expensive in a federated learning setting when training a single model can take days. This is one reason that running federated training in a simulated environment can be useful, but more research is needed to determine the best means of doing hyperparameter search in a federated setting.

Additionally, in a centralized setting, modelers may examine training examples when the model is not performing well. In the federated setting, this is impossible by design. Recently, a strategy was developed to train a model that can produce simulated training examples that match the real distribution of data, without reproducing any real examples (Augenstein et al., 2019). More research in this direction is needed to supplement the modeler’s debugging toolbox in the absence of the ability to directly inspect data.

Another practical concern is how to best communicate to the public the ways in which federated learning preserves privacy, while being clear about the limitations of a particular deployment under certain threats. Additionally, more work is needed to understand what privacy guarantees are most important to users in the model development process. This will be critical to determining what privacy technologies to invest in and what trade-offs are acceptable between privacy and utility.

Software for Federated Learning

With the rapid growth of the field, there is an increasing number of federated learning frameworks to choose from.

Some software libraries are focused on facilitating open research and experimentation with federated learning. TensorFlow Federated (TFF) is an open source framework that can simulate federated learning algorithms on included or user-defined models and data. The building blocks provided by TFF can also be used to implement federated analytics algorithms. Another example of research-oriented federated learning software is PySyft, a Python library for secure and private learning. PySyft supports federated learning, differential privacy, and encrypted computation. Other research-oriented frameworks include FedML, Sherpa.ai, LEAF, and PyVertical.

Other software is being developed for production deployments of federated learning, including FATE (Federated AI Technology Enabler) and PaddleFL. The Clara training framework, IBM Federated Learning, and Fedlearner are particularly focused on the cross-silo setting, while the Flower framework focuses on the cross-device setting.

References
