Chapter 14. Accessing Cloud-Based Models from Mobile Apps

Throughout this book you’ve been creating models and converting them to the TensorFlow Lite format so they can be used within your mobile apps. This works very well when you want the model on the device for the reasons discussed in Chapter 1, such as latency and privacy. However, there may be times when you don’t want to deploy the model to a mobile device: maybe it’s too large or complex for mobile, maybe you want to update it frequently, or maybe you don’t want to risk it being reverse-engineered and having your IP used by others.

In those cases you’ll want to deploy your model to a server and perform the inference there. The server manages the requests from your clients, invokes the model for inference, and responds with the results. A high-level view of this is shown in Figure 14-1.

Figure 14-1. A high-level look at a server architecture for models
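
To make the request, inference, and response flow concrete, here is a minimal sketch of the server side using only the Python standard library. Everything in it is an assumption made for illustration: the `predict` function is a stand-in for a real model, and the JSON field names `instances` and `predictions`, the handler class, and port 8080 are hypothetical choices rather than anything this chapter prescribes.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(instances):
    """Stand-in for real model inference (hypothetical rule: y = 2x - 1)."""
    return [[2.0 * x - 1.0 for x in row] for row in instances]


def handle_request(body: bytes) -> bytes:
    """Decode a JSON request, invoke the model, and encode the JSON response."""
    payload = json.loads(body)
    results = predict(payload["instances"])
    return json.dumps({"predictions": results}).encode("utf-8")


class InferenceHandler(BaseHTTPRequestHandler):
    """Manages client requests: reads the POST body and replies with results."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = handle_request(self.rfile.read(length))
            status = 200
        except (ValueError, KeyError, TypeError) as err:
            # Malformed JSON or a missing/invalid "instances" field.
            result = json.dumps({"error": str(err)}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def serve(port: int = 8080) -> None:
    """Start the server and block, answering inference requests until stopped."""
    HTTPServer(("", port), InferenceHandler).serve_forever()
```

Calling `serve()` starts the server; a mobile client would then POST a body such as `{"instances": [[1.0, 2.0]]}` and read the predictions from the JSON reply. Keeping `handle_request` separate from the HTTP handler means the model-calling logic can be exercised without a network connection.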

Another benefit of this architecture is in managing model drift. When you deploy a model to devices, you can end up with multiple versions of the model in the wild if people don’t or can’t update their app to get the latest one. Consider then the scenario where you want model drift; perhaps people with more premium hardware can have a bigger and more accurate version of your model, whereas others can get a smaller and slightly less ...
