Appendix C. Using Model Serving in Applications

In Chapter 8 you learned about the different approaches Kubeflow offers for exposing model servers. As described there, Kubeflow provides several ways of deploying trained models and exposing both REST and gRPC interfaces for running model inference. It falls short, however, of supporting the use of these models in custom applications. Here we present some approaches to building applications that leverage the model servers exposed by Kubeflow.

Applications leveraging model inference can be broadly classified into two categories: real-time and batch. In real-time (streaming) applications, inference is performed on data directly as it is produced or received; typically only one request is available at a time, and it is scored as soon as it arrives. In batch scenarios, all of the data is available up front and can be scored either sequentially or in parallel. We will start with the streaming use case and then look at possible batch implementations.
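To make the real-time case concrete, the following minimal sketch (in Python, using the requests library) scores a single observation against a model server's REST endpoint as it arrives. The host, model name, and payload shape are illustrative assumptions, modeled on the TensorFlow Serving-style v1 predict protocol that several of the model servers discussed in Chapter 8 expose:

    import requests

    # Hypothetical endpoint for a model deployed through Kubeflow; replace the
    # host and model name with the values from your own deployment.
    MODEL_URL = "http://model-server.example.com/v1/models/my-model:predict"

    def predict(features):
        """Real-time inference: score a single observation as it arrives."""
        payload = {"instances": [features]}  # exactly one request at a time
        response = requests.post(MODEL_URL, json=payload, timeout=5.0)
        response.raise_for_status()
        return response.json()["predictions"][0]

    # Score one incoming record.
    print(predict([1.0, 2.5, 0.3]))

A batch implementation could reuse the same endpoint, packing many observations into the instances list or issuing many such requests in parallel.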

Building Streaming Applications Leveraging Model Serving

The majority of today’s streaming applications use Apache Kafka as the data backbone of the system. There are two main options for implementing the streaming applications themselves: stream processing engines and stream processing libraries.
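Before comparing the two options, a brief sketch may help to ground the overall pattern they both implement. Assuming a Kafka deployment with hypothetical input-data and predictions topics, the kafka-python client, and the predict endpoint from the earlier sketch, the loop below consumes records, scores each one against the model server, and publishes the results downstream; an engine or library would structure this same pattern for you, adding scalability and fault tolerance:

    import json
    import requests
    from kafka import KafkaConsumer, KafkaProducer

    # Hypothetical model server endpoint (see the earlier sketch).
    MODEL_URL = "http://model-server.example.com/v1/models/my-model:predict"

    def predict(features):
        """Score one record against the model server."""
        response = requests.post(
            MODEL_URL, json={"instances": [features]}, timeout=5.0
        )
        response.raise_for_status()
        return response.json()["predictions"][0]

    consumer = KafkaConsumer(
        "input-data",                    # hypothetical input topic
        bootstrap_servers="kafka:9092",  # hypothetical broker address
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="kafka:9092",
        value_serializer=lambda m: json.dumps(m).encode("utf-8"),
    )

    # Score each record as it arrives and publish the result downstream.
    for record in consumer:
        result = predict(record.value["features"])
        producer.send(
            "predictions", {"id": record.value["id"], "prediction": result}
        )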

Stream Processing Engines and Libraries

As defined in the article “Defining the Execution ...
