Chapter 9. Scalable Inference Serving on Cloud with TensorFlow Serving and KubeFlow

Imagine this: you just built a top-notch classifier. Your goal, as the Silicon Valley motto goes, is to “make the world a better place,” which you’re going to do... with a spectacular Dog/Cat classifier. You have a solid business plan and you cannot wait to pitch your magical classifier to that venture capital firm next week. You know that the investors will question you about your cloud strategy, and you need to show a solid demo before they even consider giving you the money. How would you do this? Creating a model is only half the battle; serving it is the next challenge, and often the bigger one. In fact, for a long time it was common for a model to take only a few weeks to train, while serving it to a larger group of people was a months-long battle, often involving backend engineers and DevOps teams.

In this chapter, we answer a few questions that tend to come up in the context of hosting and serving custom-built models.

  • How can I host my model on my personal server so that my coworkers can play with it? (A minimal serving sketch follows this list.)

  • I am not a backend/infrastructure engineer, but I want to make my model available so that it can serve thousands (or even millions) of users. How can I do this at a reasonable price without worrying about scalability and reliability issues?

  • There are reasons (such as cost, regulations, privacy, etc.) why I cannot host my model on the cloud, but only on-premises (my work network). Can I ...
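
To preview how the first question might be answered, here is a minimal sketch of the workflow this chapter builds on: export a trained Keras model in the SavedModel format, stand up TensorFlow Serving in a Docker container, and query its REST endpoint. The `model` file, the `dogcat` model name, and the local paths are assumptions for illustration, not the chapter's exact setup.

    import json
    import requests
    import tensorflow as tf

    # Assume a trained Dog/Cat Keras classifier saved earlier (hypothetical file).
    model = tf.keras.models.load_model("dogcat_classifier.h5")

    # Export as a SavedModel; TensorFlow Serving expects a numeric version subdirectory.
    model.save("/tmp/dogcat/1")

    # Launch TensorFlow Serving (run in a shell, not in Python):
    #   docker run -p 8501:8501 \
    #     -v /tmp/dogcat:/models/dogcat \
    #     -e MODEL_NAME=dogcat tensorflow/serving

    # Query the REST API with a preprocessed image batch
    # (the shape must match the model's input; 224x224x3 is a placeholder).
    image = tf.random.uniform((1, 224, 224, 3)).numpy().tolist()
    response = requests.post(
        "http://localhost:8501/v1/models/dogcat:predict",
        data=json.dumps({"instances": image}),
    )
    print(response.json())  # e.g. {"predictions": [[...]]}

Later sections of the chapter replace this single-machine setup with a scalable, managed deployment on the cloud using Kubernetes and KubeFlow.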
