Chapter 3. Scalable AI
Now that you have a prototype ready, how do you make it available to your users efficiently and flexibly? Before starting this journey, consider that deployment can look different across organizations, and practices can vary considerably even within the same organization. In this chapter, we will cover several patterns for LLM application deployment that apply regardless of the specific choices you make. Keep in mind, though, that during deployment you will have to spend time weighing the available options and their pros and cons before making decisions.
From Prototype to Production
Let’s start by talking about why deploying and scaling applications to production involves a fundamental shift in approach. Before moving into industry, I spent a long time in academia doing research and publishing papers. In research, it is important to develop models and pipelines that do something fundamentally different from what is already described in the literature. To this end, you solidify your use case around data that is most likely static, and how you use this data and build models ultimately determines how good your research paper is. The same approach tends to work well when you are building your prototype and convincing stakeholders that it is a valid way to solve the problem at hand. In Chapter 2, we saw how to make and evaluate various component choices while building RAG application prototypes. Building a prototype usually involves engaging either ...