Large Language Models in Production
Published by O'Reilly Media, Inc.
How to navigate the complexities of deploying and optimizing LLMs in production
Course outcomes
- Learn how to make critical LLM framework decisions
- Understand how to evaluate LLMs
- Learn various options for deploying and monitoring LLMs in production
Course description
Large language models are fundamentally changing the way practitioners integrate AI into applications. LLM adoption looks deceptively simple thanks to the relatively low barrier to using state-of-the-art models like GPT-4, Llama 2, etc. However, making these systems production-ready is still a challenge and requires a grasp of new concepts like prompt engineering, retrieval-augmented generation, hallucinations, and more.
Join expert Skanda Vivek to learn the fundamental concepts for building and deploying real-world LLMs in production.
What you’ll learn and how you can apply it
- Make critical LLM framework decisions
- Evaluate LLMs
- Deploy and monitor LLMs in production
This live event is for you because...
- You’re a developer who’s integrating LLMs into a product.
- You’re a CTO/CDO who wants to integrate LLMs into your business.
- You're a software engineer who wants to learn more about LLMs to upskill or apply for ML engineer jobs.
Prerequisites
- Familiarity with fundamental machine learning concepts (classification and regression, model training and testing, loss functions, backpropagation, etc.)
- Familiarity with software development in Python
- Familiarity with ChatGPT
Recommended preparation
- Access to ChatGPT (optional for following along with the exercises)
- Read Hands-On Large Language Models (book)
- Read Prompt Engineering for Generative AI (book)
Recommended follow-up
- Read Implementing MLOps in the Enterprise (book)
- Read Generative AI on AWS (book)
Schedule
The time frames are only estimates and may vary according to how the class is progressing.
LLM design considerations (60 minutes)
- Presentation: Making key LLM decisions; four key decision metrics (response quality, economics, latency, privacy); LLM deployment economics (ChatGPT/GPT-4 versus open source); levers of choice (prompt engineering, RAG, fine-tuning); embeddings and vector databases; RAG; integrating LLMs and vector DBs (context window size, choosing context, chat history, building comprehensive architectures for handling complex cases); fine-tuning and parameter-efficient fine-tuning (PEFT) (using PEFT to reduce memory footprint and performance degradation)
- Group discussion: When do you choose RAG versus fine-tuning?
- Hands-on exercise: Explore prompt engineering, RAG, and fine-tuning
- Q&A
- Break
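To give a flavor of the RAG portion of the hands-on exercise, here is a minimal, illustrative retrieval sketch. It substitutes a toy bag-of-words similarity for a real embedding model and vector database; the documents, query, and function names are hypothetical stand-ins for what the course covers in depth:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (real systems use a learned embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by similarity to the query; return the top k as context."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Fine-tuning updates model weights on domain data.",
    "RAG retrieves relevant context and adds it to the prompt.",
    "Quantization shrinks models for cheaper inference.",
]
context = retrieve("How does RAG add context to a prompt?", docs)

# The retrieved text is prepended to the LLM prompt -- the essence of RAG.
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: How does RAG work?"
```

In production, the same pattern holds, but embeddings come from a model, vectors live in a vector database, and the prompt goes to an LLM.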
Evaluating LLMs (45 minutes)
- Presentation: Evaluating LLMs; traditional ML eval versus LLM eval; LLM eval datasets; GPT as a judge; evaluating embeddings; eval tools (LangSmith, RAGAS)
- Group discussion: How is LLM eval different from traditional ML eval?; What are the biggest concerns for LLM performance?; Can we completely remove hallucinations?
- Hands-on exercise: LLM eval
- Q&A
- Break
Deploying and scaling LLMs in production (75 minutes)
- Presentation: Model quantization (GGML, QLoRA, GPTQ); locally hosting and running LLMs (LM Studio, GPT4All, Local.AI); deploying models on AWS using SageMaker and Inferentia; model parallelization; continuous monitoring, learning, and testing in production; evaluating LLMs in production; the human side of LLM interactions; evaluating customer interaction workflows; retraining models in production
- Hands-on exercise: Deploy LLMs
- Group discussion: Challenges for bringing (your dream) LLM app into production
- Q&A
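To preview the quantization topic above: schemes like GPTQ and QLoRA are far more sophisticated, but the core idea of mapping floats onto a small integer range can be sketched as symmetric int8 quantization (an illustrative toy, not any library's actual implementation):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric int8 quantization: map floats to [-127, 127] with one scale."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero input
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 codes."""
    return [v * scale for v in q]

weights = [0.8, -0.5, 0.05, -1.0]
q, scale = quantize_int8(weights)     # ints fit in a single byte each
restored = dequantize(q, scale)       # close to the originals, within scale/2
```

Storing one byte per weight instead of four (fp32) is what makes large models fit on commodity hardware, at the cost of a small, bounded rounding error per weight.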
Your Instructor
Skanda Vivek
Skanda Vivek is a Senior Data Scientist at Intuit, working on generative AI. Prior to that, he was a senior data scientist on the Risk Intelligence team at OnSolve, where he developed advanced AI-based algorithms for rapidly detecting critical emergencies from big data. Before that, he was an assistant professor and a postdoctoral fellow at Georgia Tech. He received his PhD in physics from Emory University. His work has been published in multiple scientific journals and covered widely by outlets such as the BBC and Forbes. He is passionate about sharing knowledge, and his blog on applying state-of-the-art AI, including LLMs, in real-world scenarios has 30k+ monthly views.