Large Language Models (LLMs)

Large Language Models in Production

Published by O'Reilly Media, Inc.

Content level: Intermediate

How to navigate the complexities of deploying and optimizing LLMs in production

Course outcomes

  • Learn how to make critical LLM framework decisions
  • Understand how to evaluate LLMs
  • Learn various options for deploying and monitoring LLMs in production

Course description

Large language models are fundamentally changing the way practitioners integrate AI into applications. LLM adoption looks deceptively simple thanks to the low barrier to using state-of-the-art models such as GPT-4 and Llama 2. However, making these systems production-ready is still a challenge and requires a grasp of new concepts like prompt engineering, retrieval-augmented generation, hallucinations, and more.

Join expert Skanda Vivek to learn the fundamental concepts for building and deploying real-world LLMs in production.

What you’ll learn and how you can apply it

  • Make critical LLM framework decisions
  • Evaluate LLMs
  • Deploy and monitor LLMs in production

This live event is for you because...

  • You’re a developer who’s integrating LLMs into a product.
  • You’re a CTO/CDO who wants to integrate LLMs into your business.
  • You're a software engineer who wants to learn more about LLMs to upskill or apply for ML engineer jobs.

Prerequisites

  • Familiarity with fundamental machine learning concepts (classification and regression, model training and testing, loss functions, backpropagation, etc.)
  • Familiarity with software development in Python
  • Familiarity with ChatGPT

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

LLM design considerations (60 minutes)

  • Presentation: Making key LLM decisions; four key decision metrics (response quality, economics, latency, privacy); LLM deployment economics (ChatGPT/GPT-4 versus open source); levers of choice (prompt engineering, RAG, fine-tuning); embeddings and vector databases; RAG; integrating LLMs and vector DBs (context window size, choosing context, chat history, building comprehensive architectures for handling complex cases); fine-tuning and parameter-efficient fine-tuning (PEFT) to reduce memory footprint while limiting performance degradation
  • Group discussion: When do you choose RAG versus fine-tuning?
  • Hands-on exercise: Explore prompt engineering, RAG, and fine-tuning
  • Q&A
  • Break
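
The RAG flow covered in this session can be sketched end to end. This is a minimal illustration in which a toy bag-of-words "embedding" and cosine similarity stand in for a real embedding model and vector database; the function names (`embed`, `retrieve`, `build_prompt`) are invented for this sketch:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a
    # neural embedding model and store vectors in a vector DB.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff the retrieved context into the prompt sent to the LLM,
    # subject to the model's context window size.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

In production, the brute-force scan over documents is replaced by an approximate-nearest-neighbor search in a vector database, and the returned prompt is sent to the chosen LLM.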

Evaluating LLMs (45 minutes)

  • Presentation: Evaluating LLMs; traditional ML eval versus LLM eval; LLM eval datasets; GPT as a judge; evaluating embeddings; eval tools (LangSmith, Ragas)
  • Group discussion: How is LLM eval different from traditional ML eval?; What are the biggest concerns for LLM performance?; Can we completely remove hallucinations?
  • Hands-on exercise: LLM eval
  • Q&A
  • Break
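
One concrete way traditional ML eval differs from LLM eval: LLM outputs are free text, so exact accuracy gives way to softer text-overlap metrics. A minimal sketch of token-overlap F1, the style of metric used in extractive QA benchmarks (the function name is mine):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a model answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    common = Counter(pred) & Counter(ref)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A terse but correct answer scores partial credit rather than zero, which is why metrics like this (and, increasingly, GPT-as-a-judge scoring) are preferred over exact match for generated text.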

Deploying and scaling LLMs in production (75 minutes)

  • Presentation: Model quantization (GGML, QLoRA, GPTQ); locally hosting and running LLMs (LM Studio, GPT4All, Local.AI); deploying models on AWS using SageMaker and Inferentia; model parallelization; continuous monitoring, learning, and testing in production; evaluating LLMs in production; the human side of LLM interactions; evaluating customer interaction workflows; retraining models in production
  • Hands-on exercise: Deploy LLMs
  • Group discussion: Challenges for bringing (your dream) LLM app into production
  • Q&A
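
Quantization trades a little precision for a large memory saving: int8 weights take 4x less memory than float32. A minimal symmetric int8 sketch in pure Python (the helpers are illustrative only; schemes like GGML, QLoRA, and GPTQ are considerably more sophisticated):

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Map floats into the int8 range [-127, 127] with one scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # avoid div-by-zero
    return [round(w / scale) for w in weights], scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]
```

The round trip introduces a small error bounded by half a quantization step; real quantization schemes reduce this further by using per-channel or per-block scales instead of one scale for the whole tensor.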

Your Instructor

  • Skanda Vivek

    Skanda Vivek is a senior data scientist at Intuit, working on generative AI. Previously, he was a senior data scientist on the risk intelligence team at OnSolve, where he developed AI-based algorithms for rapidly detecting critical emergencies from big data. Before that, he was an assistant professor and a postdoctoral fellow at Georgia Tech. He received his PhD in physics from Emory University. His work has been published in multiple scientific journals and covered widely by outlets such as the BBC and Forbes. He is passionate about sharing knowledge, and his blog on applying state-of-the-art AI, including LLMs, in real-world scenarios has 30k+ monthly views.
