Open Weight Large Language Models Bootcamp

Published by O'Reilly Media, Inc.

Content level: Intermediate

Learn how to answer questions, use SBERT, LLaMa, DeepSeek and others, and tailor them to your needs

Course outcomes:

  • Understand LLMs with open weights and how to modify them with open source software
  • Be able to perform question answering with existing document corpora as a basis for RAG
  • Learn about specific advantages and disadvantages of existing LLMs
  • Be able to deploy LLMs in environments with limited resources using existing frontends

Course description:

A vast number of large language models are available on Hugging Face. Many are generic; some are suited to very specific requirements. The open source community has invested a lot of work in fine-tuning models for specific needs such as question answering or acting as a chatbot. Additionally, the underlying training data has been analyzed in detail, and the community is collecting free training data. This leads to new, more powerful, commercially usable LLMs with open weights, with more arriving all the time.

Join expert Christian Winkler to get a structured and consistent introduction to using LLMs with open weights. You’ll learn how to use models to retrieve information, combine the results of different models, and refine the results with dense passage retrieval. You’ll get working, hands-on solutions and explanations of how these models can also excel on less powerful hardware by using new approaches to quantization. You’ll also learn about different frontends these models can be plugged into. All code will be provided in the course and on GitHub.

NOTE: With today’s registration, you’ll be signed up for both days. Although you can attend either of the sessions individually, we recommend participating in both.

What you’ll learn and how you can apply it

  • Work with open weight LLMs
  • Choose the correct base LLM
  • Retrieve text (e.g. as a basis for RAG)
  • Work with multiple models in parallel to dramatically improve results
  • Refine the results with a cross-encoder for even better performance
  • Generate text with generative LLMs
  • Deploy in limited environments using quantization and optimized software

This live event is for you because...

  • You’re a data scientist, ML engineer, or NLP developer.
  • You want to become an expert in large language models.
  • You want to use modern methods for business use cases.

Prerequisites

  • Set up a Google Colab account
  • For local installations, a powerful GPU is necessary
  • Link to Jupyter Notebook
  • A working knowledge of Python and Jupyter notebooks
  • Machine learning and Hugging Face Transformers experience (helpful but not required)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Day 1

Sentence embeddings (90 minutes)

  • Presentation: Introduction to similarity; finding semantically similar statements
  • Hands-on exercises: Prepare data; work with an embeddings model using SBERT; use different base models and compare results
  • Q&A
  • Break

Advanced document retrieval (90 minutes)

  • Presentation: Vector databases; sparse document vectors (lexical retrieval); rank fusion algorithms and cross-encoders
  • Hands-on exercises: Use an (embedded) vector database; implement a rank fusion algorithm; use cross-encoders; compare with previous results; improve solution with a vector database
  • Use the reciprocal rank fusion algorithm to work with several models in parallel (boosting the performance)
  • Refine the results with cross-encoders
  • Q&A
  • Break
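
As a rough illustration of the rank fusion step listed above, the following is a minimal sketch of reciprocal rank fusion in plain Python. It is not the course's own code. The function name `reciprocal_rank_fusion`, the document ids, and the two example rankings are hypothetical, and the smoothing constant `k=60` is a commonly used default assumed here rather than a value taken from the course.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    where rank starts at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document id
    # so the result is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
dense_ranking = ["doc_a", "doc_b", "doc_c"]    # e.g. from an embedding model
lexical_ranking = ["doc_b", "doc_d", "doc_a"]  # e.g. from a sparse retriever

fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
print(fused)
```

Documents that rank well in several lists rise to the top, even if no single retriever placed them first. The fused list could then be passed to a cross-encoder for re-ranking.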

Using existing software (60 minutes)

  • Presentation: Open source solutions for document retrieval; features of LangChain, txtai and LlamaIndex
  • Hands-on exercises: Install software; use existing software like txtai or LangChain to achieve a similar result; explore deployment options
  • Q&A

Day 2

Generative LLMs (90 minutes)

  • Presentation: Introduction to transformers and open source language models and their differences (Qwen, Gemma, GPT-OSS, DeepSeek); using existing models and handling them with specific libraries or generic transformers
  • Hands-on exercises: Download and use an existing model; differences between popular data formats; use model-specific libraries for answering questions
  • Q&A
  • Break

Quantization, execution, and deployment (90 minutes)

  • Presentation: Introduction to resource limits; solution with quantization; different ways of quantizing; GGUF, AWQ and other quantization strategies
  • Hands-on exercises: Quantize an existing model; compare original results to quantized results; deploy using vLLM or SGLang
  • Q&A
  • Break

Frontend solutions (60 minutes)

  • Presentation: Introduction; focus on UX; different ready-made solutions
  • Hands-on exercise: Work with different frontend solutions (Open WebUI, llama.cpp, and LM Studio); compare their features
  • Q&A

Your Instructor

  • Christian Winkler

    Christian Winkler is a professor at the Technical University of Applied Science in Nürnberg, where he concentrates on the latest research in natural language processing and, specifically, in the application of large language models. He coauthored Blueprints for Text Analytics Using Python for O’Reilly and has written many articles about NLP.

Skills covered

  • Large Language Models (LLMs)
  • MLOps
  • Design Patterns
  • Hugging Face