Open Weight Large Language Models Bootcamp

Intermediate

Learn how to answer questions, use SBERT, LLaMa, DeepSeek and others, and tailor them to your needs

Course outcomes:

Understand LLMs with open weights and how to modify them with open source software
Be able to perform question answering with existing document corpora as a basis for RAG
Learn about specific advantages and disadvantages of existing LLMs
Be able to deploy LLMs in environments with limited resources using existing frontends

Course description:

A vast number of large language models are available on Hugging Face. Many are generic; some are suited to very specific requirements. The open source community has invested a lot of work in fine-tuning models for specific needs such as question answering or working as a chatbot. Additionally, the underlying training data has been analyzed in detail, and the community is collecting free training data. This has led to new, more powerful, commercially usable LLMs with open weights, with more arriving all the time.

Join expert Christian Winkler to get a structured and consistent introduction to using LLMs with open weights. You’ll learn how to use models to retrieve information, combine the results of different models, and refine the results with dense passage retrieval. You’ll get working, hands-on solutions and explanations of how these models can also excel on less powerful hardware by using new approaches to quantization. You’ll also learn about different frontends these models can be plugged into. All code will be provided in the course and on GitHub.

NOTE: With today’s registration, you’ll be signed up for both days. Although you can attend either of the sessions individually, we recommend participating in both.

What you’ll learn and how you can apply it

Work with open weight LLMs
Choose the correct base LLM
Retrieve text (e.g. as a basis for RAG)
Work with multiple models in parallel to dramatically improve results
Refine the results with cross-encoder for even better performance
Generate text with generative LLMs
Deploy in resource-limited environments using quantization and optimized software

This live event is for you because...

You’re a data scientist, ML engineer, or NLP developer.
You want to become an expert in large language models.
You want to use modern methods for business use cases.

Prerequisites

Set up a Google Colab account (for local installations, a powerful GPU is necessary)
Link to Jupyter Notebook
A working knowledge of Python and Jupyter notebooks
Machine learning and Hugging Face Transformers experience (helpful but not required)

Recommended follow-up:

Take Fundamentals of Large Language Models (live online course with Jonathan Fernandes)
Read Natural Language Processing with Transformers (book)
Read Generative Deep Learning (book)

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Day 1

Sentence embeddings (90 minutes)

Presentation: Introduction to similarity; finding semantically similar statements
Hands-on exercises: Prepare data; work with an embeddings model using SBERT; use different base models and compare results
Q&A
Break
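
To give a flavor of the similarity exercises, here is a minimal sketch of ranking statements by cosine similarity. It is not the course code: the three-dimensional vectors are made-up stand-ins for real sentence embeddings (which would normally come from an embedding model such as SBERT), and the function names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted by descending similarity."""
    scores = [(i, cosine_similarity(query_vec, v))
              for i, v in enumerate(corpus_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy vectors (assumption): real embeddings have hundreds of dimensions.
corpus = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [1.0, 0.1, 0.0]

ranking = most_similar(query, corpus)
for index, score in ranking:
    print(index, round(score, 3))
```

Swapping the base model changes the vectors but not this ranking step, which is why results from different models can be compared directly.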

Advanced document retrieval (90 minutes)

Presentation: Vector databases; sparse document vectors (lexical retrieval); rank fusion algorithms and cross-encoders
Hands-on exercises: Use an (embedded) vector database; implement a rank fusion algorithm; use cross-encoders; compare with previous results; improve solution with a vector database
Use the reciprocal rank fusion algorithm to work with several models in parallel (boosting the performance)
Refine the results with cross-encoders
Q&A
Break
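
As an illustration of the rank fusion idea in this session, the following is a minimal sketch of reciprocal rank fusion, assuming each retriever returns an ordered list of document IDs. The document IDs and the three example rankings are invented, and `k=60` is the constant commonly used for this algorithm rather than a value taken from the course material.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in,
    with rank starting at 1; documents are sorted by their summed score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from two dense models and one lexical retriever.
dense_a = ["d1", "d2", "d3"]
dense_b = ["d2", "d1", "d4"]
lexical = ["d2", "d3", "d1"]

fused = reciprocal_rank_fusion([dense_a, dense_b, lexical])
print(fused)
```

Because only ranks are used, the individual retrievers do not need comparable score scales. A cross-encoder could then rescore the top entries of the fused list.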

Using existing software (60 minutes)

Presentation: Open source solutions for document retrieval; features of LangChain, txtai and LlamaIndex
Hands-on exercises: Install software; use existing software like txtai or LangChain to achieve a similar result; explore deployment options
Q&A

Day 2

Generative LLMs (90 minutes)

Presentation: Introduction to transformers and open source language models and their differences (Qwen, Gemma, GPT-OSS, DeepSeek); using existing models and handling them with specific libraries or generic transformers
Hands-on exercises: Download and use existing model; differences between popular data formats; use model-specific libraries for answering questions
Q&A
Break

Quantization, execution, and deployment (90 minutes)

Presentation: Introduction to resource limits; solution with quantization; different ways of quantizing; GGUF, AWQ and other quantization strategies
Hands-on exercises: Quantize an existing model; compare original results to quantized results; deploy using vLLM or SGLang
Q&A
Break
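
The basic idea behind quantization can be sketched in a few lines. The example below shows simple symmetric 8-bit quantization of a handful of made-up weights with a single scale factor. It is an illustrative assumption, not the scheme used by GGUF or AWQ, which are considerably more elaborate; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [v * scale for v in quantized]

# Made-up weights (assumption); real layers hold millions of values.
weights = [0.12, -0.5, 0.33, 0.0, 0.49]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now needs one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half the scale per value. Comparing original and quantized model outputs, as in the hands-on exercise, shows how much that error matters in practice.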

Frontend solutions (60 minutes)

Presentation: Introduction; focus on UX; different ready-made solutions
Hands-on exercise: Work with different frontend solutions (Open WebUI, llama.cpp, and LM Studio); compare their features
Q&A

Your Instructor

Christian Winkler
Christian Winkler is a professor at the Technical University of Applied Science in Nürnberg, where he concentrates on the latest research in natural language processing and, specifically, in the application of large language models. He coauthored Blueprints for Text Analytics Using Python for O’Reilly and has written many articles about NLP.


Skills covered

Large Language Models (LLMs)

MLOps
Design Patterns
Hugging Face
