Skip to Content
View all events

Working Locally with Open Weight LLMs

Published by O'Reilly Media, Inc.

Intermediate content levelIntermediate

Using Gemma, Qwen, DeepSeek and other generative LLMs on your own hardware and tailor them to your needs

Course outcomes:

  • Understand generative LLMs with open weights and how to use them with open source software
  • Learn about specific advantages and disadvantages of existing LLMs
  • Understand quantization like TurboQuant and its advantages
  • Be able to deploy LLMs in environments with limited resources
  • Compare the different frontend solutions and UX for interacting with LLMs

Course description:

The amount of openly available generative large language models is growing extensively. Many models are generic, some are suited for very special requirements like reasoning. This leads to new and more powerful, commercially usable LLMs with open weights with more and more coming. Join expert Christian Winkler to get a structured and consistent introduction to using LLMs with open weights. You’ll learn how to use models to retrieve information, combine the results of different models and refine the results with dense passage retrieval. You’ll get working, hands-on solutions, and explanations on how these models can also excel on less powerful hardware by using new approaches to quantization. And you’ll also learn about different frontends these models can be plugged into. All code will be provided in the course and in GitHub.

What you’ll learn and how you can apply it

  • Work with open weight LLMs using the transformers library
  • Choose the correct base LLM (dense, liquid, mixture of experts etc.)
  • Generate text with generative LLMs
  • Use quantization to run LLMs with lower memory requirements
  • Speed up text generation by many factors (important for agents)
  • Use specialized software like vLLM, SGlang or Tabby API for a one-stop replacement of Open AI
  • Use large quantized models on the CPU on PCs and Macs and understand the differences

This live event is for you because...

  • You’re a data scientist, ML engineer, or NLP developer.
  • You want to become an expert in large language models.
  • You want to use modern methods for business use cases.

Prerequisites

  • A Google Colab account
  • alternatively you can run the software on your own computer, preferably with a GPU, runpod or other hosters are also good options
  • Link to Jupyter Notebook
  • Link to GitHub repository
  • A working knowledge of Python and Jupyter notebooks
  • Machine learning and Hugging Face Transformers experience (helpful but not required)

Recommended follow-up:

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Generative LLMs (90 minutes)

  • Presentation: Introduction to transformers and open weight language models and their differences ( Qwen, Gemma, GPT-OSS, DeepSeek); using existing models and handling them with specific libraries or generic transformers
  • Hands-on exercises: Download and use existing model; differences between popular data formats; use model-specific libraries for answering questions
  • Q&A
  • Break

Quantization, execution, and deployment (90 minutes)

  • Presentation: Introduction to resource limits; solution with quantization; different ways of quantizing; GGUF, AWQ and other quantization strategies
  • Hands-on exercises: Quantize an existing model; compare original results to quantized results; deploy using vLLM or SGLang
  • Q&A
  • Break

Frontend solutions (60 minutes)

  • Presentation: Introduction; focus on UX; different ready-made solutions
  • Hands-on exercise: Work with different frontend solutions (Open WebUI, llama.cpp, and LM Studio); compare their features
  • Q&A

Your Instructor

  • Christian Winkler

    Christian Winkler is a professor at the Technical University of Applied Science in Nürnberg, where he concentrates on the latest research in natural language processing and, specifically, in the application of large language models. He coauthored Blueprints for Text Analytics Using Python for O’Reilly and has written many articles about NLP.

Skills covered

  • Large Language Models (LLMs)
  • TensorFlow
  • Scikit-learn