Live online training

Real-time data foundations: Kafka

The what, why, and how of Kafka

Ted Malaska

Kafka and streams are key parts of any modern architecture. Kafka is the medium through which your data will pass, and in the new world of seeing streams as tables, Kafka can be used as your data store as well.
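
The idea of seeing a stream as a table can be sketched in a few lines of Python. This is a toy illustration, not Kafka code. The `materialize` function and the sample sensor events are hypothetical. The sketch assumes each event is a (key, value) pair, that a later value for a key replaces the earlier one, and that a value of `None` removes the key, similar in spirit to how a compacted topic retains the latest record per key.

```python
from typing import Dict, Iterable, Optional, Tuple

# One event in the stream: (key, value). A value of None marks a delete.
Event = Tuple[str, Optional[str]]


def materialize(events: Iterable[Event]) -> Dict[str, str]:
    """Fold a stream of keyed events into a table of latest values per key."""
    table: Dict[str, str] = {}
    for key, value in events:
        if value is None:
            # Assumed convention: None removes the key from the table.
            table.pop(key, None)
        else:
            # A newer event for the same key overwrites the older value.
            table[key] = value
    return table


# Hypothetical IoT-style readings, in arrival order.
events = [
    ("sensor-1", "20.5"),
    ("sensor-2", "18.0"),
    ("sensor-1", "21.0"),  # update for sensor-1
    ("sensor-2", None),    # delete for sensor-2
]

table = materialize(events)
print(table)
```

Replaying the same events from the start always rebuilds the same table, which is why a retained stream can serve as a data store.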

In this 90-minute introductory training, Ted Malaska covers everything you need to know to get started with Kafka the right way. You'll learn about Kafka's architecture, use cases, and configurations and gain hands-on experience with working examples that will get you up and running in under 10 minutes.

Special note: This is the first course in a four-part series focused on building a foundation in near-real-time processing of IoT data. Although these courses are designed to be taken in any order, we suggest you take this course first for best results.

  1. Real-time data foundations: Kafka
  2. Real-time data foundations: Spark
  3. Real-time data foundations: Flink
  4. Real-time data foundations: Time series architectures

What you'll learn and how you can apply it

By the end of this live, online course, you’ll understand:

  • Components of Kafka's architecture
  • Kafka use cases
  • Kafka configurations

And you’ll be able to:

  • Start up a Kafka instance in minutes and experiment with it

This training course is for you because...

  • You're a data engineer who wants to get started using Kafka for the design and development of streaming use cases.
  • You work in data management and want to understand how to apply your skills to near-real-time use cases and data delivery.


Prerequisites:

  • A basic understanding of working with data

Materials or downloads needed in advance:

  • A machine with Docker installed
  • A GitHub account

Recommended preparation:

About your instructor

  • Ted Malaska is the director of engineering for data streaming and persistence at Capital One. Previously, he worked on the Battle.net team at Blizzard Entertainment, served as a principal solutions architect at Cloudera, where he helped clients succeed with Hadoop and the Hadoop ecosystem, and was a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is the coauthor of Hadoop Application Architectures, a frequent conference speaker, and a blogger on data architectures.


Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

  • Kafka architecture (component rundown) (25 minutes)
  • Use cases for Kafka (15 minutes)
  • Common Kafka configurations (10 minutes)
  • Break (10 minutes)
  • Live example: Setting up Kafka and using it (15 minutes)
  • Kafka client APIs and their impact (15 minutes)
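
As a warm-up for the architecture and client API segments, the following toy model sketches three ideas that come up in any Kafka introduction: a topic split into partitions, records appended at increasing offsets, and a consumer that tracks its own position. It is an in-memory illustration only and does not talk to a broker. The `ToyTopic` and `ToyConsumer` classes are hypothetical names, and the CRC32-based partition choice is an assumption made for simplicity; real Kafka clients use their own partitioners.

```python
import zlib
from typing import Dict, List, Tuple


class ToyTopic:
    """An in-memory stand-in for a topic: a list of append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [
            [] for _ in range(num_partitions)
        ]

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        """Append a record and return (partition, offset)."""
        # Assumption: hash the key so equal keys land in the same partition.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


class ToyConsumer:
    """Reads records it has not yet seen, tracking one offset per partition."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = {
            p: 0 for p in range(len(topic.partitions))
        }

    def poll(self) -> List[Tuple[int, int, str, str]]:
        """Return new records as (partition, offset, key, value) tuples."""
        records = []
        for p, log in enumerate(self.topic.partitions):
            for offset in range(self.offsets[p], len(log)):
                key, value = log[offset]
                records.append((p, offset, key, value))
            # Advance the position so the next poll skips these records.
            self.offsets[p] = len(log)
        return records


topic = ToyTopic(num_partitions=3)
topic.produce("sensor-1", "20.5")
topic.produce("sensor-1", "21.0")

consumer = ToyConsumer(topic)
for record in consumer.poll():
    print(record)
```

Because ordering is kept per partition, records that share a key are read back in the order they were produced, while a second `poll()` with no new records returns an empty list.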