Kafka Essentials LiveLessons: A Quick-Start for Building Effective Data Pipelines
with Douglas Eadline
Overview
8+ Hours of Video Instruction
Learn how to manage high-performance data pipelines, streaming analytics, data integration, and mission-critical applications with Apache Kafka.
Apache Kafka is a popular message broker providing data flow management between producer application sources and consumer destinations. Kafka Essentials: A Quick-Start for Building Effective Data Pipelines covers many essential and practical aspects of using and running the Apache Kafka event streaming platform.
Learn How To
- Write a Kafka Python application to produce and consume data
- Use the Kafkaesque GUI
- Use keys and multiple partitions with Kafka topics
- Develop Python consumers that access data by index or time stamp
- Save Kafka log data to external storage and databases
- Use and configure Kafka Connect services
- Conduct image streaming with Kafka
- Size hardware for a Kafka cluster
- Install Kafka and ZooKeeper across multiple servers
- Configure partitions across multiple brokers
- Administer Kafka broker partition allocation, log file management, topic management, monitoring, and benchmarking
About the Instructor
Doug Eadline is a practitioner and writer in the Linux cluster community and has documented many aspects of high-performance computing (HPC) and Hadoop/Spark computing. Currently, he is the editor of the HPCwire.com website and was previously the editor of ClusterWorld Magazine and a senior HPC Editor for Linux Magazine. Some of his popular video tutorials and books include Data Engineering Foundations LiveLessons Parts 1 and 2, Hadoop 2 Quick Start, High-Performance Computing for Dummies, and Practical Data Science with Hadoop and Spark.
Who Should Take This Course
- You want to understand Apache Kafka and data streaming
- You want to learn the basics of building data pipelines with Kafka using Python
- Hands-on experience with examples is important to you when learning a new technology
- You want to continue exploring Kafka using a complete copy of the instructor's hands-on notes, example code, and free virtual machine
Course Requirements
The course assumes familiarity with Python and the Bash command line on a modern Linux server. Python is used for all examples. Bash scripting is used to facilitate some examples and for installation and administration tasks.
Lesson Descriptions
Lesson 1: Kafka Background Concepts
In Lesson 1, Doug introduces Kafka by asking, "Why do I need a message broker?" Once answered, he explains the basic Kafka components, and then introduces the freely available Linux virtual machine that you will use to run many of the examples presented in the lessons. The lesson concludes with some basic examples of Kafka usage.
Lesson 2: Viewing Kafka Operations
Lesson 2 presents a Kafka graphical user interface, Kafkaesque. This interface lets you see inside Kafka topic logs, which will be used in many subsequent lessons. Doug uses Kafkaesque to review the basic examples from Lesson 1.
Lesson 3: Streaming NOAA Weather Data with Kafka Python
Lesson 3 provides a look at a simple Kafka Python application that produces (downloads) data from the NOAA weather site, and then consumes the data by loading it into a Pandas data frame. The examples are expanded to demonstrate the use of keys and multiple partitions with Kafka topics. The lesson concludes by developing Python consumers that access data by index or time stamp.
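The key-to-partition mapping that Lesson 3 explores can be illustrated without a running broker. In the sketch below, `zlib.crc32` stands in for Kafka's actual default partitioner (which uses a murmur2 hash of the key bytes), and the partition count and station keys are invented for illustration; the point is the hash-mod-partitions idea, not the exact hash function.

```python
import zlib

NUM_PARTITIONS = 3  # assumed partition count for an example topic


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition deterministically.

    Kafka's default producer partitioner uses a murmur2 hash;
    crc32 stands in here purely to show the hash-mod-partitions idea.
    """
    return zlib.crc32(key) % num_partitions


# Messages with the same key always land in the same partition,
# which is what preserves per-key ordering for consumers.
stations = [b"KNYC", b"KLAX", b"KORD", b"KNYC"]
assignments = [partition_for(s) for s in stations]
assert assignments[0] == assignments[3]  # same key, same partition
```

Because the mapping is deterministic, all readings for one weather station stay in order within a single partition, while different stations can spread across partitions for parallelism.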
Lesson 4: Moving Kafka Topic Data to External Storage
Lesson 4 shows you how to save Kafka log data to external storage. Examples include PySpark streaming and Python consumers that write to MariaDB (MySQL) and Apache HBase.
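The consumer-to-database pattern in Lesson 4 can be sketched without Kafka or MariaDB. Below, a plain Python list stands in for records polled from a Kafka consumer, and the standard-library `sqlite3` module stands in for MariaDB; the table name and record fields are invented for illustration.

```python
import sqlite3

# Stand-in for records polled from a Kafka consumer; in the real
# pipeline each tuple would come from iterating over the consumer.
records = [
    ("KNYC", "2023-01-01T00:00Z", -2.5),
    ("KLAX", "2023-01-01T00:00Z", 14.1),
]

conn = sqlite3.connect(":memory:")  # MariaDB in the actual lesson
conn.execute("CREATE TABLE weather (station TEXT, ts TEXT, temp_c REAL)")

# The consumer loop body: insert each message, committing once per
# batch rather than per message to keep the write path cheap.
conn.executemany("INSERT INTO weather VALUES (?, ?, ?)", records)
conn.commit()

rows = conn.execute("SELECT COUNT(*) FROM weather").fetchone()[0]
```

Swapping `sqlite3` for a MariaDB client changes only the connection and placeholder syntax; the consume-insert-commit loop is the same shape.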
Lesson 5: Edge Image Streaming with Kafka Python
Lesson 5 demonstrates image streaming with Kafka. The example uses a Kafka Python producer to capture images from a 3D printer that are then examined by a Python consumer that performs real-time CNN analysis looking for defects. A simulated version is provided for the virtual machine.
Lesson 6: Data Pipelines and Kafka Connect
In Lesson 6, the Kafka Connect interface is introduced. Kafka Connect connectors provide a quick method to use pre-written consumers and producers for many popular services. Doug demonstrates connectors for text files, HDFS, and MariaDB (MySQL), along with connector management methods.
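A Kafka Connect connector is driven by a small JSON configuration POSTed to the Connect REST API (port 8083 by default). The sketch below builds such a payload in Python for the stock FileStreamSource connector that ships with Kafka; the connector name, file path, and topic are placeholders.

```python
import json

# Configuration for the stock FileStreamSource connector, which
# tails a text file and produces each new line to a Kafka topic.
payload = {
    "name": "local-file-source",  # connector instance name (example)
    "config": {
        "connector.class":
            "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",   # placeholder input file
        "topic": "file-lines",      # placeholder destination topic
    },
}

body = json.dumps(payload)
# In practice this body is sent to the Connect REST API, e.g.:
#   curl -X POST -H "Content-Type: application/json" \
#        --data @config.json http://localhost:8083/connectors
```

The same REST API lists, pauses, and deletes connectors, which is the management workflow the lesson walks through.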
Lesson 7: Installation Considerations
In Lesson 7, Kafka broker installation is discussed. Topics include hardware choices, a recipe with scripts used for installing Kafka and ZooKeeper across multiple servers, and configuring partitions across multiple brokers.
Lesson 8: Basic Administration Topics
Lesson 8 presents basic administration of Kafka brokers. Doug discusses various aspects of partition allocation and log file management. Coverage then moves to Kafka topic management, monitoring, and benchmarking of Kafka clusters.
All code, background, and links for this video can be found at: https://www.clustermonkey.net/download/LiveLessons/Kafka_Essentials/
About Pearson Video Training
Pearson publishes expert-led video tutorials covering a wide selection of technology topics designed to teach you the skills you need to succeed. These professional and personal technology videos feature world-leading author instructors published by your trusted technology brands: Addison-Wesley, Cisco Press, Pearson IT Certification, Sams, and Que. Topics include: IT Certification, Network Security, Cisco Technology, Programming, Web Development, Mobile Development, and more. Learn more about Pearson Video training at http://www.informit.com/video.