Live Online Training

Implementing an Edge Computing Apache Kafka Inference Engine: Effective Data Pipelines

Topic: Data
Douglas Eadline

Edge computing is data processing performed locally, outside of a data center or cloud, and usually near where the data are generated. After processing, smaller amounts of data may then be moved to the cloud or data center for further processing.

Inference is the application of a trained machine learning model to new data. To be clear, inference is different from training (learning), where models are developed using numeric algorithms or deep learning methods. In almost all cases, inference requires fewer resources than training and can be implemented at the Edge without the need for hardware acceleration (e.g., GPUs).
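
To make the distinction concrete, here is a minimal inference sketch that loads a previously trained TensorFlow Lite model and runs a single CPU-only prediction. The model file name, the 224x224 RGB input shape, and the random "frame" are illustrative assumptions, not course code; on a small Edge node the lighter tflite_runtime package can stand in for the full tensorflow import.

    # Illustrative sketch: one CPU-only inference with a pre-trained TFLite
    # model. Model path and input shape are assumptions for this example.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="printer_monitor.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # A random array stands in for a real camera frame.
    frame = np.random.rand(1, 224, 224, 3).astype(np.float32)

    interpreter.set_tensor(inp["index"], frame)
    interpreter.invoke()  # inference only: no training, no GPU required
    print("class scores:", interpreter.get_tensor(out["index"]))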

This class develops an Edge application using Apache Kafka for data aggregation and TensorFlow Lite for inference. Specifically, an application is developed that visually monitors a manufacturing process (in this case, a 3D printer) and infers when the process is not working correctly. The scalable nature of Kafka allows additional processes (e.g., more 3D printers) to be easily added to the workflow pipeline.
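
As an illustration of the data-aggregation side, the hedged sketch below publishes JPEG-encoded camera frames to a Kafka topic using the kafka-python package. The broker address, topic name, and one-frame-per-second rate are assumptions; each additional printer would simply run its own copy of this producer.

    # Illustrative sketch: stream camera frames of a 3D printer to Kafka.
    # Broker address ("edge-head:9092") and topic name are assumptions.
    import time
    import cv2
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers="edge-head:9092")
    camera = cv2.VideoCapture(0)  # camera pointed at the printer

    while True:
        ok, frame = camera.read()
        if not ok:
            break
        _, jpeg = cv2.imencode(".jpg", frame)            # compress the frame
        producer.send("printer-frames", jpeg.tobytes())  # publish to the topic
        time.sleep(1)  # one frame per second is enough for a slow print job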

What you'll learn and how you can apply it

  • Learn the concepts of Edge computing and why it is different from the cloud or data center
  • Learn how Kafka works on a small Edge cluster
  • Understand how to configure and use Kafka as a data broker
  • Understand the difference between learning and inference
  • Learn how to use TensorFlow Lite with a previously developed model
  • Learn how Kafka manages data streaming to and from TensorFlow Lite (see the sketch after this list)
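
As a preview of how these pieces fit together, this sketch consumes frames from the topic used above and hands each one to a TensorFlow Lite interpreter. The broker, topic, model file, input size, and the meaning of class 0 are all illustrative assumptions, not the exact course code.

    # Illustrative sketch: a Kafka consumer feeding a TFLite interpreter.
    # All names (broker, topic, model, class layout) are assumptions.
    import numpy as np
    import cv2
    import tensorflow as tf
    from kafka import KafkaConsumer

    interpreter = tf.lite.Interpreter(model_path="printer_monitor.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    consumer = KafkaConsumer("printer-frames",
                             bootstrap_servers="edge-head:9092")

    for message in consumer:
        buf = np.frombuffer(message.value, dtype=np.uint8)
        frame = cv2.imdecode(buf, cv2.IMREAD_COLOR)  # decode the JPEG bytes
        frame = cv2.resize(frame, (224, 224)).astype(np.float32)[None, ...]
        interpreter.set_tensor(inp["index"], frame)
        interpreter.invoke()
        scores = interpreter.get_tensor(out["index"])
        if scores[0].argmax() != 0:  # assumed: class 0 = "printing normally"
            print("possible print failure detected")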

This training course is for you because...

  • You want to understand a specific Edge computing example
  • You want to learn the basics of Kafka and TensorFlow Lite
  • Hands-on experience is important to you when learning a new technology
  • You prefer a copy of the instructor's hands-on notes and class slides
  • The ability to continue exploring and implementing after the class is important to you

Prerequisites

  • Beginning/Intermediate Linux Command Line for Data Engineers (Live Online Training by Douglas Eadline; search the O'Reilly Learning Platform for an upcoming date)
  • Please be aware that if you have no experience with the Linux command line, you may find this course difficult to follow.

Course Setup

  • To run some of the class examples, a Linux Hadoop Minimal virtual machine (VM) is available. The VM is a full Linux installation that can run on your laptop/desktop using VirtualBox (freely available). The VM provides a functional Kafka and TensorFlow Lite environment (in addition to Hadoop, Hive, and Spark) for continued learning after the class.
  • Further information on the class, access to the class notes, and the Linux Hadoop Minimal VM can be found at https://www.clustermonkey.net/scalable-analytics.

About your instructor

  • Douglas Eadline, PhD, began his career as an analytical chemist with an interest in computer methods. Starting with the first Beowulf how-to document, Doug has written instructional documents covering many aspects of Linux HPC (High Performance Computing) and Hadoop computing. Currently, Doug serves as editor of the ClusterMonkey.net website; he was previously editor of ClusterWorld Magazine and senior HPC editor for Linux Magazine. He is also an active writer and consultant to the HPC/analytics industry. His recent video tutorials and books include the Hadoop and Spark Fundamentals LiveLessons video (Addison-Wesley), Hadoop 2 Quick-Start Guide (Addison-Wesley), High Performance Computing for Dummies (Wiley), and Practical Data Science with Hadoop and Spark (co-author, Addison-Wesley).

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Segment 1: Introduction and Course Goals (20 mins)

  • Class Resources and web page
  • How to get the most out of this course
  • Required prerequisite skills
  • Using the Linux Hadoop Minimal virtual machine

Segment 2: Problem Description (25 mins)

  • Manufacturing Process: 3D printing
  • Local compute: Limulus Edge cluster
  • Model Training (learning what "should" happen)
  • How to watch a process with inference

Segment 3: Using Apache Kafka (45 mins)

  • Kafka Background and use cases
  • Sending messages with producers
  • Reading messages with consumers

Break (10 mins)

Segment 4: Using TensorFlow Lite (25 mins)

  • Learning vs Inference
  • Using Python with TensorFlow Lite
  • A simple example of inference with TensorFlow Lite

Segment 5: Integrating Components (30 mins)

  • Configuring Image Streaming to Kafka
  • The Kafka to TensorFlow Lite connection

Segment 6: Testing the Application (25 mins)

  • Using real (pre-recorded) data from the 3D printer
  • Possible improvements

Segment 7: Course Wrap-up, Questions, and Additional Resources (10 mins)