O'Reilly Live Online Training

Building Data Pipelines with Cassandra, Kafka, and Spark

Moving, storing, and analyzing data in microservice applications

Topic: Data
Jeff Carpenter

Cassandra, Kafka, and Spark each represent ecosystems with many capabilities and integrations, so it can be confusing to know when to use each and for what purpose (for example, Spark Streaming versus Kafka Streams, or Kafka’s KSQL versus storing data in Cassandra). The best way forward is to learn how to use each of these technologies effectively and how to combine them.

Join expert Jeff Carpenter to learn how to quickly assemble data pipelines using Cassandra, Kafka, and Spark. You’ll discover how to combine Kafka and Cassandra in microservice applications and use the Kafka Connect framework and the DataStax Kafka sink connectors. You’ll also explore the principles of Cassandra data modeling, learn how to design Cassandra tables and push data from Kafka topics into those tables, and dive into analyzing data in Cassandra using Spark.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Design patterns for microservice-based applications using Kafka and Cassandra
  • How to effectively combine Kafka, Cassandra, and Spark to move, store, and analyze data

And you’ll be able to:

  • Build a microservice that stores data in Cassandra and publishes messages using Kafka
  • Configure the Kafka Connect framework and the DataStax Kafka Connector to push data from Kafka topics into Cassandra
  • Analyze data stored in Cassandra using DataFrames and SQL in Spark

This training course is for you because...

  • You’re a developer creating microservice applications who wants to understand how to manage the flow of data between services and systems.
  • You’re a data analyst, data engineer, data scientist, or machine learning engineer consuming data from microservice applications, and you want to understand how to obtain the data you need.

Prerequisites

  • Basic programming experience with Java
  • Familiarity with SQL (useful but not required)

Recommended follow-up:

About your instructor

  • Jeff Carpenter is a developer advocate at DataStax, where he draws on his background in system architecture, microservices, and Apache Cassandra to help developers and operations engineers build distributed systems that are scalable, reliable, and secure. Jeff has worked on projects ranging from a complex battle planning system in an austere network environment to a cloud-based hotel reservation system. He is the author of Cassandra: The Definitive Guide, 2nd Edition.

Schedule

The timeframes are only estimates and may vary according to how the class is progressing.

Introducing Cassandra (25 minutes)

  • Presentation: The basics of inserting and querying data in CQL
  • Katacoda interactive exercise: Get to know CQL
  • Q&A

Cassandra data modeling (30 minutes)

  • Presentation: Data modeling principles in Cassandra, including denormalization and how the Cassandra primary key works
  • Katacoda interactive exercise: Design Cassandra tables
  • Q&A
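
The Cassandra primary key has two parts: the partition key, which selects the partition (and therefore the nodes) that holds a row, and optional clustering columns, which sort rows inside that partition. Because efficient queries read from a single partition, data is often denormalized into one table per query. The Python sketch below models that idea in memory only. ToyTable, the table layouts, and the column names are hypothetical; the sketch does no hashing, replication, or on-disk storage and is not how Cassandra is implemented.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of a Cassandra table, for illustration only.

    partition_key: columns that together select one partition.
    clustering: columns that order rows inside a partition.
    """

    def __init__(self, partition_key, clustering):
        self.partition_key = tuple(partition_key)
        self.clustering = tuple(clustering)
        self.partitions = defaultdict(dict)  # partition key -> {clustering key: row}

    def insert(self, row):
        pk = tuple(row[c] for c in self.partition_key)
        ck = tuple(row[c] for c in self.clustering)
        # As with a CQL INSERT, writing an existing primary key overwrites it (upsert).
        self.partitions[pk][ck] = dict(row)

    def select(self, **partition):
        # Reads name the full partition key; rows come back in clustering order.
        pk = tuple(partition[c] for c in self.partition_key)
        rows = self.partitions.get(pk, {})
        return [rows[ck] for ck in sorted(rows)]


# Denormalization: one table per query, and every row is written to both tables.
by_hotel = ToyTable(partition_key=["hotel_id"], clustering=["start_date", "confirmation"])
by_guest = ToyTable(partition_key=["guest"], clustering=["start_date", "confirmation"])

reservations = [
    {"confirmation": "C3", "hotel_id": "H1", "guest": "ana", "start_date": "2020-03-02"},
    {"confirmation": "C1", "hotel_id": "H1", "guest": "bo", "start_date": "2020-03-01"},
    {"confirmation": "C2", "hotel_id": "H2", "guest": "ana", "start_date": "2020-02-15"},
]
for r in reservations:
    by_hotel.insert(r)
    by_guest.insert(r)

print([r["confirmation"] for r in by_hotel.select(hotel_id="H1")])  # ['C1', 'C3']
print([r["confirmation"] for r in by_guest.select(guest="ana")])    # ['C2', 'C3']
```

The same reservations answer two different questions ("reservations at a hotel" and "reservations for a guest") without any joins, at the cost of writing each row twice.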

Break (5 minutes)

Microservices with Kafka and Cassandra (30 minutes)

  • Presentation: Patterns for combining Kafka and Cassandra in microservice applications
  • Katacoda interactive exercise: Build a microservice with Apache Cassandra and Apache Kafka
  • Q&A
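
One common way to combine the two systems is for each service to keep its own state in Cassandra and publish change events to Kafka, keyed by entity ID so that all events for one entity land in the same partition and are consumed in order. The Python sketch below shows that flow with in-memory stand-ins. ReservationService, ToyTable, ToyTopic, and the event format are hypothetical, and this is one possible pattern rather than necessarily the ones presented in the course.

```python
import json
import zlib


class ToyTable:
    """Hypothetical stand-in for a Cassandra table keyed by id."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, row):
        self.rows[key] = row


class ToyTopic:
    """Hypothetical stand-in for a Kafka topic with a fixed number of partitions."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        # Kafka's default partitioner hashes the record key (murmur2 in the Java
        # client); crc32 is only a stable, dependency-free stand-in here.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class ReservationService:
    """Hypothetical microservice: persist the new state, then publish an event."""

    def __init__(self, table, topic):
        self.table = table
        self.topic = topic

    def update_status(self, reservation_id, status):
        row = {"id": reservation_id, "status": status}
        self.table.upsert(reservation_id, row)  # 1. write to the service's own table
        event = json.dumps({"type": "ReservationUpdated", **row})
        return self.topic.send(reservation_id, event)  # 2. announce the change


svc = ReservationService(ToyTable(), ToyTopic())
first = svc.update_status("R1", "CONFIRMED")
second = svc.update_status("R1", "CANCELLED")
assert first == second  # same key, same partition, so consumers see events in order
```

The two writes in update_status are not atomic: if the process fails between them, the table and the topic disagree. Approaches such as the transactional outbox pattern or change data capture are commonly used to close that gap.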

Kafka Connect (25 minutes)

  • Presentation: The Kafka Connect framework and the connectors available for it, including the DataStax Kafka Connector (a sink connector)
  • Katacoda interactive exercise: Push data to Apache Cassandra with the DataStax Apache Kafka Connector
  • Q&A
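
A sink connector such as the DataStax Kafka Connector reads records from Kafka topics and writes them into Cassandra tables, using a per-table mapping from record fields (in the key or value) to columns. The Python sketch below imitates only that mapping step. The column=value.field syntax resembles the mapping strings in the DataStax connector's configuration, but parse_mapping and apply_mapping are hypothetical simplifications; the connector documentation defines the real grammar and settings.

```python
import json


def parse_mapping(spec):
    """Parse 'col=value.field, col2=key' into [(column, root, path)] (toy grammar)."""
    pairs = []
    for item in spec.split(","):
        column, _, source = item.strip().partition("=")
        root, _, path = source.strip().partition(".")
        pairs.append((column.strip(), root, path))
    return pairs


def apply_mapping(pairs, key, value):
    """Return the column values that one Kafka record would be written as."""
    roots = {"key": key, "value": value}
    row = {}
    for column, root, path in pairs:
        node = roots[root]
        # Walk nested fields such as value.reading.celsius.
        for part in (path.split(".") if path else []):
            node = node[part]
        row[column] = node
    return row


mapping = parse_mapping("sensor_id=key, reading_time=value.ts, temp=value.reading.celsius")
record_value = json.loads('{"ts": "2020-01-01T00:00:00Z", "reading": {"celsius": 21.5}}')
row = apply_mapping(mapping, "sensor-7", record_value)
print(row)
# → {'sensor_id': 'sensor-7', 'reading_time': '2020-01-01T00:00:00Z', 'temp': 21.5}
```

Keeping the mapping in configuration rather than code is what lets the same connector load many topics into many tables without a custom consumer per table.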

Break (5 minutes)

Querying Cassandra data (30 minutes)

  • Presentation: Options for querying data in Cassandra, including the limitations of CQL
  • Katacoda interactive exercise: Use data models
  • Q&A

Analyzing data using Spark (30 minutes)

  • Presentation: Analyzing data stored in Cassandra using Spark, including the DataFrame and SQL APIs
  • Katacoda interactive exercise: Analyze data using the DataStax Spark-Cassandra Connector
  • Q&A