Live online training

First Steps: Event Streaming with Apache Pulsar

Publish and subscribe to event streams at scale

Topic: Data
Sijie Guo

Apache Pulsar, a next-generation messaging and streaming data system originally built at Yahoo and now a top-level Apache project, separates message serving and data storage into two layers. This layered architecture provides high throughput and low latency while ensuring high availability and scalability. Together with its segment-centric storage design, it makes Pulsar well suited to storing and serving unbounded streams of data.

Join expert Sijie Guo for a deep dive into Apache Pulsar. You’ll explore basic concepts before learning how to ingest, store, and process messages using Apache Pulsar. Along the way, you’ll examine Pulsar’s unified pub/sub messaging API and discover how to produce and consume messages using different subscription types.

What you'll learn and how you can apply it

By the end of this live online course, you’ll understand:

  • Basic Pulsar concepts
  • Pub/sub APIs in Apache Pulsar
  • Basic Pulsar schema concepts
  • How to produce events using the producer API
  • How to consume events using the consumer API
  • The differences between subscription types
  • How to read events using the reader API

And you’ll be able to:

  • Write a Pulsar producer application to produce events
  • Write a Pulsar consumer application to consume events
  • Write a Pulsar reader application to read events

This training course is for you because...

  • You’re a data engineer building a data pipeline using Pulsar.
  • You have multiple event sources that you want to ingest to Pulsar.
  • You’ve collected events in Pulsar and want to query and process them.
  • You want to become a data expert using Pulsar.

Prerequisites

  • Familiarity with pub/sub messaging and basic programming concepts
  • A basic understanding of Pulsar

About your instructor

  • Sijie Guo (LinkedIn / GitHub) is the founder and CEO of StreamNative, a data infrastructure startup offering a cloud native event streaming platform based on Apache Pulsar for enterprises. Previously, he was the tech lead for the Messaging Group at Twitter and worked on push notification infrastructure at Yahoo. He is also the VP of Apache BookKeeper and a PMC member of Apache Pulsar. He has presented at several tech conferences, including Strata, QCon, Flink Forward, and Scale by the Bay.

Schedule

The time frames are only estimates and may vary according to how the class is progressing.

Course overview (10 minutes)

  • Presentation: Introduction and course overview
  • Demo: Using the producer and consumer APIs to produce and consume clickstreams
  • Q&A

Pulsar concepts (30 minutes)

  • Group discussion: How familiar are you with Pulsar?
  • Presentation: Pulsar basics; Pulsar architecture; event streams and segments; two-level reading APIs
  • Q&A

Producing events to Pulsar topics (25 minutes)

  • Katacoda interactive exercise: Produce events to Pulsar topics using the Java client: send versus sendAsync, message, and MessageRouter
  • Q&A

Break (5 minutes)

Consuming events from Pulsar topics (25 minutes)

  • Katacoda interactive exercise: Consume events from Pulsar topics using the Java client: receive, receiveAsync, and MessageListener; acknowledge versus acknowledgeCumulative
  • Q&A

Consuming events using different subscription types (25 minutes)

  • Katacoda interactive exercise: Consume events from Pulsar topics using different subscription types: Exclusive, Failover, Shared, and Key_Shared
  • Q&A

Break (5 minutes)

Managing consumption state (25 minutes)

  • Katacoda interactive exercise: Manage consumption state of subscriptions—seeking by MessageId and time, ack and nack, and using the reader interface
  • Q&A

Advanced consumers (25 minutes)

  • Katacoda interactive exercise: Use advanced features of consumers—dead letter topic, delayed and scheduled messages, and TTL
  • Q&A

Break (5 minutes)

Pulsar schema introduction (15 minutes)

  • Presentation: Overview; how it works
  • Q&A

Producing and consuming events using schema (15 minutes)

  • Katacoda interactive exercise: Produce and consume events using Avro and JSON schema
  • Q&A

Managing schemas (25 minutes)

  • Katacoda interactive exercise: Manage schemas: get, update, and delete schemas; check schema compatibility

Wrap-up and Q&A (5 minutes)