Apache Pulsar in Action

Book description

Deliver lightning-fast and reliable messaging for your distributed applications with the flexible and resilient Apache Pulsar platform.

In Apache Pulsar in Action you will learn how to:

  • Publish from Apache Pulsar into third-party data repositories and platforms
  • Design and develop Apache Pulsar functions
  • Perform interactive SQL queries against data stored in Apache Pulsar

Apache Pulsar in Action is a comprehensive and practical guide to building high-traffic applications with Pulsar. You’ll learn to use this mature and battle-tested platform to deliver extreme levels of speed and durability to your messaging. Apache Pulsar committer David Kjerrumgaard teaches you to apply Pulsar’s seamless scalability through hands-on case studies, including IoT analytics applications and a microservices app based on Pulsar functions.

About the Technology
Reliable server-to-server messaging is the heart of a distributed application. Apache Pulsar is a flexible real-time messaging platform built to run on Kubernetes and deliver the scalability and resilience required for cloud-based systems. Pulsar supports both streaming and message queuing, and unlike other solutions, it can communicate over multiple protocols including MQTT, AMQP, and Kafka’s binary protocol.

About the Book
Apache Pulsar in Action teaches you to build scalable streaming messaging systems using Pulsar. You’ll start with a rapid introduction to enterprise messaging and discover the unique benefits of Pulsar. Following crystal-clear explanations and engaging examples, you’ll use the Pulsar Functions framework to develop a microservices-based application. Real-world case studies illustrate how to implement the most important messaging design patterns.

What’s Inside
  • Publish from Pulsar into third-party data repositories and platforms
  • Design and develop Apache Pulsar functions
  • Create an event-driven food delivery application

About the Reader
Written for experienced Java developers. No prior knowledge of Pulsar required.

About the Author
David Kjerrumgaard is a committer on the Apache Pulsar project. He currently serves as a Developer Advocate for StreamNative, where he develops Pulsar best practices and solutions.

Quotes
Apache Pulsar in Action is able to seamlessly mix the theory and abstract concepts with the clarity of practical step-by-step examples. I’d recommend to anyone!
- Matteo Merli, co-creator of Apache Pulsar

Gives readers insights into how the ‘magic’ works… Definitely recommended.
- Henry Saputra, Splunk

A complete, practical, fun-filled book.
- Satej Kumar Sahu, Honeywell

A definitive guide that will help you scale your applications.
- Alessandro Campeis, Vimar

The best book to start working with Pulsar.
- Emanuele Piccinelli, Empirix

Table of contents

  1. inside front cover
  2. Apache Pulsar in Action
  3. Copyright
  4. dedication
  5. contents
  6. front matter
    1. foreword
    2. preface
    3. acknowledgments
    4. about this book
    5. Who should read this book
    6. How this book is organized: A roadmap
    7. About the code
    8. Other online resources
    9. liveBook discussion forum
    10. about the author
    11. about the cover illustration
  7. Part 1 Getting started with Apache Pulsar
  8. 1 Introduction to Apache Pulsar
    1. 1.1 Enterprise messaging systems
      1. 1.1.1 Key capabilities
    2. 1.2 Message consumption patterns
      1. 1.2.1 Publish-subscribe messaging
      2. 1.2.2 Message queuing
    3. 1.3 The evolution of messaging systems
      1. 1.3.1 Generic messaging systems
      2. 1.3.2 Message-oriented middleware
      3. 1.3.3 Enterprise service bus
      4. 1.3.4 Distributed messaging systems
    4. 1.4 Comparison to Apache Kafka
      1. 1.4.1 Multilayered architecture
      2. 1.4.2 Message consumption
      3. 1.4.3 Data durability
      4. 1.4.4 Message acknowledgment
      5. 1.4.5 Message retention
    5. 1.5 Why do I need Pulsar?
      1. 1.5.1 Guaranteed message delivery
      2. 1.5.2 Infinite scalability
      3. 1.5.3 Resilient to failure
      4. 1.5.4 Support for millions of topics
      5. 1.5.5 Geo-replication and active failover
    6. 1.6 Real-world use cases
      1. 1.6.1 Unified messaging systems
      2. 1.6.2 Microservices platforms
      3. 1.6.3 Connected cars
      4. 1.6.4 Fraud detection
    7. Additional resources
    8. Summary
  9. 2 Pulsar concepts and architecture
    1. 2.1 Pulsar’s physical architecture
      1. 2.1.1 Pulsar’s layered architecture
      2. 2.1.2 Stateless serving layer
      3. 2.1.3 Stream storage layer
      4. 2.1.4 Metadata storage
    2. 2.2 Pulsar’s logical architecture
      1. 2.2.1 Tenants, namespaces, and topics
      2. 2.2.2 Addressing topics in Pulsar
      3. 2.2.3 Producers, consumers, and subscriptions
      4. 2.2.4 Subscription types
    3. 2.3 Message retention and expiration
      1. 2.3.1 Data retention
      2. 2.3.2 Backlog quotas
      3. 2.3.3 Message expiration
      4. 2.3.4 Message backlog vs. message expiration
    4. 2.4 Tiered storage
    5. Summary
  10. 3 Interacting with Pulsar
    1. 3.1 Getting started with Pulsar
    2. 3.2 Administering Pulsar
      1. 3.2.1 Creating a tenant, namespace, and topic
      2. 3.2.2 Java Admin API
    3. 3.3 Pulsar clients
      1. 3.3.1 The Pulsar Java client
      2. 3.3.2 The Pulsar Python client
      3. 3.3.3 The Pulsar Go client
    4. 3.4 Advanced administration
      1. 3.4.1 Persistent topic metrics
      2. 3.4.2 Message inspection
    5. Summary
  11. Part 2 Apache Pulsar development essentials
  12. 4 Pulsar functions
    1. 4.1 Stream processing
      1. 4.1.1 Traditional batching
      2. 4.1.2 Micro-batching
      3. 4.1.3 Stream native processing
    2. 4.2 What is Pulsar Functions?
      1. 4.2.1 Programming model
    3. 4.3 Developing Pulsar functions
      1. 4.3.1 Language native functions
      2. 4.3.2 The Pulsar SDK
      3. 4.3.3 Stateful functions
    4. 4.4 Testing Pulsar functions
      1. 4.4.1 Unit testing
      2. 4.4.2 Integration testing
    5. 4.5 Deploying Pulsar functions
      1. 4.5.1 Generating a deployment artifact
      2. 4.5.2 Function configuration
      3. 4.5.3 Function deployment
      4. 4.5.4 The function deployment life cycle
      5. 4.5.5 Deployment modes
      6. 4.5.6 Pulsar function data flow
    6. Summary
  13. 5 Pulsar IO connectors
    1. 5.1 What are Pulsar IO connectors?
      1. 5.1.1 Sink connectors
      2. 5.1.2 Source connectors
      3. 5.1.3 PushSource connectors
    2. 5.2 Developing Pulsar IO connectors
      1. 5.2.1 Developing a sink connector
      2. 5.2.2 Developing a PushSource connector
    3. 5.3 Testing Pulsar IO connectors
      1. 5.3.1 Unit testing
      2. 5.3.2 Integration testing
      3. 5.3.3 Packaging Pulsar IO connectors
    4. 5.4 Deploying Pulsar IO connectors
      1. 5.4.1 Creating and deleting connectors
      2. 5.4.2 Debugging deployed connectors
    5. 5.5 Pulsar’s built-in connectors
      1. 5.5.1 Launching the MongoDB cluster
      2. 5.5.2 Link the Pulsar and MongoDB containers
      3. 5.5.3 Configure and create the MongoDB sink
    6. 5.6 Administering Pulsar IO connectors
      1. 5.6.1 Listing connectors
      2. 5.6.2 Monitoring connectors
    7. Summary
  14. 6 Pulsar security
    1. 6.1 Transport encryption
    2. 6.2 Authentication
      1. 6.2.1 TLS authentication
      2. 6.2.2 JSON Web Token authentication
    3. 6.3 Authorization
      1. 6.3.1 Roles
      2. 6.3.2 An example scenario
    4. 6.4 Message encryption
    5. Summary
  15. 7 Schema registry
    1. 7.1 Microservice communication
      1. 7.1.1 Microservice APIs
      2. 7.1.2 The need for a schema registry
    2. 7.2 The Pulsar schema registry
      1. 7.2.1 Architecture
      2. 7.2.2 Schema versioning
      3. 7.2.3 Schema compatibility
      4. 7.2.4 Schema compatibility check strategies
    3. 7.3 Using the schema registry
      1. 7.3.1 Modelling the food order event in Avro
      2. 7.3.2 Producing food order events
      3. 7.3.3 Consuming the food order events
      4. 7.3.4 Complete example
    4. 7.4 Evolving the schema
    5. Summary
  16. Part 3 Hands-on application development with Apache Pulsar
  17. 8 Pulsar Functions patterns
    1. 8.1 Data pipelines
      1. 8.1.1 Procedural programming
      2. 8.1.2 DataFlow programming
    2. 8.2 Message routing patterns
      1. 8.2.1 Splitter pattern
      2. 8.2.2 Dynamic router pattern
      3. 8.2.3 Content-based router pattern
    3. 8.3 Message transformation patterns
      1. 8.3.1 Message translator pattern
      2. 8.3.2 Content enricher pattern
      3. 8.3.3 Content filter pattern
    4. Summary
  18. 9 Resiliency patterns
    1. 9.1 Pulsar Functions resiliency
      1. 9.1.1 Adverse events
      2. 9.1.2 Fault detection
    2. 9.2 Resiliency design patterns
      1. 9.2.1 Retry pattern
      2. 9.2.2 Circuit breaker pattern
      3. 9.2.3 Rate limiter pattern
      4. 9.2.4 Time limiter pattern
      5. 9.2.5 Cache pattern
      6. 9.2.6 Fallback pattern
      7. 9.2.7 Credential refresh pattern
    3. 9.3 Multiple layers of resiliency
    4. Summary
  19. 10 Data access
    1. 10.1 Data sources
    2. 10.2 Data access use cases
      1. 10.2.1 Device validation
      2. 10.2.2 Driver location data
    3. Summary
  20. 11 Machine learning in Pulsar
    1. 11.1 Deploying ML models
      1. 11.1.1 Batch processing
      2. 11.1.2 Near real-time
    2. 11.2 Near real-time model deployment
    3. 11.3 Feature vectors
      1. 11.3.1 Feature stores
      2. 11.3.2 Feature calculation
    4. 11.4 Delivery time estimation
      1. 11.4.1 ML model export
      2. 11.4.2 Feature vector mapping
      3. 11.4.3 Model deployment
    5. 11.5 Neural nets
      1. 11.5.1 Neural net training
      2. 11.5.2 Neural net deployment in Java
    6. Summary
  21. 12 Edge analytics
    1. 12.1 IIoT architecture
      1. 12.1.1 The perception and reaction layer
      2. 12.1.2 The transportation layer
      3. 12.1.3 The data processing layer
    2. 12.2 A Pulsar-based processing layer
    3. 12.3 Edge analytics
      1. 12.3.1 Telemetric data
      2. 12.3.2 Univariate and multivariate
    4. 12.4 Univariate analysis
      1. 12.4.1 Noise reduction
      2. 12.4.2 Statistical analysis
      3. 12.4.3 Approximation
    5. 12.5 Multivariate analysis
      1. 12.5.1 Creating a bidirectional messaging mesh
      2. 12.5.2 Multivariate dataset construction
    6. 12.6 Beyond the book
    7. Summary
  22. Appendix A. Running Pulsar on Kubernetes
    1. A.1 Create a Kubernetes cluster
      1. A.1.1 Install prerequisites
      2. A.1.2 Minikube
    2. A.2 The Pulsar Helm chart
      1. A.2.1 What is Helm?
      2. A.2.2 The Pulsar Helm chart
    3. A.3 Using the Pulsar Helm chart
      1. A.3.1 Administering Pulsar on Kubernetes
      2. A.3.2 Configuring clients
  23. Appendix B. Geo-replication
    1. B.1 Synchronous geo-replication
    2. B.2 Asynchronous geo-replication
      1. B.2.1 Configuring asynchronous geo-replication
    3. B.3 Asynchronous geo-replication patterns
      1. B.3.1 Multi-active geo-replication
      2. B.3.2 Active-standby geo-replication
      3. B.3.3 Aggregation geo-replication
  24. index
  25. inside back cover

Product information

  • Title: Apache Pulsar in Action
  • Author(s): David Kjerrumgaard
  • Release date: December 2021
  • Publisher(s): Manning Publications
  • ISBN: 9781617296888