Building Data Streaming Applications with Apache Kafka

Book Description

Design and administer fast, reliable enterprise messaging systems with Apache Kafka

About This Book

  • Build efficient real-time streaming applications in Apache Kafka to process streams of data
  • Master the core Kafka APIs to set up Apache Kafka clusters and start writing message producers and consumers
  • A comprehensive guide to help you get a solid grasp of Apache Kafka concepts, with practical examples

Who This Book Is For

If you want to learn how to use Apache Kafka and the different tools in the Kafka ecosystem in the easiest possible manner, this book is for you. Some programming experience with Java is required to get the most out of this book.

What You Will Learn

  • Learn the basics of Apache Kafka from scratch
  • Use the basic building blocks of a streaming application
  • Design effective streaming applications with Kafka using Spark, Storm, and Heron
  • Understand the importance of a low-latency, high-throughput, and fault-tolerant messaging system
  • Plan capacity effectively when deploying your Kafka application
  • Understand and implement the best security practices

In Detail

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records, and process them in a fault-tolerant way as they occur.

This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications, and tackles some common challenges such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security.

By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka, and to design efficient streaming data applications with it.

Style and approach

A step-by-step, comprehensive guide filled with practical, real-world examples

Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to obtain the code files.

Table of Contents

  1. Preface
    1. What this book covers
    2. What you need for this book
    3. Who this book is for
    4. Conventions
    5. Reader feedback
    6. Customer support
      1. Downloading the example code
      2. Downloading the color images of this book
      3. Errata
      4. Piracy
      5. Questions
  2. Introduction to Messaging Systems
    1. Understanding the principles of messaging systems
    2. Understanding messaging systems
    3. Peeking into a point-to-point messaging system
    4. Publish-subscribe messaging system
    5. Advanced Message Queuing Protocol
    6. Using messaging systems in big data streaming applications
    7. Summary
  3. Introducing Kafka the Distributed Messaging Platform
    1. Kafka origins
    2. Kafka's architecture
    3. Message topics
    4. Message partitions
    5. Replication and replicated logs
    6. Message producers
    7. Message consumers
    8. Role of Zookeeper
    9. Summary
  4. Deep Dive into Kafka Producers
    1. Kafka producer internals
    2. Kafka Producer APIs
      1. Producer object and ProducerRecord object
      2. Custom partition
      3. Additional producer configuration
    3. Java Kafka producer example
    4. Common messaging publishing patterns
    5. Best practices
    6. Summary
  5. Deep Dive into Kafka Consumers
    1. Kafka consumer internals
      1. Understanding the responsibilities of Kafka consumers
    2. Kafka consumer APIs
      1. Consumer configuration
      2. Subscription and polling
      3. Committing and polling
      4. Additional configuration
    3. Java Kafka consumer
    4. Scala Kafka consumer
      1. Rebalance listeners
    5. Common message consuming patterns
    6. Best practices
    7. Summary
  6. Building Spark Streaming Applications with Kafka
    1. Introduction to Spark 
      1. Spark architecture
        1. Pillars of Spark
        2. The Spark ecosystem
    2. Spark Streaming 
      1. Receiver-based integration
        1. Disadvantages of receiver-based approach
        2. Java example for receiver-based integration
        3. Scala example for receiver-based integration
      2. Direct approach
        1. Java example for direct approach
        2. Scala example for direct approach
    3. Use case log processing - fraud IP detection
      1. Maven
    4. Producer 
      1. Property reader
        1. Producer code 
        2. Fraud IP lookup
        3. Expose hive table
        4. Streaming code
    5. Summary
  7. Building Storm Applications with Kafka
    1. Introduction to Apache Storm
      1. Storm cluster architecture
      2. The concept of a Storm application
    2. Introduction to Apache Heron
      1. Heron architecture 
        1. Heron topology architecture
    3. Integrating Apache Kafka with Apache Storm - Java
      1. Example
    4. Integrating Apache Kafka with Apache Storm - Scala
    5. Use case – log processing in Storm, Kafka, Hive
      1. Producer
        1. Producer code 
          1. Fraud IP lookup
      2. Storm application
      3. Running the project
    6. Summary
  8. Using Kafka with Confluent Platform
    1. Introduction to Confluent Platform
    2. Deep diving into Confluent architecture
    3. Understanding Kafka Connect and Kafka Streams
      1. Kafka Streams
    4. Playing with Avro using Schema Registry
    5. Moving Kafka data to HDFS
      1. Camus 
        1. Running Camus
      2. Gobblin
        1. Gobblin architecture
      3. Kafka Connect
      4. Flume
    6. Summary
  9. Building ETL Pipelines Using Kafka
    1. Considerations for using Kafka in ETL pipelines
    2. Introducing Kafka Connect
    3. Deep dive into Kafka Connect
    4. Introductory examples of using Kafka Connect
    5. Kafka Connect common use cases
    6. Summary 
  10. Building Streaming Applications Using Kafka Streams
    1. Introduction to Kafka Streams
      1. Using Kafka in Stream processing
      2. Kafka Stream - lightweight Stream processing library 
    2. Kafka Stream architecture 
    3. Integrated framework advantages
    4. Understanding tables and Streams together
      1. Maven dependency
      2. Kafka Stream word count
      3. KTable
    5. Use case example of Kafka Streams
      1. Maven dependency of Kafka Streams
      2. Property reader
      3. IP record producer
      4. IP lookup service
      5. Fraud detection application
    6. Summary
  11. Kafka Cluster Deployment
    1. Kafka cluster internals
      1. Role of Zookeeper
      2. Replication
      3. Metadata request processing
      4. Producer request processing
      5. Consumer request processing
    2. Capacity planning
      1. Capacity planning goals
      2. Replication factor
      3. Memory
      4. Hard drives
      5. Network
      6. CPU
    3. Single cluster deployment
    4. Multicluster deployment
    5. Decommissioning brokers
    6. Data migration
    7. Summary
  12. Using Kafka in Big Data Applications
    1. Managing high volumes in Kafka
      1. Appropriate hardware choices 
      2. Producer write and consumer read choices
    2. Kafka message delivery semantics
      1. At least once delivery 
      2. At most once delivery 
      3. Exactly once delivery 
    3. Big data and Kafka common usage patterns
    4. Kafka and data governance
    5. Alerting and monitoring
    6. Useful Kafka metrics
      1. Producer metrics
      2. Broker metrics
      3. Consumer metrics
    7. Summary
  13. Securing Kafka
    1. An overview of securing Kafka
    2. Wire encryption using SSL
      1. Steps to enable SSL in Kafka
        1. Configuring SSL for Kafka Broker
        2. Configuring SSL for Kafka clients
    3. Kerberos SASL for authentication
      1. Steps to enable SASL/GSSAPI in Kafka
        1. Configuring SASL for Kafka broker
        2. Configuring SASL for Kafka client - producer and consumer
    4. Understanding ACL and authorization
      1. Common ACL operations
        1. List ACLs
    5. Understanding Zookeeper authentication
    6. Apache Ranger for authorization
      1. Adding Kafka Service to Ranger
      2. Adding policies 
    7. Best practices
    8. Summary
  14. Streaming Application Design Considerations
    1. Latency and throughput
    2. Data and state persistence
    3. Data sources
    4. External data lookups
    5. Data formats
    6. Data serialization
    7. Level of parallelism
    8. Out-of-order events
    9. Message processing semantics
    10. Summary