Mastering Apache Spark 2.x - Second Edition by Romeo Kienzler

Kafka

Apache Kafka (http://kafka.apache.org/) is a top-level open source project of the Apache Software Foundation. It is a fast, highly scalable publish/subscribe messaging system for big data. Kafka uses message brokers for data management and ZooKeeper for configuration, so that data can be organized into topics and read by consumer groups.
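To make the publish/subscribe idea concrete, the following is a minimal in-memory sketch in Python. It is a toy model only and does not use the Kafka client API: the `ToyBroker` class, the topic name `rss`, and the group names are all hypothetical. It shows a topic as an append-only log, with each consumer group keeping its own read position.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, not the Kafka API)."""

    def __init__(self):
        # topic name -> append-only list of messages (the topic's "log")
        self.logs = defaultdict(list)
        # (group, topic) -> index of the next message that group will read
        self.offsets = defaultdict(int)

    def publish(self, topic, message):
        """Append a message to the end of the topic's log."""
        self.logs[topic].append(message)

    def poll(self, group, topic):
        """Return the messages this group has not yet seen and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.logs[topic][start:]
        self.offsets[(group, topic)] = len(self.logs[topic])
        return batch


broker = ToyBroker()
broker.publish("rss", "headline 1")
broker.publish("rss", "headline 2")

first_a = broker.poll("group-a", "rss")   # group-a reads both messages
broker.publish("rss", "headline 3")
second_a = broker.poll("group-a", "rss")  # group-a reads only the new message
first_b = broker.poll("group-b", "rss")   # group-b starts from the beginning
```

Because offsets are tracked per group, a second group can read the full topic without affecting what the first group has already consumed.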

Data in Kafka is split into partitions. In this example, we will demonstrate a receiverless Spark-based Kafka consumer, so we don't need to worry about configuring Spark data partitions to match the partitions of our Kafka data. To demonstrate Kafka-based message production and consumption, we will use the Perl RSS script from the last section as a data source. The data passing into Kafka and to Spark will be Reuters RSS news ...
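The relationship between Kafka partitions and a receiverless consumer can be sketched in plain Python. This is a simplified model under stated assumptions, not Spark or Kafka code: the partition count, the key names, and every function below are hypothetical, and CRC32 stands in for the hash used by Kafka's own partitioner. The point it illustrates is that each batch is described by one offset range per Kafka partition, so the consumer side gets one slice per partition without any separate partition configuration.

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for the topic


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Choose a partition from a stable hash of the key (CRC32 is an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append(partitions, key, value):
    """Append a keyed message to its partition and return the partition index."""
    p = partition_for(key, len(partitions))
    partitions[p].append((key, value))
    return p


def offset_ranges(partitions, consumed):
    """Build one (partition, from_offset, until_offset) tuple per partition."""
    return [(p, consumed[p], len(log)) for p, log in enumerate(partitions)]


def read_batch(partitions, ranges):
    """Read one slice per offset range, mirroring one consumer-side partition each."""
    return [partitions[p][start:end] for p, start, end in ranges]


partitions = [[] for _ in range(NUM_PARTITIONS)]
consumed = [0] * NUM_PARTITIONS  # next offset to read, per partition

# Produce six messages under two hypothetical keys.
for i in range(6):
    append(partitions, f"feed-{i % 2}", f"item {i}")

ranges = offset_ranges(partitions, consumed)
batch = read_batch(partitions, ranges)
consumed = [end for _, _, end in ranges]  # record progress for the next batch
```

Messages that share a key land in the same partition and keep their order there, and a second call to `offset_ranges` after updating `consumed` yields empty ranges until new messages arrive.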
