Kafka

Kafka is a publish-subscribe messaging system that serves as a reliable source for Spark Streaming. The latest Kafka direct API provides a one-to-one mapping between Kafka partitions and the partitions of the RDDs generated by the DStream, along with access to metadata and offsets. Because Kafka is an advanced streaming source as far as Spark Streaming is concerned, its dependency must be added to the build of the streaming application. The following artifact should be added in the build tool of one's choice before starting with Kafka integration:

groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-10_2.11
version = 2.1.1
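
As an illustration, assuming Maven is the build tool (the text leaves the choice open), the coordinates above would typically be declared inside the dependencies section of the project's pom.xml roughly as follows. Other build tools take the same three values in their own syntax.

```xml
<!-- Sketch only: assumes a Maven build; the coordinates are the ones listed above -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
  <version>2.1.1</version>
</dependency>
```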

After adding the dependency, one also needs basic information about the Kafka setup, such as ...
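
To make the one-to-one mapping described earlier more concrete, the following is a minimal, self-contained sketch in plain Java. It does not use the Spark or Kafka libraries: the class DirectMappingSketch, the nested Range type, and the planBatch helper are hypothetical names invented for this illustration, loosely modeled on the idea of an offset range per topic partition. The topic name and offset values are made up as well.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not the Spark API): shows how each Kafka partition can map to
// exactly one RDD partition, described by a range of offsets.
public class DirectMappingSketch {

    // Hypothetical stand-in for an offset range: one topic partition plus the
    // half-open offset interval [fromOffset, untilOffset) covered by a batch.
    static final class Range {
        final String topic;
        final int partition;
        final long fromOffset;
        final long untilOffset;

        Range(String topic, int partition, long fromOffset, long untilOffset) {
            this.topic = topic;
            this.partition = partition;
            this.fromOffset = fromOffset;
            this.untilOffset = untilOffset;
        }

        // Number of records this range covers.
        long count() {
            return untilOffset - fromOffset;
        }
    }

    // Builds one range per Kafka partition. The index of a range in the
    // returned list plays the role of the RDD partition index, which is the
    // one-to-one mapping in this toy model.
    static List<Range> planBatch(String topic, long[] lastConsumed, long[] latest) {
        if (lastConsumed.length != latest.length) {
            throw new IllegalArgumentException("offset arrays must have the same length");
        }
        List<Range> ranges = new ArrayList<>();
        for (int p = 0; p < lastConsumed.length; p++) {
            ranges.add(new Range(topic, p, lastConsumed[p], latest[p]));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Made-up offsets for a hypothetical three-partition topic.
        long[] lastConsumed = {100L, 250L, 0L};
        long[] latest = {130L, 250L, 42L};

        List<Range> batch = planBatch("events", lastConsumed, latest);
        for (int i = 0; i < batch.size(); i++) {
            Range r = batch.get(i);
            System.out.println("RDD partition " + i + " <- " + r.topic + "-" + r.partition
                    + " offsets [" + r.fromOffset + ", " + r.untilOffset + "), "
                    + r.count() + " records");
        }
    }
}
```

In this sketch a partition with no new records still gets its own (empty) range, so the number of RDD partitions always equals the number of Kafka partitions. Because each range carries its topic, partition, and offsets, the offsets consumed by a batch can be inspected per partition.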
