Using StreamingLogisticRegression to classify a Twitter stream using Kafka as a training stream
In the previous recipe, we published all the tweets that were stored in ElasticSearch to a Kafka topic. In this recipe, we'll subscribe to the Kafka stream and train a classification model out of it. We will later use this trained model to classify a live Twitter stream.
How to do it...
This is a really small recipe that is composed of 3 steps:
- Subscribing to a Kafka stream: There are two ways to subscribe to a Kafka stream and we'll be using the
DirectStreammethod, which is faster. Just like Twitter streaming, Spark has first-class support for subscribing to a Kafka stream. This is achieved by adding the
spark-streaming-kafkadependency. Let's add it ...