In this section, we will design a pipeline that allows us to both stream and persist the data. Persistence will allow us to look up the data on demand, while streaming will allow the learning algorithm to continue learning as new data arrives.
We use the following technologies to achieve our goals:
We have already covered the setup of MongoDB and Apache Kafka in this chapter, and of Apache Spark in the previous chapter. We only need to add the Spark Streaming libraries to our build file.
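For an sbt-based project, the addition might look like the following sketch. The version number is an assumption and should match the Spark version already declared in the project; the Kafka integration artifact is included because the pipeline consumes from Kafka.

```scala
// build.sbt (fragment) -- a minimal sketch, not the book's exact build file.
// Replace "2.4.8" with the Spark version your project already uses.
libraryDependencies ++= Seq(
  // Core Spark Streaming (DStream) support
  "org.apache.spark" %% "spark-streaming" % "2.4.8",
  // Kafka source integration for Spark Streaming
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.4.8"
)
```

The `%%` operator appends the project's Scala binary version to the artifact name, so the resolved artifact matches the Scala version Spark was built against.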