Building a Recommendation Engine with Scala by Saleem Ansari

Data processing pipeline for Entree

In this section, we design a pipeline that both streams and persists the data. Persistence lets us look up the data on demand. Streaming lets the learning algorithm keep learning as new data arrives.

We use the following technologies to achieve our goals:

  • Akka: For message-based concurrency, so that we can delegate as much work as possible
  • MongoDB: For data persistence and querying
  • Apache Kafka: For high-performance persistent queuing
  • Apache Spark: For high-throughput stream processing
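
To make the two paths concrete, here is a minimal sketch of the fan-out idea in plain Scala, assuming in-memory stand-ins for each component. It does not use Akka, MongoDB, Kafka, or Spark; the names Interaction, InMemoryStore, InMemoryQueue, CountingLearner, and Pipeline are hypothetical and chosen only for illustration, and the counting "learner" is a placeholder rather than the algorithm used for Entree.

```scala
import scala.collection.mutable

// Hypothetical event type; the field names are assumptions, not the book's schema.
final case class Interaction(userId: String, itemId: String)

// Stand-in for the persistent store (MongoDB in the text): supports on-demand lookup.
final class InMemoryStore {
  private val byUser = mutable.Map.empty[String, Vector[Interaction]]

  def save(e: Interaction): Unit =
    byUser.update(e.userId, byUser.getOrElse(e.userId, Vector.empty) :+ e)

  def lookup(userId: String): Vector[Interaction] =
    byUser.getOrElse(userId, Vector.empty)
}

// Stand-in for the queue (Kafka in the text): events wait here until consumed.
final class InMemoryQueue {
  private val buffer = mutable.Queue.empty[Interaction]

  def publish(e: Interaction): Unit = buffer.enqueue(e)

  // Return everything queued so far and empty the buffer, like one micro-batch.
  def drain(): Vector[Interaction] = {
    val batch = buffer.toVector
    buffer.clear()
    batch
  }
}

// Stand-in for the streaming learner (Spark in the text):
// keeps a running per-item count that is updated as each batch arrives.
final class CountingLearner {
  private val counts = mutable.Map.empty[String, Int]

  def update(batch: Seq[Interaction]): Unit =
    batch.foreach(e => counts.update(e.itemId, counts.getOrElse(e.itemId, 0) + 1))

  def count(itemId: String): Int = counts.getOrElse(itemId, 0)
}

// The ingest step sends every event down both paths: persist and stream.
final class Pipeline(store: InMemoryStore, queue: InMemoryQueue) {
  def ingest(e: Interaction): Unit = {
    store.save(e)
    queue.publish(e)
  }
}
```

A caller would create one store, one queue, and one learner, call ingest for each incoming event, and periodically pass queue.drain() to learner.update. In the real pipeline, the delegation between these steps is what Akka's message passing would handle.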

We have already covered the setup of MongoDB and Apache Kafka in this chapter, and Apache Spark in the previous chapter. We only need to add the Apache Spark Streaming libraries to our build file, build.sbt ...
