How to do it...

  1. Start a new project in IntelliJ or in an IDE of your choice. Make sure the necessary JAR files are included.
  1. We define the package information for the Scala program:
  1. Import the necessary packages:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession
import ...
  1. We now define two Scala case classes, to model movie and ratings data:
case class Movie(movieId: Int, title: String, year: Int, genre: Seq[String])
case class FullRating(userId: Int, movieId: Int, rating: Float, timestamp: Long)
  1. In this step, we define functions for parsing a single line of data from the ratings.dat file into the FullRating case class, and for parsing ...
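
The parsing step can be sketched as follows. This is a minimal illustration, not the book's own listing: the function names `parseFullRating` and `parseMovie` and the enclosing object `MovieLensParsers` are hypothetical, and the line layouts are assumptions based on the MovieLens-style `::` delimiter (`userId::movieId::rating::timestamp` for ratings, and `movieId::Title (year)::Genre1|Genre2` for movies). That the second parser targets movie lines is also a guess, since the step text is cut off.

```scala
// Case classes repeated from the recipe so the sketch is self-contained.
case class Movie(movieId: Int, title: String, year: Int, genre: Seq[String])
case class FullRating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

// Hypothetical helper object; the names below are illustrative only.
object MovieLensParsers {

  // Assumed line layout: userId::movieId::rating::timestamp
  def parseFullRating(line: String): FullRating = {
    val fields = line.split("::")
    require(fields.length == 4, s"Expected 4 fields but got ${fields.length}: $line")
    FullRating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
  }

  // Assumed line layout: movieId::Title (year)::Genre1|Genre2
  def parseMovie(line: String): Movie = {
    val fields = line.split("::")
    require(fields.length == 3, s"Expected 3 fields but got ${fields.length}: $line")

    // The release year is assumed to sit in the last pair of parentheses in the title.
    val rawTitle = fields(1).trim
    val open = rawTitle.lastIndexOf('(')
    val close = rawTitle.lastIndexOf(')')
    require(open >= 0 && close > open, s"No '(year)' suffix found in title: $rawTitle")

    val year = rawTitle.substring(open + 1, close).toInt
    val title = rawTitle.substring(0, open).trim

    // Splitting on a Char avoids treating '|' as a regex alternation.
    val genres = fields(2).split('|').toList

    Movie(fields(0).toInt, title, year, genres)
  }
}
```

With a `SparkSession` named `spark`, functions like these would typically be mapped over the lines of each file, for example `spark.sparkContext.textFile(path).map(MovieLensParsers.parseFullRating)`, where `path` is a placeholder for the location of ratings.dat. Both parsers throw on malformed input; wrapping the calls in `scala.util.Try` is one option if bad lines should be skipped instead.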
