Practical Predictive Analytics by Ralph Winters

Simulation

We will build this Spark dataframe via simulation, which will take up a good chunk of this chapter. I feel this is a better way to go than importing an existing public dataset, where you cannot control the makeup of the data. With a simulated dataset, you are free to size it however you like (subject to account restrictions).

However, you are always free to import whatever dataset you would like, and the analytic concepts that follow will be the same.
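
To make the idea of a sizable, controllable dataset concrete, here is a minimal sketch in plain Python. It is not the simulation developed in this chapter: the function name `simulate_rows`, the columns `id`, `age`, and `outcome`, and the rule linking age to outcome are all hypothetical and chosen only for illustration. The point is that the row count and the random seed are ordinary parameters, so you decide how large the data is and can regenerate exactly the same rows later.

```python
import random

# Hypothetical column names, used only for this illustration.
COLUMNS = ["id", "age", "outcome"]


def simulate_rows(n_rows, seed=42):
    """Return n_rows tuples of (id, age, outcome) from a seeded generator."""
    rng = random.Random(seed)  # private generator, so the same seed gives the same rows
    rows = []
    for i in range(n_rows):
        age = rng.randint(18, 90)  # inclusive on both ends
        # Arbitrary illustrative rule: the chance of outcome == 1 grows with age.
        outcome = 1 if rng.random() < age / 200.0 else 0
        rows.append((i, age, outcome))
    return rows


# You control the size: the same call scales from a thousand rows upward.
small = simulate_rows(1000)

# Assumption: inside a notebook where a SparkSession named `spark` exists,
# these rows could be turned into a Spark dataframe with
#   spark.createDataFrame(small, schema=COLUMNS)
```

Generating the rows on the driver like this is only reasonable for modest sizes; it is shown here because it keeps the example self-contained and runnable without a cluster.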

  1. Preliminaries first: you will need to register and log on to your Databricks account.
  2. Next, create a cluster. Give it a name, such as MyCluster.
  3. To conform with the examples in this chapter, make sure you choose Spark 2.1. This is very important. Since Spark is an ...
