June 2017
Beginner to intermediate
576 pages
15h 22m
English
We will build this Spark DataFrame via simulation, which will take up a good part of this chapter. I feel this is a better approach than importing an existing public dataset, whose makeup you cannot control. With a simulated dataset, you are free to size it however you like (subject to the limits of your account).
That said, you are always free to import whatever dataset you like, and the analytic concepts that follow will be the same.
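To make the idea of sizing a simulated dataset concrete, here is a minimal sketch in plain Python. It is not the simulation developed in this chapter: the column names, distributions, and the `simulate_rows` helper are all hypothetical, chosen only to show how the row count and random seed stay under your control.

```python
import random

# Hypothetical column names; the chapter's actual variables are not shown here.
COLUMNS = ["id", "age", "score", "outcome"]


def simulate_rows(n_rows, seed=42):
    """Return n_rows simulated records as tuples matching COLUMNS."""
    rng = random.Random(seed)  # a fixed seed makes the simulation repeatable
    rows = []
    for i in range(n_rows):
        age = rng.randint(18, 90)                 # uniform integer ages
        score = rng.gauss(100.0, 15.0)            # normally distributed score
        outcome = 1 if rng.random() < 0.3 else 0  # roughly 30% positives
        rows.append((i, age, score, outcome))
    return rows


# Choose whatever size suits your environment.
rows = simulate_rows(1000)

# With a Spark session available (assumed here to be named `spark`),
# the rows could be handed over with:
#     df = spark.createDataFrame(rows, schema=COLUMNS)
```

Changing `n_rows` scales the dataset up or down, and changing the probability or distribution parameters alters its makeup, which is exactly the control an imported public dataset does not give you.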