In this chapter, we learned what Spark is and some of its advantages. We began writing programs that load data into, and save data from, Spark clusters, and we learned a couple of ways to construct our own very large Spark DataFrames based on the characteristics of small datasets.
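As a quick recap of the kind of code this involves, the following is a minimal sketch using the SparkR interface. It is illustrative only: the built-in faithful dataset, the replication factor, and the output path are assumptions for the example, not the chapter's exact code, and the chapter may use a different R-to-Spark interface.

```r
# Illustrative SparkR sketch: load a small local dataset, push it to the
# cluster, replicate it into a larger Spark DataFrame, then save and reload it.
library(SparkR)
sparkR.session(appName = "chapter-recap")   # on Databricks a session already exists

small_df <- as.DataFrame(faithful)          # 'faithful' is a built-in R dataset

# Build a larger DataFrame by unioning several copies of the small one
big_df <- Reduce(union, replicate(10, small_df, simplify = FALSE))

# Save to and reload from distributed storage (the path is a placeholder)
write.df(big_df, path = "/tmp/faithful_big", source = "parquet", mode = "overwrite")
reloaded <- read.df("/tmp/faithful_big", source = "parquet")
count(reloaded)
```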
We also learned how to write Spark programs in Databricks, run standard R analyses, and install packages. Finally, we reinforced our knowledge of missing-value imputation by substituting some of the missing values in the original data.
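The sketch below illustrates the same idea of simple missing-value substitution, first on a plain R data frame and then on a Spark DataFrame via SparkR's fillna(). The column names and replacement values are made up for the example and are not taken from the chapter's data.

```r
# Hedged example of simple missing-value substitution (illustrative columns)
library(SparkR)

local_df <- data.frame(age = c(21, NA, 35), income = c(50000, 62000, NA))

# Base-R style: substitute the column mean for missing ages
local_df$age[is.na(local_df$age)] <- mean(local_df$age, na.rm = TRUE)

# Spark style: push the data to the cluster and fill remaining NAs
# with a fixed per-column value using SparkR's fillna()
sdf <- as.DataFrame(local_df)
sdf <- fillna(sdf, list(income = 56000))
head(sdf)
```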
In the next chapter, we will take what we have built and start to explore the data.