Summary

In this chapter, we learned the basics of exploring Spark data, using Spark-specific commands to filter, group, and summarize it.

We also learned how to visualize data directly in Spark, and how to run R functions such as ggplot against that data.

We looked at strategies for working with Spark data, such as intelligent filtering and sampling.

Finally, we demonstrated that we often need to extract Spark data back into local R when we want the flexibility of our usual tools, which may not be supplied natively in the Spark environment.

In the next chapter, we will delve into the various predictive models that you can use that are specific to large ...
