Chapter 7. Spark 2.0 Concepts

Now that you have seen the fundamental underpinnings of Spark, let's take a broader look at the architecture, context, and ecosystem in which Spark operates. This catch-all chapter covers a diverse set of essential topics that will give you a fuller understanding of Spark as a whole. By the end of it, you will understand who uses Spark, and how and where it is used. This chapter covers the following topics:

  • The datasets accompanying this book and the IDEs for data wrangling
  • A quick description of what a data scientist expects from Spark
  • The data lake architecture and where Spark fits in it
  • The evolution of the Spark architecture up to 2.0
  • The Parquet data storage mechanism ...
